The current state of XDR: A Rick-the-toolman essay.

Jun 17, 2024

CSO Perspectives is a weekly column and podcast where Rick Howard discusses the ideas, strategies and technologies that senior cybersecurity executives wrestle with on a daily basis.

In 2021, I wrote a Rick-the-Toolman love letter to this newfangled security tool called XDR. You might have heard about it. The acronym stands for “eXtended Detection and Response,” and I was gushing about how this tool might transform the modern-day security architecture. Back then, Gartner placed XDR at the beginning of the journey on its famous Hype Cycle chart, just starting to climb the Peak of Inflated Expectations, and I was jumping on the bandwagon to help inflate the hype.

As of July 2023 (two years later), Gartner placed XDR on the back end of the peak, just starting the steep roller coaster ride down toward the Trough of Disillusionment, and forecast 5-10 years before it reaches the Plateau of Productivity. This is typically the point when security pros start to lose faith in a product idea because the existing products haven’t lived up to the hype surrounding it. So I thought it was time to revisit the current state of XDR, because I still believe that it represents the future security architecture that we all need. I don’t want the infosec profession to lose sight of this potentially transformational tool just because it’s not quite ready for prime time.

Confusion about what XDR is.

I can understand why the idea of XDR is sprinting towards the Trough of Disillusionment though. Most of the security platform vendors have a product that they call XDR (Splunk, Microsoft, IBM, CrowdStrike, Cisco, Palo Alto Networks, etc.), but no two of their explanations about what XDR is and what it does match exactly. Gartner says that XDR is a “unified security incident detection and response platform that automatically collects and correlates data from multiple proprietary security components.” That’s accurate but you could also say the same thing about SIEM tools (Security Information and Event Management tools). I'm looking for something a little more descriptive. What makes XDR special?

The subtle difference between a SIEM tool and an XDR tool is how the two technologies collect the data. With SIEM tools, the monitored system (let’s say a Fortinet firewall) generates logs as part of its normal operation. The firewall administrator configures the system to automatically send the log data to the SIEM tool for storage and processing. The XDR tool is different. XDR administrators configure the tool to connect directly to that same firewall via an API (Application Programming Interface). The API allows XDR administrators to interrogate the firewall for the specific data they need (not just general purpose log data but any information on the system) and transports the data to the vendor-provided XDR data lake for storage and future processing.

Both methods allow, as Gartner says, a platform to collect data from varied sources: log data in the case of the SIEM tool and any kind of data in the case of the XDR tool. But, the evolutionary step of using APIs to collect the data is what makes XDR tools so transformational. It gives us some options.
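To make the push-versus-pull distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Firewall class, its api_get method, and the resource names are stand-ins I made up for illustration, not any vendor's real interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Firewall:
    """Hypothetical firewall that both emits logs and answers API queries."""
    rules: list = field(default_factory=lambda: ["allow tcp/443", "deny tcp/23"])
    sessions: int = 42
    log_sink: Optional[list] = None  # where pushed log lines go (the SIEM)

    def handle_traffic(self, line: str) -> None:
        # SIEM model: the device decides what to log and pushes it out.
        if self.log_sink is not None:
            self.log_sink.append(line)

    def api_get(self, resource: str):
        # XDR model: the collector asks for exactly the resource it wants.
        return {"rules": self.rules, "sessions": self.sessions}[resource]


siem_store: list = []
firewall = Firewall(log_sink=siem_store)
firewall.handle_traffic("deny src=10.0.0.5 dst=203.0.113.9 port=23")

# The SIEM only holds whatever log lines the firewall chose to send.
print(siem_store)

# The XDR collector pulls specific data, including state that never hits a log.
xdr_lake = {name: firewall.api_get(name) for name in ("rules", "sessions")}
print(xdr_lake)
```

In the first model the firewall decides what the collector sees. In the second, the collector asks for what it wants.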

In order to understand what I mean by this, it might help to understand that XDR arrived on the scene in 2018 by merging two different security toolsets: logging and anti-virus. Let’s start with logging.

Logging: The prequel to XDR.

Raffael Marty, over at the VentureBeat website, says that you can trace the origin of the logging piece all the way back to the original Sendmail email program on BSD Unix in the 1980s. Eric Allman was building Sendmail to be one of the first programs to implement the Simple Mail Transfer Protocol. He needed a way to log what was happening as the various pieces and parts of the Sendmail system banged against each other. When he wrote the first syslogd program for BSD Unix to do that, he gave birth to the logging system that we all know and use today.

For the uninitiated, syslog stands for systems logging and the d stands for daemon. In the Unix world, daemons are small standalone programs that run in the background, waiting for work to arrive. In this case, syslogd receives a log message from a monitored system (the Fortinet firewall) and stores it somewhere.
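Here is a minimal sketch of the first thing a syslog receiver does with an incoming line. It assumes the classic BSD syslog wire format, where each message starts with a priority number in angle brackets equal to the facility times 8 plus the severity. The sample message and the parse_priority function name are made up for illustration.

```python
import re

# Severity names in numeric order, 0 (most urgent) through 7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_priority(line: str):
    """Split '<PRI>rest' into (facility number, severity name, rest)."""
    match = re.match(r"<(\d{1,3})>(.*)", line, re.DOTALL)
    if match is None:
        raise ValueError("no syslog priority prefix")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity], match.group(2)


sample = "<134>Jun 17 10:15:02 fw01 kernel: denied tcp 10.0.0.5 -> 203.0.113.9:23"
facility, severity, text = parse_priority(sample)
print(facility, severity, text)
```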

As the years went by though, we started collecting logs on everything. The amount of stored data started to become unmanageable. In the late 1990s and early 2000s, SIEM tools emerged to help us corral the volume of messages. Instead of collecting logs separately for each application and trying to manually correlate the information with homemade databases, administrators could dump all the logs to this centralized system and use some of the vendor-provided functionality to scrub the data.

But these SIEM systems were expensive. You had to provide local storage, hard disk space, to accommodate the volume of data. I remember it was a constant struggle to keep ahead of the demand. Every time we added more disk space, we filled it up with data quickly. The vendors, of course, made their money by selling more disk space so they were only too happy to help us upgrade. But like I said, upgrades were expensive. Infosec professionals were making tradeoff decisions about what not to save to disk or how long we would store things before we would overwrite them. That was counter to what we were trying to do with the logging project in the first place. You wanted to use the logs to trace bad guy activity over time. If your logs only went back three weeks or if your analysts needed log data on systems you weren’t watching, that was a problem.

It was also a major task to manage the storage system. Unless you were a Fortune 500 company or your vertical had strict compliance and reporting requirements, you probably couldn’t afford to buy and maintain one. That started to change when Amazon rolled out AWS in 2006. AWS made it possible to store all kinds of data relatively cheaply and they handled all of the administration (Bonus!).

There was another big problem though. All vendors used their own proprietary logging format. If security professionals tried to correlate their Cisco Firewall logs with their Symantec Antivirus logs, that represented a ton of low-level grunt work normalizing the data so that the SOC analysts could make sense of it all. That normalizing task was an intermediate step that provided no value. Google Site Reliability Engineers call that toil. We needed to do normalization to get to the thing that was valuable but the normalization thing itself wasn’t.
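To show what that toil looks like, here is a minimal sketch that normalizes two log formats into one common shape so they can be correlated. Both formats and all of the field names are invented for illustration. They are not the real Cisco or Symantec formats.

```python
import json


def normalize_vendor_a(line: str) -> dict:
    # Hypothetical vendor A: space-separated key=value pairs.
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {"src_ip": fields["src"], "dst_ip": fields["dst"],
            "action": fields["act"].lower()}


def normalize_vendor_b(line: str) -> dict:
    # Hypothetical vendor B: JSON with entirely different field names.
    record = json.loads(line)
    return {"src_ip": record["sourceAddress"], "dst_ip": record["destAddress"],
            "action": record["verdict"].lower()}


firewall_log = "src=10.0.0.5 dst=203.0.113.9 act=DENY"
antivirus_log = ('{"sourceAddress": "10.0.0.5", '
                 '"destAddress": "203.0.113.9", "verdict": "Blocked"}')

events = [normalize_vendor_a(firewall_log), normalize_vendor_b(antivirus_log)]

# Only after normalizing can we ask whether both tools saw the same host.
same_host = events[0]["src_ip"] == events[1]["src_ip"]
print(events, same_host)
```

Every new vendor format means another function like these two, and none of them finds a single bad guy.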

The vendor community took a swing at addressing that issue back in the mid-2000s. They started working on something called the Common Event Format (CEF). According to Splunk’s Stephen Watts, it’s “a standardized logging format … designed to simplify the process of logging security-related events and making it easier to integrate logs from different sources into a single system.” Today, many vendors use the CEF format but other competing standards have emerged too:

JavaScript Object Notation (JSON)
Windows Event logs
The NCSA Common Log Format (CLF)
The Extended Log Format (ELF)
The W3C Extended Log File Format
The Microsoft IIS (Internet Information Services) log format
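For a feel of what one of these formats looks like, here is a minimal sketch of a CEF parser. A CEF record is a pipe-delimited header of seven fields followed by key=value extensions. The sample record and the parse_cef name are my own inventions, and the sketch deliberately ignores escaped pipes and extension values that contain spaces, both of which a real parser would have to handle.

```python
CEF_HEADER_FIELDS = ["version", "vendor", "product", "device_version",
                     "event_class_id", "name", "severity"]


def parse_cef(line: str) -> dict:
    """Parse one CEF line (simplified: no escape handling)."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF record")
    # Seven header fields, then everything else is the extension.
    parts = line[len("CEF:"):].split("|", 7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(CEF_HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    event["extension"] = dict(pair.split("=", 1) for pair in extension.split())
    return event


sample = ("CEF:0|ExampleVendor|ExampleFirewall|1.0|100|Connection denied|5|"
          "src=10.0.0.5 dst=203.0.113.9 dpt=23")
event = parse_cef(sample)
print(event)
```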

The logging landscape is still a bit of a Tower of Babel if you get my drift. The vendors can’t seem to agree on what log files should look like and so SOC analysts still execute a lot of toil to normalize the data. It’s all in one spot and the administrative burden is lower than it was back in the 1990s, but SOC analysts are still sifting through multiple haystacks of data looking for needles and they spend a lot of time making the haystacks look the same.

Why is logging a prequel to XDR, you might ask? Well, SOC analysts sifting through reams of machine-generated log files looking for bad guys has been the standard operating procedure since the 2000s. When XDR tools hit the market in 2018, they gave the infosec profession a chance to upgrade that process.

Anti-virus/EDR: The other prequel to XDR.

In 1987, a German hacker and computer security expert, Bernd Fix, wrote software designed to remove the Vienna virus, thus becoming the first documented author of antivirus software. Soon after, the notorious John McAfee created the first commercial anti-virus product, VirusScan, and the infosec profession gained a must-have tool for the security stack. By the late 1990s, if you had any budget at all, your security stack had a firewall and an intrusion detection system at the network level and at least one anti-virus system deployed on every endpoint. When I was working in the Pentagon in the early 2000s, we had two deployed on each endpoint because we didn’t trust just one to get the job done.

The idea behind anti-virus systems was that the vendors could write signatures for known viruses and malware designed to detect their deployment. Once detected, the engine could remove it or render it benign. It was a constant battle to get the latest signatures deployed in a timely manner.
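In its simplest form, signature matching looks something like the sketch below. I'm assuming a signature is nothing more than the SHA-256 hash of a known-bad file. Real engines use richer patterns than that, and the payload bytes and malware name here are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical signature database: hashes of known-bad files.
known_bad_payload = b"pretend this is a virus body"
signatures = {sha256_hex(known_bad_payload): "Example.Virus.A"}


def scan(data: bytes):
    """Return the malware name if the data matches a signature, else None."""
    return signatures.get(sha256_hex(data))


print(scan(known_bad_payload))         # matches the stored signature
print(scan(known_bad_payload + b"!"))  # one extra byte and the match is gone
```

The second call shows the weakness: change the malware slightly and the old signature no longer fires, which is why getting new signatures out quickly mattered so much.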

In the late 2000s, a new technology emerged that looked at endpoint behavior to detect malicious code. Instead of relying on signatures of known malware, the engine watched the entire operating system for anomalies. If the endpoint started communicating with servers in Tajikistan when it never had before, that might be an indicator that something was amiss. This model allowed the system to detect previously unknown malicious code. Gartner’s Anton Chuvakin coined the name “Endpoint Threat Detection and Response” (ETDR) for the technology in 2013. Now we all just call it EDR (Endpoint Detection and Response). According to CrowdStrike, EDR acts like your old TV’s DVR, “recording relevant activity to catch incidents that evaded prevention.”
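The Tajikistan example can be sketched as a simple baseline check. This is a toy under my own assumptions (a per-host set of previously contacted countries, with made-up host names), not how any commercial EDR engine actually works.

```python
from collections import defaultdict

# Hypothetical history: destination countries each endpoint has contacted.
baseline = defaultdict(set)
history = [("laptop-7", "US"), ("laptop-7", "DE"), ("server-2", "US")]
for host, country in history:
    baseline[host].add(country)


def is_anomalous(host: str, country: str) -> bool:
    """Flag a connection to a country this host has never contacted before."""
    return country not in baseline[host]


print(is_anomalous("laptop-7", "DE"))  # seen before
print(is_anomalous("laptop-7", "TJ"))  # first contact with Tajikistan
```

No signature is involved. The engine only needs to know what normal looks like for that endpoint.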

While EDR was an innovative and disruptive technology, it was limited because it only saw the endpoint portion of the adversary’s attack campaign. It didn’t see the entire picture. The Lockheed Martin research team had just published their now famous intrusion kill chain paper in 2010 and the infosec profession was just starting to get its head around the idea that bad guys had to navigate the entire kill chain undetected and unstopped in order to be successful. EDR was just one piece they could use on the kill chain. To have control and visibility on the entire kill chain, SOC analysts dumped the alerts from their EDR engines as well as all the other network tools in the security stack into their SIEM tools.

I just want to take a moment here and remind everybody that the primary technology working in the background for all of this infosec activity for the past 30 years is logging. I say that because the next innovation, XDR, is going to introduce a better way.

XDR: The better way.

In 2018, I was sitting in the big keynote room at the annual Palo Alto Networks customer conference when Nir Zuk, the Palo Alto Networks’ founder and CTO, took the stage and introduced a brand new product: XDR. He explained that in the past several years, the company had demonstrated success using machine learning algorithms to detect previously unknown malware to a high degree of accuracy and precision. He suggested that it was possible to extend that idea to not only files stored on the endpoint, but also to behavioral data from the EDR systems and from network data stored on security stack tools or really any device on the network.

The idea was to collect all the relevant intelligence into a giant data lake, run machine learning algorithms on it, and potentially discover previously undetected bad guys inside the network. But it was inefficient to collect that intelligence via general purpose log files, configuring each system to send everything it has to the data lake. Instead, he suggested, it would be much better if a new security tool, XDR, connected to each security tool via an API and collected the exact intelligence it needs.

Brilliant!

But here’s the best part. Since XDR uses APIs to connect to systems to collect data, it can just as easily send information the other direction. Infosec professionals could use XDR to send updated configuration information back to the security stack.

For example, let’s say that the XDR tool has detected elements of the attack campaign run by the hacker group Magic Hound. According to Tidal Cyber (a startup that I advise), Magic Hound is an Iranian-sponsored threat group that conducts long term, resource-intensive cyber espionage operations likely on behalf of the Islamic Revolutionary Guard Corps. The Tidal Cyber intelligence team says that the attack campaign uses 75 different techniques across the intrusion kill chain. Our SOC analysts can use the XDR engine to query all the tools in the security stack to determine if we have any prevention or detection rules already in place for those 75 techniques. If not, we can easily push an update through the XDR engine to do so. And that’s the power of an API. You can’t do that with a SIEM tool.
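Here is a minimal sketch of that query-then-push loop. The SecurityTool class and its two methods are hypothetical stand-ins for whatever API a real tool exposes, and the three technique IDs are just illustrative entries standing in for the campaign's 75. I'm not claiming they are Magic Hound's actual techniques.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityTool:
    """Hypothetical stand-in for a security tool reachable through an API."""
    name: str
    covered: set = field(default_factory=set)

    def get_covered_techniques(self) -> set:
        # Read direction: which techniques does this tool have rules for?
        return set(self.covered)

    def push_rule(self, technique_id: str) -> None:
        # Write direction: install a rule for a technique.
        self.covered.add(technique_id)


campaign = {"T1566", "T1059", "T1105"}
stack = [SecurityTool("firewall", {"T1105"}), SecurityTool("edr", {"T1059"})]


def find_gaps(tools, techniques):
    """Return the techniques that no tool in the stack covers."""
    covered = set().union(*(tool.get_covered_techniques() for tool in tools))
    return techniques - covered


gaps = find_gaps(stack, campaign)
for technique_id in sorted(gaps):
    stack[1].push_rule(technique_id)  # push the missing rules to the EDR

remaining = find_gaps(stack, campaign)
print(gaps, remaining)
```

The read half of this loop is what collection looks like. The write half is the part a log-fed SIEM has no channel for.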

Regular listeners to this podcast know that I'm a big believer in first-principle thinking. When we published our book, “Cybersecurity First Principles,” back in 2023, we said that if you assume I got the absolute cybersecurity first principle right (Reduce the probability of material impact due to a cyber event in the next 3-5 years), then there are logical follow-on strategies that you might pursue to achieve it. One of them is automation, something the infosec profession is not that good at today. But, with XDR, that might begin to change. Using what we did for Magic Hound as the template, we could duplicate those efforts for every known attack campaign in the MITRE ATT&CK wiki, some 150 the last time I counted. And if we could do that, it would get us a long way down the road toward one of the other potential follow-on strategies: Intrusion Kill Chain Prevention.

The trough of disillusionment.

There are a few reasons that Gartner has XDR careening down the Peak of Inflated Expectations toward the Trough of Disillusionment. First, current vendor solutions mostly only work with their own products. The promise of using APIs hasn’t met Nir’s vision yet of connecting to everything.

Second, remember the problem that SIEMs have with vendors not agreeing on a standard logging format? Well, that hasn’t gone away. Every commercial security stack tool has its own unique format to store and transmit data. It’s probably the main reason that current XDR tools don’t easily connect to other vendor products. But in 2022, AWS and Splunk co-founded the Open Cybersecurity Schema Framework (OCSF) project and most of the vendors that have an XDR product have signed on. Only time will tell but that’s a positive step.

Third, the way that XDR solutions have evolved, they are not offering permanent data lake storage like a SIEM tool would. The idea is to collect the data, find bad guys, and then discard the data. If you’re looking to perform long-term analysis over a couple of years, current XDRs can’t help you there yet. Today, you’re still going to need a SIEM to do that. But that’s a minor tweak to some future version of XDR. I think it could still happen.

Lastly, machine learning algorithms running against a giant lake of XDR data haven't really performed as I described above. It’s been more than five years since Nir’s announcement of XDR and, as far as I know, none of the current vendor offerings have found the Magic Hound attack campaign, or any others for that matter. They currently just find anomalies: more things for the SOC analysts to run down, more hay for them to sift through to find the needle. Today, if your SOC analysts are already overwhelmed, this is probably not useful to you.

The future of XDR.

I agree with Gartner’s forecast that it's probably five to ten years until XDR reaches the Plateau of Productivity. Whether product managers fix the limitations described above is anybody’s guess. But, if you’re like me, you see the potential for this new paradigm of using APIs to collect data from the security stack vs. sending log files from the security stack to a SIEM somewhere. The collection part is not the most important reason to adopt the technology. The huge reason is that the same APIs can send configuration updates back to the security stack.

And, we can help push the vendors in the right direction. First, encourage all your security stack vendors to join the Open Cybersecurity Schema Framework (OCSF) project. In fact, make it a requirement for the next contract renewal. This just helps everybody. Second, encourage your XDR vendors to upscale what they are looking to detect. We don’t need another alert about something that may or may not be important (more hay). We need them looking for attack campaigns like Magic Hound, suggesting prevention controls designed to prevent Magic Hound’s success across the kill chain, and providing an easy button to send those prevention controls to the security stack.

That’s what I want in an XDR tool. I think you should too.