Incident response: a first principle idea.

By Rick Howard

Aug 3, 2020

CSO Perspectives is a weekly column and podcast where Rick Howard discusses the ideas, strategies and technologies that senior cybersecurity executives wrestle with on a daily basis.

Note: This is the ninth essay in a planned series that discusses the development of a general purpose cybersecurity strategy using the concept of first principles to build a strong and robust infosec program.

Incident response isn't rocket science.

My quick take on the NIST “Computer Security Incident Handling Guide: Special Publication 800-61 Revision 2,” published in August 2012, is that the idea of incident response isn't rocket science. It can be complicated because you have to coordinate things across the entire organization, but the basic idea is simple:

Preparation: Devise a plan on how to respond to cyber issues.

Detection and analysis: Develop cyber-detection capabilities and analysis skills for early detection.

Containment, eradication, and recovery: Once discovered, don’t let adversaries move elsewhere in your network. Destroy their capability to burrow in undetected somewhere else and connect back out. Recover the systems that were affected.

Post Mortem: Review what you did. Make improvements to the plan for the next time.

According to NIST’s “Framework for Improving Critical Infrastructure Cybersecurity,” your incident response program should have all of those things plus a communications plan about how you will convey the right information to your employees internally as well as your customers and stockholders around the world externally.

From a first principle angle though, detection, containment and eradication, and communications are the three key pieces. If we are to reduce the probability of material impact due to a cyber event, we have to accept the fact that, sometimes, the adversaries will come after us. For those instances, we have to detect the intruder’s behavior soon enough so as not to give the adversary time to succeed in their task before we can determine the most advantageous plan to thwart them. And then we need to execute that eradication plan flawlessly. But as important as the first two pieces are, the communications plan can make or break the event at the end even if you execute the first two perfectly. How you communicate what happened and what you did can materially affect the value of a company in the commercial space and also potentially severely affect an institution's reputation in the government and academic worlds.

For the first two pieces, the bulk of the work is done by the technical teams. For this third piece, though, you need a task force that cuts across the entire organization: the risk office, the legal office, the marketing and PR office, and the senior leadership team. When the event reaches the senior leadership team, you most likely have an outside public relations firm consulting on the communications plan as well. If the event is serious enough, you might have an outside incident response contracting team come in to help too. Somebody has to manage all of the pieces.

As network defenders develop the plan, they immediately start thinking in terms of stages of escalation. When something suspicious pops up in the SOC, the escalation team that forms is small, mostly within the infosec team. As they collect more evidence of potential bad news, the escalation team expands to the IT teams and to security leadership. If more bad news comes in, other non-technical teams start warming up in the bullpen just in case. The crisis task force forms. If evidence emerges that an actual intruder is operating in the network or has operated in the network, it is time to warn the senior leadership team and call the potential outside contracting teams. If the adversary is successful, the senior leadership team needs to decide how and when to execute the communications plan.
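One way to make those escalation stages concrete is to write them down as data that the whole task force can review before a crisis. What follows is only a minimal sketch in Python, not a prescribed implementation. The stage names, the JOINS_AT mapping, and the helper functions are all illustrative assumptions. In particular, which team joins at which stage is my reading of the description above, and your own plan will likely differ.

```python
from enum import IntEnum
from typing import Dict, List


class Stage(IntEnum):
    """Escalation stages, least to most severe. Names are illustrative."""
    SUSPICIOUS_ACTIVITY = 1   # something suspicious pops up in the SOC
    MOUNTING_EVIDENCE = 2     # more evidence of potential bad news
    TASK_FORCE_FORMS = 3      # non-technical teams warm up; task force forms
    INTRUDER_CONFIRMED = 4    # an intruder is, or was, operating in the network
    ADVERSARY_SUCCEEDED = 5   # leadership decides how and when to communicate


# Who joins at each stage. This mapping is an assumption for illustration;
# replace it with the roster from your own incident response plan.
JOINS_AT: Dict[Stage, List[str]] = {
    Stage.SUSPICIOUS_ACTIVITY: ["infosec team"],
    Stage.MOUNTING_EVIDENCE: ["IT teams", "security leadership"],
    Stage.TASK_FORCE_FORMS: ["risk office", "legal office", "marketing and PR office"],
    Stage.INTRUDER_CONFIRMED: [
        "senior leadership team",
        "outside incident response team",
        "outside PR firm",
    ],
    Stage.ADVERSARY_SUCCEEDED: [],  # no new participants, only a decision
}


def escalation_team(stage: Stage) -> List[str]:
    """Return everyone involved at `stage`. The team is cumulative."""
    team: List[str] = []
    for s in Stage:  # iterates in definition order, lowest stage first
        if s <= stage:
            team.extend(JOINS_AT[s])
    return team


def next_stage(current: Stage) -> Stage:
    """Move up one stage, staying put once the top stage is reached."""
    return Stage(min(current + 1, Stage.ADVERSARY_SUCCEEDED))


if __name__ == "__main__":
    # Walk the full escalation path and print who is in the room at each step.
    stage = Stage.SUSPICIOUS_ACTIVITY
    while True:
        print(f"{stage.name}: {', '.join(escalation_team(stage))}")
        if stage is Stage.ADVERSARY_SUCCEEDED:
            break
        stage = next_stage(stage)
```

The point of a table like this is not automation. It forces the question of who gets called, and when, to be answered on a quiet day rather than in the middle of an event.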

Even though incident response isn't rocket science in concept, executing the incident response plan can get messy quickly. There are a lot of moving parts. And with new people coming into and leaving the organization or changing jobs all the time, the chances are high that not everybody will be on the same page. What you don’t want to happen is decision makers at each escalation stage weighing options that they’ve never considered before, during a crisis when stress is high and they have no time for reflection. The best way around that is to conduct crisis exercises a couple of times each year, and they don’t have to be that complicated.

Once you develop the plan, bring the stakeholders into a lunch-and-learn conference room. Offer food. That provides the incentive to get people to the meeting who might not want to spend time on a drill. It helps if the CEO is sponsoring the exercise—a little added incentive. Drop your favorite cybersecurity worst-case scenario on the table and facilitate the group’s walkthrough of the plan over lunch. I have done many of these kinds of exercises in my career. Every time I thought I knew how the senior executive team was going to react to a particular twist in the scenario, I was wrong, which is what you want. You adjust the plan based on the exercise and plan for the next exercise down the road. When something happens in the real world, you’re ready. I’m not saying that you’ll execute the plan as practiced. As the Prussian military commander Helmuth von Moltke said back in the 1800s, “No plan of operations reaches with any certainty beyond the first encounter with the enemy's main force.” Or as Mike Tyson more eloquently said, "Everyone has a plan until they get punched in the mouth." I’m just saying that practicing what you might do offers experience that might help decision makers improvise when the actual event happens.

For a good example of how to handle the communication plan well, I point to Zoom. When the pandemic began, everybody on the planet started to use the Zoom video conferencing application to host all of their online meetings. The network defender community expressed serious concerns about the newly discovered security issues in the Zoom product. The CEO took immediate steps and told everybody what he was doing. That was a success story. What seemed like a potential disaster at the beginning of the pandemic is a non-story today. There are still lingering security issues in the Zoom product, but network defenders are, for the most part, giving Zoom a pass because they know Zoom is working on them. That’s how you roll out a crisis to the public.

For an example of how not to do it, I point to the OPM data breach.

The OPM breach and its aftermath: a case study on how not to do incident response.

From 2012 to 2016, the Chinese government used their own Unit 61398 (AKA the Axiom Group, AKA X1) and Deep Panda (AKA Shell Crew, AKA Deputy Dog, AKA X2) to pull off one of the most valuable cyber espionage campaigns in modern times. These two groups successfully exfiltrated 5.6 million electronic fingerprint records as well as personnel files of 4.2 million former and current government employees along with security clearance background investigation information on 21.5 million individuals. And this cache wasn’t just names and Social Security Numbers either. Besides the fingerprints, the Chinese government got their hands on the SF-86 forms. These are the forms that government employees fill out to get their secret clearances. They are required to record everything about their personal lives for the past 10 years: where they lived, who their friends and neighbors were, who they worked for, the citizenships of all their relatives and housemates, foreign contacts and financial interests, foreign travel, psychological and emotional health, illegal drug use, and many other matters. The impact is that the Chinese government has some kind of leverage on every single U.S. government employee and will have it until employees age out of government service some 50-75 years hence. The U.S. House of Representatives Committee on Oversight and Government Reform report on the OPM data breach quotes former CIA director Michael Hayden saying this:

"[OPM data] remains a treasure trove of information that is available to the Chinese until the people represented by the information age off. There's no fixing it."

If you’re looking to get your blood moving this weekend, take an hour and thumb through the Congressional Oversight report on the OPM breach. It made me mad.

In terms of first principle cybersecurity thinking before the incident, OPM failed at every philosophical point. They had no concept of reducing the probability of material impact to their organization and the government at large. What I mean by that is the OPM leadership was in charge of protecting the crown jewels of all government employees: their very sensitive personally identifiable information (PII). Stolen government employee PII is even more impactful than stolen commercial or academic PII because it could potentially be used by foreign entities as weapons to influence the political landscape at a global scale.

By all accounts, OPM leadership didn’t accept that responsibility, didn’t treat that information any differently than anything else on their network, and didn’t know that they should. Despite constant urging from the inspector general as far back as 2005—seven years before the first Chinese penetration—OPM had deployed no zero-trust measures. In fact, they had no security stack deployed at all, for the most part. Like other security organizations in government entities, they applied few resources to improving their security posture over the years let alone attempting to track known adversaries across the intrusion kill chain.

Then, when they finally noticed the penetration two years after the Chinese had successfully broken in, OPM had no incident response game plan to execute. OPM leadership up and down the chain—from the director of IT security operations, to the CIO, to the OPM director—decided it was better to conceal information or downplay its importance to other key players like the inspector general and the House oversight committee.

When OPM finally discovered the evidence that X1 might be in their network, OPM leadership made the classic mistake of choosing to collect more intelligence, to watch the adversary, rather than kicking it out of the network. What?! The maturity of their infosec program was wanting, to say the least, and the leadership decided they were smart enough to watch an adversary and do nothing to kick them out before they do damage? I am dumbfounded.

They assumed that X1 was the single point of entry when, in reality, X2 was already inside another part of the network undetected, and the Chinese had also infiltrated not one but two of their third-party supply chain contractors. While OPM was gathering intel, the Chinese were scooping up every bit of PII in the U.S. government. In the end, the OPM director, CIO, and director of IT security operations all were fired or forced to retire.

The Congressional report on the breach had many suggestions for improvement. I don’t disagree with any of them but, from my perspective, everything OPM leadership did wrong before the breach and during can be boiled down to the atomic fact that they weren’t thinking in terms of cybersecurity first principles. Our goal as network defenders is to reduce the probability of material impact to our organization due to a cyber event using a combination of these eight strategies:

Zero trust
Intrusion kill chains
Resilience
DevSecOps
Risk
Cyber threat intelligence
Security operations centers
Incident response

Reading through the Congressional report on the breach, it is clear that OPM’s leadership not only didn’t implement any of them before the breach, but during the breach, most of their decisions devolved to protecting their jobs and not protecting their organization. This isn't how to do incident response.

The beginnings of incident response.

You could make an argument that the same precipitating event that caused the creation of the first modern-day security operations centers, the Morris Worm, also caused the need to build incident response teams.

It was the early days of the internet: no AOL, no World Wide Web, no always-on internet connection at your house. If you wanted to connect, you most likely drove into the office at your university or your military base. If you connected from home, you used a dial-up modem over your existing phone line to make the connection to one of only 60,000 computers on the internet at the time. By contrast, some experts estimate the number of internet-connected devices will reach 75 billion by 2025. In other words, the internet wasn’t a thing yet for the masses, but it was vitally important for government and research institutions.

At the witching hour on 3 November 1988, I was working late in my navy-housing apartment trying to get a program working for my data structures class at the Naval Postgraduate School in Monterey, California. The deadline for the assignment was just three hours away, but I couldn’t get my 2400 baud modem to connect to the university’s modem bank, and I was starting to panic. Little did I know that, just after midnight, a 23-year-old Cornell University graduate student named Robert Tappan Morris would bring the internet to its knees. He had launched the first ever internet worm, and for at least some days after, the internet ceased to function as UNIX wizards of all stripes across the globe worked to eradicate the worm from their systems.

As I mentioned in the Security Operations Centers essay, the Morris Worm caused DARPA (the Defense Advanced Research Projects Agency, a science and technology organization of the U.S. Department of Defense) to sponsor Carnegie Mellon University to establish the first CERT/CC (Computer Emergency Response Team/Coordination Center) to manage future cybersecurity emergencies. But it also sparked a discussion in the newly forming network defender space about how to respond to a cyber incident within your organization. At the Naval Postgraduate School, where I was during the event, the response consisted of faculty members who could spell UNIX correctly three times out of five running around the hallways with their hair on fire shouting esoteric computer slang at each other like sendmail, rsh attacks, telnet, and finger. Perhaps there might be a better way.

Enter my all-time computer science hero, Doctor Clifford Stoll. Really, if there were baseball cards for computer science giants, my collection would include Grace Hopper, Alan Turing, and multiple copies of Doctor Stoll. His book, “The Cuckoo's Egg,” was one of the first, and still one of the most influential, cybersecurity books ever published. One of the reasons his book remains influential over 30 years later is that he almost single-handedly invented incident response, and the techniques he developed haven’t changed that much.

Doctor Stoll was an astronomer at the University of California at Berkeley in 1986, not a security guy by any means. But he was asked to help out in a UNIX lab on campus and track down an accounting error in the student body computer records. Back then, universities charged their students for computer time, and each month, the sum of the accounting records for all the Berkeley student computer users was off by 75 cents. His investigation to fix the error led to the discovery of the first public cyber espionage campaign run by the Russians using East German hacker mercenaries to break into U.S. university systems in order to break into U.S. military systems. Back then, we didn’t really have any security per se. The internet was basically connected with strings and cans.

Because of his astronomer background, he treated the entire exercise like a science experiment. He developed hypotheses, built experiments to test his hypotheses, and wrote everything down in a logbook. In 1988, he published the paper from his logbook in the journal Communications of the ACM, which eventually turned into the book he published in 1989. If you haven’t read this book yet, stop what you are doing right now, and get it done. Doctor Stoll is, how would you say it, eccentric. His kookiness pervades the entire book, and his joy for life is palpable. Even if you aren’t a techie, you’ll love it. I promise, you will be delighted, and in the process, you will witness the birth of incident response as a network defender best practice.
