Metrics and risk: all models are wrong, some are useful.
Rick Howard: [00:00:00] August 1, 1997, West Point, N.Y. - one of those sweltering New York state days. I was at the data center for the U.S. Military Academy. When I first discovered the Windows operating system program called System Monitor, I was a major in the United States Army. One of my duties was that I supervised the administration and maintenance of an Exchange server farm on an Army base of roughly 10,000 people. I had about fifteen email servers that provided services to all personnel on the base. The System Monitor program was so powerful that you could slice and dice thousands of different variables that had something to do with how the Exchange mail system worked. Unfortunately, Microsoft was fairly silent about which variables were important and which were not. So one day, for no apparent reason, I decided to turn on all of the Exchange variables.
Rick Howard: [00:00:54] I figured I would watch the output for a while and try to determine which variables produced interesting signal compared to the noise that the other variables produced. And then I executed the typical Howard move. I got distracted.
Rick Howard: [00:01:08] Something came up, and I walked away from the server farm without turning off all of that collection. The System Monitor program dutifully wrote all of that collected data to the local Exchange server's hard drive space. Now, fast forward to the wee hours of the next morning. I get a call from the data center admins to inform me that the Exchange server system had completely failed.
Rick Howard: [00:01:32] Apparently, my System Monitor efforts had filled up all of the Exchange server hard drive space and crashed the entire system. I know - it was awful. It took us ten days to get it up and running again. And for those of you that don't know, Army bases are kind of like small towns. Everyone knows everyone. For that ten days, I was the most hated man on the base. Nobody could get any work done all because I thought metrics were a cool thing to play with.
Rick Howard: [00:02:11] My name is Rick Howard. And you are listening to "CSO Perspectives," my podcast about the ideas, strategies and technologies that senior security executives wrestle with on a daily basis. This episode, I get to talk about something that took me a long time to learn, that risk is a concept that most business leaders understand and that security risk is no different than other kinds of business risk that executives deal with every day. And the big secret is that people like me, network defenders, can present cybersecurity risk to leaders in a way they can understand.
Rick Howard: [00:02:53] Many months after my Army email system failure, long after I wasn't too embarrassed to show my face in public again on-post, I discovered a new fact about email. People hate it. Who knew?
Rick Howard: [00:03:07] After it was all said and done and blame had been properly distributed, colonels and privates alike came up to me secretly to thank me for that ten days of bliss. They all said that they got more things done during that ten days without the evil email system running than they had the entire rest of the year. I'm going to call that a win. When you think about it, I was kind of a hero, at least for those ten days. I probably should have gotten a medal for that or something. At least that's how I plan to remember it.
Rick Howard: [00:03:37] The thing is, I love metrics - always have. I've been a fan of collecting metrics since I started in the IT business back in the internet dinosaur days. I didn't have some compelling scientific reason to collect them. I just had this vague unease that I felt blind, running my IT systems without having some indicators to see if my systems were healthy or not. Back in the day, this wasn't an exact science. But even now, it's still a bit of a mystery.
Rick Howard: [00:04:05] When freelance security writer Mary Pratt wrote an essay for CSO magazine not too long ago, claiming that she knew exactly which cybersecurity metrics mattered and which ones did not, I was intrigued. Perhaps the science had improved enough to know precisely what to collect and monitor. In reading through her essay, Mary did pick some good metrics to pay attention to, like simulated phishing attack results and mean time to recover from a cyberattack and mean time to detect a cyberattack, penetration testing successes, vulnerability management patching status and, finally, enterprise security audits against a standard security framework.
Rick Howard: [00:04:44] These are all good things to track. I would add one more. I got it from an old boss of mine, and I like it because it was not your typical cybersecurity metric. He tasked me to show how many people I was using to respond to cyber incidents. Thinking about that for a second, I remember that moment clearly. We were in the employee cafeteria one cold December morning, getting a cup of coffee. And he pulled me aside and said if the number of people was going to go up every year, that I was going in the wrong direction. He said that, instead, I should be automating my processes as much as possible in order to reduce the number of needed people, that the correct solution was not to throw more bodies at the problem. And he was right.
Rick Howard: [00:05:26] But even with those words of wisdom, counting these kinds of things is not the goal. They are a means to a goal. Like I said, they are indicators. But what are they indicators for? Well, in a tactical sense, they are indicators for the health and efficiency of the system. But as Mary pointed out in her piece by highlighting discussions with various company executives, the tactical stuff is for the CISOs, not for the company leadership team and definitely not for the board members.
Rick Howard: [00:05:54] Why would board members care what the mean time to recovery is? How would they even know what good is when they saw it? At best, they would recognize that improving that time each quarter is forward progress. But how would they judge when it was good enough? And besides, this kind of thing is not in their world. You know what is? Risk. Senior executives juggle risk all the time. It's kind of in their job description, and they do it daily across an enormous set of disciplines like personnel management, supply chain, product management, marketing and investment opportunities. That's just a couple of them.
Rick Howard: [00:06:29] But sometime in the early days of internet security, say, like, the late 1990s, the network defender community decided that cybersecurity risk was too hard to convey to senior leadership. We decided that it was much easier to use fear, uncertainty, and doubt, or FUD, to scare the hell out of decision-makers in order to get the budget for our pet security projects.
Rick Howard: [00:06:56] I admit it - I did this myself in the early days. I used these charts called heat maps where I plotted all the cyber bad things that are likely to happen to the company on the x-axis and how impactful they would be if they did happen on the y-axis. The really bad things would float high and to the right of the chart; the more benign things would float low and to the left. And since I knew my way around the spreadsheet, I would color-code the entries. The high and to the right stuff would be red. The middle stuff would be yellow. And the benign stuff would be green. The heat map looked like I was warming up a pot of cyber risk from left to right and from bottom to top. And once I brought that concoction to a boil, I would walk it into the next board meeting, point to the highest peak on the heat map, say scary things about what the highest point meant and then ask for a gazillion dollars to fund my pet security project.
Rick Howard: [00:07:46] And just like Robert Shaw says in one of my favorite scenes from one of my favorite movies, playing Captain Quint in "Jaws."
[00:07:53] (SOUNDBITE OF FILM, "JAWS")
Robert Shaw: [00:07:54] (As Quint) Sharks come cruising, so we formed ourselves into tight groups. You know, it's kind of like old squares in a battle like you see in a calendar, like the Battle of Waterloo. And the idea was, shark comes to the nearest man. And that man, he'd start pounding and hollering and screaming. And sometimes the shark would go away; sometimes he wouldn't go away.
Rick Howard: [00:08:17] Sometimes my heat map shenanigans worked for me, and sometimes they didn't work for me. Over time, I realized my batting average for this approach was not too high. It wasn't until years later that I learned that there are two big problems when you use heat maps to convey risk to senior leadership. First, experts have produced reams of research that show that heat maps are just bad science. There are not just one or two papers on the subject; there are tons. And the reasons cited for this bad science conclusion are plentiful. Mostly, though, heat maps are not precise enough. They are based on qualitative assessments using some version of a high, medium, low score. And one problem with this qualitative assessment is that even if I define precisely what a high score is, to my audience, it doesn't matter. The person seeing the data will have their own opinion of what exactly high means, regardless of what I tell them. When they see the score, they will think it means something different than what I intended the score to mean.
Rick Howard: [00:09:19] Second - and this is probably the bigger problem - heat maps never give decision-makers a chance to judge whether or not the risk is acceptable to them. They highlight scary things, for sure, but give no means to judge the risk appetite of the company. Now, I know this is a little bit hard for network defenders to swallow. We normally assume that any cyber risk is bad and needs to be eliminated. You know, that just isn't true. Company leadership assesses all kinds of risk while running their businesses. They are continually weighing the pros and cons of various actions. Cyber risk is not different from all these other kinds of risks. Cyber risk is just another risk that leadership needs to weigh.
Rick Howard: [00:10:00] Over time, I've come to realize that in order to convey any risk, but especially cyber risk, you must know three elements. The first is probability. What is the probability that a cyber event will happen? This is quantitative, a math number between zero and 100, not a high, medium or low guess that I derive from some combination of Kentucky windage, coin flipage (ph) and a glance at the tea leaves. It needs to be precise.
Rick Howard: [00:10:26] Second, whatever the cyber event is that we are measuring, it must impact the business materially. Not everything in Cyberland will meet this criteria. If the company's website is defaced, I'll just re-image the server, relaunch it - no harm, no foul. But if my competitors steal my company's proprietary code library, that would be a significant emotional event for me, for the CEO and for the board. That would be material.
Rick Howard: [00:10:53] Third, it can't be that there is a high probability for a material cyber event sometime in the future. Of course, that is true. If you wait long enough, something is bound to happen. The probability for that is almost 100%. But if you time-bound the question to, say, three years or five or whatever makes sense for your organization, that probability will likely be much lower. So there has to be a time component.
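To make the time component concrete, here is a minimal sketch of how a time window changes the answer. It assumes something the episode does not state: that each year carries the same chance of a material event and that years are independent. The function name `probability_within` and the 7% annual figure are hypothetical and chosen only for illustration.

```python
def probability_within(annual_probability: float, years: int) -> float:
    """Chance of at least one material event within `years`.

    Assumes (for illustration only) a constant annual probability
    and independence between years.
    """
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if years < 0:
        raise ValueError("years must not be negative")
    # Probability of no event in every single year, subtracted from 1.
    return 1.0 - (1.0 - annual_probability) ** years


if __name__ == "__main__":
    # Hypothetical 7% annual chance, viewed over different horizons.
    for horizon in (1, 3, 5, 30, 100):
        print(f"{horizon:>3} years: {probability_within(0.07, horizon):.1%}")
```

Under these assumptions the unbounded question drifts toward certainty as the horizon grows, while a three-year or five-year window gives a number that leadership can actually weigh.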
Rick Howard: [00:11:21] The question we need to answer for the board then is this - what's the probability that a cyber event will materially impact the company in the next three years? Answer that question and then the board members can decide if they are comfortable with that level of risk or if they need to invest in people, process or technology to reduce it. So how do you do that? How do you measure risk with any kind of precision so that you can assign a probability to it? This is where most security professionals fall off the cliff.
[00:11:50] (SOUNDBITE OF SCREAMING)
Rick Howard: [00:11:53] And face it. Most of us network defenders didn't get into cybersecurity because we were good at math. I know I wasn't. If I was good at math, I would be making the big bucks working for some engineering firm. And I hate to generalize here, but I'm willing to bet that most of us barely got through our entry-level probability and stats course back in college. The one thing we likely remember from that misery was that to get a probability, we needed to count the number of things that we are interested in and divide them by the total number of things in the set. And yes, that's one way to think about probability.
Rick Howard: [00:12:26] But another way, a more useful way in cybersecurity, is to think about probability as a measure of how certain you are about a specific outcome. The guy that invented this line of thinking, the father of decision analysis, is Professor Ronald Howard. Not this Ronald Howard.
[00:12:43] (SOUNDBITE OF TV SHOW, "THE ANDY GRIFFITH SHOW")
Ron Howard: [00:12:43] (As Opie) Now, Sheriff Taylor, did you not say that Post Toasties have the corn flakes, crackle and the fresh corn flavor?
Rick Howard: [00:12:50] But this one.
Ronald Howard: [00:12:51] Hi, I'm Ron Howard. I've been a professor at Stanford School of Engineering since 1965.
Rick Howard: [00:12:56] He says that we shouldn't think about uncertainty as the lack of knowledge but instead think of uncertainty as a very detailed description of exactly what we know. That is where the metrics come in.
Rick Howard: [00:13:08] Let's start with a list that Mary presented in her essay. They're as good as any to begin with. By themselves, though, none give us the answer. They are tactical indicators of the system's health. But here are some stats that you might show your infosec team from last quarter. From our phishing exercise, 5% of the employees clicked the link compared to 7% from last quarter. That's better. For our mean time to recover from a cyberattack, we went from two weeks the same time last year to just four days this year. That's really good. For our mean time to detect a cyberattack, we went from 382 days last year to just 100 days this year. You can tell those practice drills are really paying off. For our pen test this quarter, the contractor was able to steal the CEO's credentials. Ouch, that hurts. For our vulnerability management posture, our systems are 80% patched compared to only 37% patched the same time last year - way better. And for the NIST Framework, we are at level two on most NIST Framework elements versus level one on the same elements from last year. That is forward progress. And finally, for our incident response team, we have five people on the team this year. That was the same as it was last year.
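The stats above can be written down as a small scorecard. This is only a sketch: the `Metric` class and its field names are hypothetical, the numbers are the ones quoted in the episode, and the pen test result is left out because it is a qualitative finding rather than a number. Note that the phishing figure is compared with last quarter while the others are compared with last year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    previous: float
    current: float
    lower_is_better: bool

    def trend(self) -> str:
        """Classify the change as improved, worsened or flat."""
        if self.current == self.previous:
            return "flat"
        went_down = self.current < self.previous
        return "improved" if went_down == self.lower_is_better else "worsened"


# Values quoted in the episode; "two weeks" is recorded as 14 days.
METRICS = [
    Metric("phishing click rate (%)", 7, 5, lower_is_better=True),
    Metric("mean time to recover (days)", 14, 4, lower_is_better=True),
    Metric("mean time to detect (days)", 382, 100, lower_is_better=True),
    Metric("systems patched (%)", 37, 80, lower_is_better=False),
    Metric("NIST Framework level", 1, 2, lower_is_better=False),
    Metric("incident response headcount", 5, 5, lower_is_better=True),
]

if __name__ == "__main__":
    for metric in METRICS:
        print(f"{metric.name}: {metric.previous} -> {metric.current} "
              f"({metric.trend()})")
```

A scorecard like this shows direction of travel for the infosec team, but nothing in it is a probability, which is why it cannot answer the board's question on its own.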
Rick Howard: [00:14:23] So you have the infosec team in the conference room and you give them those numbers. They would clearly be mostly pleased with the improvement of their internal security program. The theft of the CEO's credentials will cause some anxiety. We should want to fix that issue pretty quickly. Still, the program is generally moving in the right direction. But with all of those indicators, could they answer this three-part risk question? What is the probability that a cyber event will materially impact the company in the next three years?
Rick Howard: [00:14:54] Here's what you tell them. Estimate the probability so that you are 95% confident of your answer. Could they do it? Of course, they could. Using these metrics as a baseline, it is a pretty simple model to estimate your first probability, but your team could do it. And as George Box, the famous British statistician, has said, all models are wrong; some are useful. This would be a simple but useful model.
Rick Howard: [00:15:20] You can absolutely go down the rabbit hole, building more complex models using cost projections, Monte Carlo simulations, latency curves and other things that have something to do with math. The show notes will list a set of reference books and papers that will help you do that. But the bottom line is that with this first step, you now have something you can take to the board. You now have a simple but precise estimate of the risk that your organization might be materially hacked in the near future.
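For listeners who want a first look down that rabbit hole, here is a minimal Monte Carlo sketch. Everything specific in it is an assumption made for illustration, not something from the episode: the scenario names, the low and high bounds on each annual probability, the uniform sampling between those bounds, and the independence of scenarios and years. The function name `simulate` is hypothetical too.

```python
import random

# Hypothetical scenarios: (name, low, high) bounds on the team's estimate
# of the annual probability of a material event. Illustrative numbers only.
SCENARIOS = [
    ("ransomware outage", 0.02, 0.06),
    ("theft of proprietary code", 0.01, 0.04),
    ("business email compromise", 0.01, 0.03),
]


def simulate(scenarios, years=3, trials=100_000, seed=1):
    """Estimate the chance of at least one material event within `years`.

    Each trial draws one annual probability per scenario from its range,
    then checks whether any scenario fires in any year of the window.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    hits = 0
    for _ in range(trials):
        annual = [rng.uniform(low, high) for _, low, high in scenarios]
        if any(rng.random() < p for _ in range(years) for p in annual):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    estimate = simulate(SCENARIOS)
    print(f"Estimated chance of a material event in 3 years: {estimate:.1%}")
```

The output is a single probability for a fixed time window, which is the form of answer a board can compare against the other risks it already weighs. Swapping the uniform ranges for better-calibrated distributions, or adding loss amounts per event, is where the reference books take over.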
Rick Howard: [00:15:46] The key thing about risk and what you might do about it is that it is absolutely tied to the organization's risk culture. Let's say that your infosec team estimated that they were 95% confident that there was a 20% chance of being materially impacted by a cyber event in the next three years. Some listeners hearing this number might say that was unacceptable, that we need to reduce that number by quite a bit. Others would say that 20% compared to other risks that the business is dealing with seems reasonable; I am willing to eat that risk and deal with the consequences if something happens later. Both are correct answers. The point of all this is that now we can give the senior leadership a choice to see if the forecasted probability is within their risk appetite.
Rick Howard: [00:16:34] In my younger days, I used to think that if I failed to convince company leadership to fund my pet security project, that it was because they were too dumb to understand the intricacies of cybersecurity. They were typical nontechies, and they just didn't get it. In hindsight, I was terribly naive. The most likely reason that I failed was probably that I did not do a good job convincing them of the risk. And the second most likely reason was that even if they did believe me, they considered the risk to be acceptable. It took me a long time to learn the significance of the fact that company leaders deal with this kind of thing all the time, that cyber risk is just another risk in the hundreds that executives have to consider as they shepherd their companies toward success.
Rick Howard: [00:17:16] Security executives can help them do that by evaluating and conveying the risks to those company leaders in a way they can understand. The first step is building your model with the security metrics that you already have. Over time, enhance that model with better metrics and better math. Just don't forget to turn off the data collection so that you don't overrun your email system. I still think I should have got a medal for that.
Rick Howard: [00:17:43] That's a wrap. If you agree or disagree with anything I have said, hit me up on LinkedIn or Twitter and we can continue the conversation there. "CSO Perspectives" is edited by John Petrik and executive produced by Peter Kilpe. Sound design and mix by the insanely talented Elliott Peltzman. And I'm Rick Howard. Thanks for listening to "CSO Perspectives," and be sure to learn more about Pro+ content on the thecyberwire.com/pro website.