Two risk forecasting data scientists, and Rick, walk into a bar.
Rick Howard: Hey, everybody. Rick here. I know I said at the end of the last episode that I was concluding the series on risk forecasting. But during the research for the series, I interviewed two data scientists from a company called Cyentia - Dr. Wade Baker, one of the founders, and David Severski, one of the data scientists there. Cyentia is a research organization that uses data analytics to solve problems for their clients, and I completely stumbled upon two research papers they published about calculating cyber risk for Fortune 1000 companies, nonprofits, and just any kind of organization in the United States.
Rick Howard: In 2020, they published "Information Risk Insights Study: A Clearer Vision for Assessing the Risk of Cyber Incidents." And this year, 2022, they published "IRIS Risk Retina: Data for Cyber Risk Quantification." I have put links to these two papers in the show notes if you want to read them, and I highly recommend that you do. I mean, it was a breath of fresh air. Their methodology completely matched what I had been rambling on about for these last three episodes. I used a few snippets from that interview in the last show, but my discussion with these two guys was so fascinating that I thought I'd share it with the "CSO Perspectives" audience. So this episode is called "Two Risk Forecasting Data Scientists and Rick Walk Into a Bar." Enjoy.
Rick Howard: My name is Rick Howard, and I'm broadcasting from the CyberWire's secret sanctum sanctorum studios, located underwater somewhere along the Patapsco River near Baltimore Harbor, Md., in the good old U.S. of A. And you're listening to "CSO Perspectives," my podcast about the ideas, strategies and technologies that senior security executives wrestle with on a daily basis. I'm joined by Dr. Wade Baker...
Wade Baker: Wade Baker, one of the co-founders of Cyentia Institute, and I spend a lot of my time on the data storytelling side of all that we do.
Rick Howard: ...And David Severski.
David Severski: I'm David Severski. I'm a data scientist here at the Cyentia Institute, and most of my job is responding to, wait, just one more question, questions.
Rick Howard: (Laughter) I have bosses like that too. All right...
Rick Howard: Right off the bat, let's get the pronunciation of the company out of the way because the way it's spelled could have multiple pronunciations.
Wade Baker: Cyentia. It's a little bit weird, yeah. We have a shirt that has various pronunciations in it, and the bottom of it just says, however you say it, it means good research, so we just go with it. Cyentia Institute is a security, data science and research firm. I mean, we very much focus on that. We work with a lot of vendors who, you know, in their products and services, have amazing visibility into various aspects of security and collect a ton of data. And they're usually coming to us from a marketing standpoint. You know, they want to publish some thought leadership piece and engage in activities like that. We analyze that data, and our job is telling that story around the data in a way that is rigorous, true to what the data actually says.
Rick Howard: Yeah, and that's the reason I ran into some of those research projects and was just enthralled by them. And just so you guys know, this podcast, "CSO Perspectives," we've been doing it now for about two years. But throughout, since the beginning, the main theme of the show is to consider cybersecurity through a lens of first principles. And I've made the case in this show that the absolute first principle that all security practitioners should adhere to, the thing that we should be all pursuing above all other things, is this - I'll just lay it out. It's to reduce the probability of material impact to our organizations due to a cyberattack. And we can spend hours and hours debating whether or not that is the first principle. I believe it is.
Rick Howard: But you will notice that the word probability is in my first principle definition. It's the noun that we're trying to act on. It's the thing we're trying to impact. And for most of us, though, people like me - you know, security practitioners - we have huge difficulties with probabilities, just even understanding them, let alone trying to calculate them for our organizations. And we do these inside-out calculations where we try to count all the things in our digital environments and try to make some assessment about how secure they are. And most of us just throw our hands in the air, and we fall back to these crazy, crappy qualitative heat maps with imprecise high, medium and low attributions, right? And that's kind of the state of the art for how most CISOs calculate risk in their organizations.
Wade Baker: I like your definition. You have both the key aspects of risk in there - probability and impact. And I do think, further to that point, that most organizations are most concerned with material impacts. You know, we've kind of gotten to this point where, yeah, we're going to have incidents. Everybody does. We clean them up. We go on about our business. But most organizations are worried about the ones that result in major disclosures or disruptions that impact the bottom line.
Wade Baker: And I think another point you made that is spot on is that a lot of people over the years have gotten lost when they try to quantify risk by sort of doing it at the atomic level. I - in the early days, when I started doing research here, I was influenced by Donn Parker. He has passed on now - but, you know, one of the pioneers in the industry.
Rick Howard: I was just reading his book this weekend. I forget - something about cybercrime.
Wade Baker: Yep.
Rick Howard: He was complaining about the CIA triad. He didn't like it.
Wade Baker: Right.
Rick Howard: For those who don't know, Donn Parker was one of the early cybersecurity thought leaders. In 2001, the Association for Computing Machinery, ACM, selected him as a fellow. But in 1998, he published a book, "Fighting Computer Crime - A New Framework for Protecting Information," where he strongly condemns the elements of the CIA triad as being inadequate. Confidentiality, integrity and availability have been early candidates for a truly cybersecurity-first principle, meaning that the security for any organization was this three-legged stool.
Rick Howard: The idea most likely originated from a 1975 paper published by Jerome Saltzer and Michael Schroeder called "The Protection of Information in Computer Systems" in the Proceedings of the IEEE. And the security community even today has adopted that notion as a truism. But Parker was having none of that. He proposed adding three other elements - possession, or control, authenticity and utility - that eventually became known as the Parkerian Hexad. But the idea never really caught on for reasons probably only a marketing expert could explain.
Wade Baker: Right.
David Severski: He's got the Hexad.
Wade Baker: And he came up with the Parkerian Hexad.
Rick Howard: Yeah.
Wade Baker: And he did a lot of interviews with cybercriminals and other things like that, but he just railed against any attempts to take a risk perspective in cybersecurity. You know, he very much viewed it as, nope, you just have a checklist, and you do these things, and that's the way you handle it. And I actually had a chance to go back and forth with him over several years on those things. But I think, in his mind, he was thinking about doing it at the atomic level. You know, I take all the little pieces and parts that are operating and try to quantify all that stuff.
Wade Baker: And there's a million different ones, and they're constantly changing, and you can never get your hands around it. And I think that's where a lot of it goes wrong. And so yeah, when we came to this, the ability to, all right, let's forget all that. Let's just look at a bunch of incidents that have happened in the past and see what we can learn from them, because we know those things. They have happened.
Rick Howard: That's exactly what I've been saying here at the "CSO Perspectives" podcast, that you can get a decent Bayesian first prior by doing some basic outside-in forecasts. And in 2020, your paper called "Information Risk Insights Study - A Clearer Vision for Assessing the Risk of Cyber Incidents" does exactly that. The paper covers Fortune 1000 companies since about 2013, and the amazing finding is that 23% of them get hit with a breach each year.
David Severski: I think it's interesting to note that IRIS was a departure from some of the types of research we have done in the past. A lot of our research up until that point had been focused upon controls. And that is an important and very useful area of research - looking at things such as vulnerability management, software vulnerabilities, etc. But it's only part of the equation. The other part is what happens when those controls fail, as they inevitably do? Complex, human-built systems always have failures out there. And what happens when those things fail? And to do that, you have to look at loss information, which is probably not a surprise to anyone listening here. That information is hard to come by.
David Severski: So we were delighted to partner with Advisen, now part of Zywave, for their commercial cyber data loss breach feed, which is in my experience, the largest, most comprehensive source of verifiable, publicly identifiable breaches. They're collecting information from Freedom of Information Acts, through news queries, through looking at state departments, attorneys general, etc., out there to say, how much public information can we find about breaches, and how much of that information can we verify? It is - certainly, as I mentioned, it's not complete, but it is much more complete than anything else I've seen out there.
Wade Baker: And used heavily in the insurance community - you know, that's where they have most exposure, in that sort of actuarial data set, if you will.
Rick Howard: David, when you were talking about controls earlier, that's somebody like me looking at my internal security posture from the inside out. I have a corporate headquarters in Cupertino. I have sales offices in Australia, Canada and France. I have deployments in Microsoft Azure. And I run X, Y and Z tools in the security stack. By looking at that deployment, I can make an inside-out assessment that the material risk to my business this year is, say, 20%, or a 1-in-5 chance.
Rick Howard: But what you guys did in this paper is to go from the outside in, from the general case. What's the probability of any organization being materially impacted by a cyber event this year? And I think that's the better way to start. You start with the outside-in forecast and that becomes our first Bayesian prior. In your paper, you said 23% of Fortune 1000 companies are materially impacted every year. That's the first assessment. And then you can go back and do the inside-out forecast and say, well, we have all these controls, and we're kind of weak over here but strong over there, and I can adjust the assessment up and down based on that. But first, let's get the outside-in forecast.
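A minimal sketch of the "start outside-in, then adjust" idea, assuming the adjustment is done with Bayes' rule in odds form. The 23% prior is the figure quoted in the conversation; the function name `update_probability` and the likelihood ratios are hypothetical placeholders, not values from the IRIS reports.

```python
def update_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Apply Bayes' rule in odds form.

    posterior odds = prior odds * product of likelihood ratios,
    where a ratio below 1 lowers the estimate and above 1 raises it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= ratio
    return odds / (1.0 + odds)


# Outside-in prior quoted in the interview for the Fortune 1000.
OUTSIDE_IN_PRIOR = 0.23

# Hypothetical inside-out evidence: invented ratios for illustration only.
inside_out_evidence = {
    "strong_patch_management": 0.8,  # assumed to lower the odds
    "weak_identity_controls": 1.3,   # assumed to raise the odds
}

posterior = update_probability(OUTSIDE_IN_PRIOR, list(inside_out_evidence.values()))
print(f"prior={OUTSIDE_IN_PRIOR:.2f} adjusted={posterior:.3f}")
```

Working in odds keeps the result between 0 and 1 however many adjustments are stacked, which simple addition or subtraction of percentage points does not guarantee.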
David Severski: And that's exactly the type of purpose that we hope the people take away from something like IRIS and the subsequent papers that we've done in the IRIS series and the IRIS Risk Retina series as well is, our intention is not to answer, what is my risk exactly for all organizations? That's not possible.
Rick Howard: Right.
David Severski: We don't have that much information about every single organization, but we can say, here's what the baseline is for an industry, for a group such as the Fortune 1000. Armed with that information, a risk manager and a risk practitioner can say, OK, I know how my organization sits relatively high or more challenged or are more - better prepared than some of my peers. I can start with that information and then adjust it up or down based upon the situation I'm dealing with, my organization, etc., so they can spend more time doing risk management and less time doing the grunt work of data collection. In the case of the Fortune 1000 you were mentioning there, I mean, that was a natural population for us to study, not only because it is very large and people know what that means, but we actually know what all the firms are throughout a given period. We know there's a thousand of them. We can get the list of what those companies are. And so when we see an event that happens over a given period of time, we know whether or not that company had one or more events. So we have a very contained measurement of population that we can do for these type of calculations, like the 23% calculation.
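The point about a contained population can be sketched as a simple base-rate count: for each year, the share of known firms with one or more events. This is an illustrative toy, assuming a hypothetical event list of `(firm, year)` pairs; the firm names and events below are invented and are not drawn from the Advisen data.

```python
from collections import defaultdict


def annual_breach_rates(firms: set[str], events: list[tuple[str, int]]) -> dict[int, float]:
    """Return, per year, the fraction of the known population with at least one event.

    Firms are counted once per year no matter how many events they had,
    and events for firms outside the population are ignored.
    Years with no events at all do not appear in the result.
    """
    hit_by_year: dict[int, set[str]] = defaultdict(set)
    for firm, year in events:
        if firm in firms:
            hit_by_year[year].add(firm)
    return {year: len(hit) / len(firms) for year, hit in sorted(hit_by_year.items())}


# Toy population of ten firms (hypothetical).
firms = {f"firm_{i}" for i in range(10)}
events = [
    ("firm_0", 2019),
    ("firm_0", 2019),    # second event, same firm and year: still one firm hit
    ("firm_3", 2019),
    ("firm_7", 2020),
    ("outsider", 2020),  # not in the population, so not counted
]

rates = annual_breach_rates(firms, events)
print(rates)
```

Because the denominator is a fixed, known list, the rate is a proportion of firms rather than a count of incidents, which is what makes a statement like "X% get hit each year" possible.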
Rick Howard: The other analysis you did in that paper was that the bigger you are in the Fortune 1000 set, the higher the rate at which you get hit. For example, for the top 250 firms, you are five times as likely to get breached compared to the bottom 250 firms. Was that a surprise to you guys when you saw that? 'Cause it surprised - it seemed like such a stark difference.
David Severski: It certainly surprised me. When I thought about the Fortune 1000 going into that with a naive perspective, I said, well, these are the largest of large firms there, and I didn't expect to see...
Rick Howard: Yeah.
David Severski: ...Quite that much difference. But it's something to...
Rick Howard: Yeah, I thought they were going to be all the same. I figured they'd be all - right? But who knew, right?
David Severski: No, size matters.
Rick Howard: (Laughter) Size matters.
David Severski: Yeah.
Rick Howard: It's true. We have scientific proof (laughter).
Wade Baker: And even more so when you go outside the Fortune 1000, you know - and that report does some of that, where we go down to just the very, very small organizations. And, you know, there's issues with, hey, do they report things at the same rate as a really large organization? Probably not. But, you know, it is interesting to see how those different firmographics do alter the risk posture of an organization.
Wade Baker: Again, just going back to the whole reason why we wanted to undertake this research, I have experienced myself, and have heard countless times, someone has the task to conduct a risk assessment. And the question is always, where do I begin on things like assessing frequency? I don't know how many times ransomware happens - you know, the probability of that. You know, what - who knows?
Rick Howard: Right.
Wade Baker: So it's just a bunch of guesswork. And that whole garbage in, garbage out thing applies, even though it's very cliche. But our goal was, all right, let's give a starting point. Let's at least - we can do this. You know, there's enough data out there. And that's what we hope that happens. Someone can look at this and say, well, I'll start from there instead of starting from finger-in-the-wind position. And if we do that, then we're better off right from the start, I think.
Rick Howard: I've been one of those naysayers about, you know, when you hear people say, well, different verticals are more impacted by cyberattacks than others. And I say, you know, it's - my gut feeling, after doing this for a long time, was that it was standard across the board. But you guys show that that's not it at all. If your company does administrative work or financial work or IT or management, you are much more likely to get hit compared to if you're in transportation or agriculture or construction. And I just found that wildly fascinating. Again, was that a surprise when you did that calculation?
David Severski: I think it is and isn't. I mean, there's certainly an element of, wow, when you see it stark like that, it is certainly very impressive. And certainly when you look at things like the public sector there, that really takes us by surprise. But I think it's also important to keep in mind that we are not ascribing cause to many of these breaches...
Rick Howard: Right. Right.
David Severski: ...And that there are many reasons why, say, the information sector and the public services sector - and I think we talked a little bit about this in the report - why they figure so prominently out there, one of which is they tend to be more regulated industries. And so they have more transparency in terms of what they're reporting on. You take a very highly regulated sector, such as the financial services sector there, they have very strong reporting requirements, et cetera.
Rick Howard: So we know more about them so they show up more? Is that the - is that what we're saying, or...
David Severski: Yes, I think that is certainly part of the case there. You know, they have their large organizations there. And also, you know, in the case of information services and financial services, that's where the money and the data is, you know?
Rick Howard: That's right.
David Severski: That's - they have more to lose.
Rick Howard: Why do you rob banks? Yeah, that's...
David Severski: That's where the money is. Yes.
Rick Howard: ...'Cause that's where it is (laughter).
Wade Baker: Yeah, yeah. I do think, though, there is a ton of overlap in these. You know, so one thing we try to make in the report is, hey, OK, if you're a financial service organization, you don't want to automatically assume that you're worse off than some other one because there's a really wide distribution and a lot of uncertainty. So not every...
Rick Howard: Yeah.
Wade Baker: ...Single financial institution has a higher probability than every single manufacturing firm, right?
Rick Howard: Right. Right.
Wade Baker: There's overlap. And you see some industries - I'm looking at the report now, figure six in the IRIS 2020 - you know, health care and hospitality, right? They're right next to one another, have a similar probability. And you think, why? That's weird. Hotels and hospitals - I wouldn't think they would have a similar probability of incident. But when you look at what's going on there, a lot of their incidents are around the same kinds of things. They both...
Rick Howard: Yeah.
Wade Baker: ...Transact financial transactions. They both collect personal information that is interesting to fraudsters. You know, you can go on and on down that list. And so I think a lot of these industries, to the extent that they have similar business processes, that means that more and more of their probability of an incident will be similar. And it's just super interesting to study those kinds of things.
Rick Howard: So, David, you mentioned loss before - right? - and the amount of loss is interesting. Most of the breaches in the Fortune 1000 group, they cost a victim on average about a million dollars. But you list in your report at least 188 events in the data set that cost more than $10 million. So can you help me understand that number and that discrepancy or that wide gap?
David Severski: Absolutely. It is really the extension of just what Wade was saying - is that it is rare that any sort of data set, particularly in the case of losses, can be represented by a single number. Even though we are very drawn to, you know, give me the bottom line number, the truth is, you know, there's a distribution out there. And so, you know, you see a figure for what is typical - and I'll use air quotes around that - may not be representative of what is average for a particular data set. Yes, an average loss may be around $1 million, but that means - that does not mean that there is not a high proportion of events that are very, very large.
David Severski: And so what we've tried to do in the IRIS series overall is say, yes, we understand the need to have a single number. And so we will present what is a typical number. But like you would see in the insurance sector, there is a lot of variation out there for those extreme, meteor-style, you know, world-ending events there. So we typically present a 95th percentile and say, you know, five times out of 100, your loss is going to be even greater than this value. And so there may be some extreme values out there. So don't get too complacent about what the typical size is because there are the outliers, which can hurt you extremely...
Wade Baker: Yeah.
David Severski: ...Severely.
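To see why a single "typical" number can mislead, here is a small sketch comparing the median, mean and 95th percentile of a heavy-tailed loss sample. The losses are synthetic: a lognormal shape, the median of $200,000 and the spread are all assumptions chosen for illustration, not figures from the IRIS data.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumed parameters (hypothetical): lognormal losses with this median and spread.
ASSUMED_MEDIAN_LOSS = 200_000
ASSUMED_SIGMA = 1.5

# Draw synthetic per-event losses; these are not real breach costs.
losses = [
    rng.lognormvariate(math.log(ASSUMED_MEDIAN_LOSS), ASSUMED_SIGMA)
    for _ in range(10_000)
]

median_loss = statistics.median(losses)
mean_loss = statistics.fmean(losses)
# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
p95_loss = statistics.quantiles(losses, n=100)[94]

print(f"median: ${median_loss:,.0f}")
print(f"mean:   ${mean_loss:,.0f}")
print(f"95th percentile: ${p95_loss:,.0f}")
```

In a skewed sample like this one the mean sits well above the median, and the 95th percentile sits well above both, which is the "five times out of 100 your loss is even greater than this" statement in numeric form.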
Wade Baker: Yeah, and I think that goes back to your comments about the probability of material impacts...
David Severski: Absolutely.
Wade Baker: ...Which is something we wanted to facilitate the assessment of. OK, yep, typical incident costs you a couple hundred thousand bucks. You know, some organizations, that would be terrible. Others, they're like, who cares?
Rick Howard: Yeah, yeah. That's right.
Wade Baker: It's these way out in the tail that you want to know. And we can actually quantify that and start measuring how often those tail events happen.
Rick Howard: Well, I mean, that's a nice segue to something I was delighted to see in the report. You guys used loss exceedance curves to show stuff. And so for the Fortune 1000 companies, I mean, just give three bullets - there's a 24% chance of losing 1 million in a 12-month period, a 14% chance of losing 10 million but less than 6% chance of losing 100 million. And so, Wade, can you talk about why loss exceedance curves are better at explaining these forecasts than, say, the way I used to do it with heatmaps and high, medium and low scores and things?
Wade Baker: Yeah, risk is a hard concept...
Rick Howard: (Laughter) It is.
Wade Baker: ...Partly because we use that word in so many different ways, and it's hard to collapse that. And I'm not just saying it to, you know, preen your feathers or anything, but the - reduce the probability of material impact I wrote down as your step. And sort of in that statement, you're combining probability and impact, which you see in a classic heatmap are the two major axes, right?
Rick Howard: Yep.
Wade Baker: And it's the combination of those two things that is really hard because is a medium probability and a high impact - what is that - high risk? Is it...
Rick Howard: Yeah.
Wade Baker: ...Medium high? I don't know. What color do we give it?
Rick Howard: And there's reams of science that says...
Wade Baker: Right.
Rick Howard: ...That kind of thing is bad science, right? So yeah.
Wade Baker: Exactly, exactly. And so, you know, that's where loss exceedance curves really come in handy. They combine both the probability and impact or losses side so that you can make statements about, hey; what's the probability of a loss that's this much...
Rick Howard: Yeah.
Wade Baker: ...And answer that question quite clearly. And it's borrowed from the insurance community, which in many ways I think is more cutting edge than the typical engineering-minded security community in answering questions like that because that's how they manage risk across their portfolio of insured organizations. And so that's why I like loss exceedance curves, or LECs. And we really wanted to show that, hey; you can really do this, and it enables these kinds of statements. Don't you think that's powerful? And we've gotten a good response that yes, it is indeed. That's the kind of thing I'd like to see.
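A loss exceedance curve answers "what is the probability of losing at least this much in a year?" for a range of amounts. The sketch below builds one from simulated years, assuming a simple model in which a year either has a loss event or not and the loss size is lognormal. Every parameter (`p_event`, `median_loss`, `sigma`) and both function names are hypothetical; this is not the method or the data behind the curves in the Cyentia reports.

```python
import math
import random


def simulate_annual_losses(p_event: float, median_loss: float, sigma: float,
                           years: int, seed: int = 0) -> list[float]:
    """Simulate total loss per year: zero in most years, a lognormal draw otherwise."""
    rng = random.Random(seed)
    mu = math.log(median_loss)
    return [
        rng.lognormvariate(mu, sigma) if rng.random() < p_event else 0.0
        for _ in range(years)
    ]


def exceedance_curve(annual_losses: list[float],
                     thresholds: list[float]) -> dict[float, float]:
    """For each threshold, the share of years with a loss at or above it."""
    n = len(annual_losses)
    return {t: sum(loss >= t for loss in annual_losses) / n for t in thresholds}


# Hypothetical inputs, chosen only to make the curve visible.
annual_losses = simulate_annual_losses(
    p_event=0.25, median_loss=500_000, sigma=2.0, years=50_000, seed=7
)
thresholds = [1e5, 1e6, 1e7, 1e8]
curve = exceedance_curve(annual_losses, thresholds)

for threshold, probability in curve.items():
    print(f"P(annual loss >= ${threshold:,.0f}) = {probability:.1%}")
```

Each point on the curve pairs a dollar amount with a probability, so a decision-maker can mark what counts as material for their organization and read off the chance of exceeding it, instead of arguing over what color a medium-probability, high-impact cell should be.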
Rick Howard: So, David, in the second paper that you guys just recently published - it's called "Data for Cyber Risk Quantification" - it's a similar analysis for nonprofits, not Fortune 1000 companies this time. And the loss exceedance curve there, you know - they're smaller. OK. But they could be devastating - a 7% chance of losing 100K in a 12-month period, a 3% chance of losing 1 million and a 0.7% chance of losing 10 million. And for these companies, the probability, like I said, is much smaller. But if it happens, it could be devastating to them...
David Severski: Absolutely.
Rick Howard: ...Because they have no money to start with, right? So these are black swan events, right? So how do companies prepare for that kind of thing? - because it's not likely to happen. But when it does, we're done. It's like the meteor hitting the Earth - you know, that kind of thing.
David Severski: Yes. And so the answer to that question of, what do we do about that, is a difficult question. I don't have the magic bullet answer to that one there. I'm trying to highlight this...
Rick Howard: If you do, we should start a company. Yeah.
David Severski: We should. But I think it highlights the challenge that, you know, as we were saying, a loss of $100,000 may not be much for a Fortune 1000 firm that is measuring their revenues in, you know, hundreds of millions, if not billions of dollars there. But for these small nonprofits there where they simply don't have the funds to give - and even if they do, those are near and dearly spent. You know, those are quite often the really big sell funds in some cases. And so losing that money has such an impact upon the mission of those organizations.
David Severski: So when making a risk management decision - and I think that's ultimately what it boils down to - what is your risk tolerance? Tools such as a LEC, whether it is something for the Fortune 1000 or for the nonprofits or for any sector of the economy there, helps make an explicit decision to the people that need to make those decisions about how much risk can we tolerate? What is our risk budget? You know, what is material for our organization? And do we spend the additional funds to mitigate a small likelihood outcome that could have a substantial impact, or do we spend that to more operational needs? Because, ultimately, all business decisions are a matter of risk management, whether it's cyber-risk or whether it's business risk or mission risk, in the case of nonprofits. It's all a matter of risk management. So I believe firmly that business leaders can make these decisions and do it every day. It's just a question of are we providing, as practitioners, as CSOs out there, are we providing the information they need to put in a context that they can understand or not? And if we're successful at that - and I think LECs are a great tool for that - I have confidence that most business leaders can rise to that challenge and make well-formed decisions.
Rick Howard: So we're kind of at the end of this. But I want to get your last thoughts on this. What's the big takeaway from these two reports and other things that you guys are doing? Dave, let's start with you. What's the big takeaway here?
David Severski: I think the big takeaway is something that's not unique to Cyentia - is that you have more data than you think. There is data available. Risk quantification can be done. It is a question of not getting bogged down in the details, not going down further into whatever taxonomy you're using than you need to. And tools such as IRIS can help you get information out there.
Rick Howard: Wade, you get the last word.
Wade Baker: All right. I think the takeaway is that this can be done. I - again, I see a lot of people getting discouraged, bogged down in the details, and they just sort of fall back to, ah, we got to make a decision. So I'm just going to go with whatever my gut says or whatever someone who I trust is telling me. And I really would love for people to read these reports and take away that, all right, this is doable. This is helpful. And let's incorporate this into our risk assessments. Because I think it could be a game changer in the way that we make cyber-related decisions and especially, you know, for your audience because I think they're getting challenged on a lot of different ways from the board, from nonsecurity people who typically speak in dollar terms and risk terms and things like that. And I think this is more - this is closer to their language than most other things that I've seen.
Rick Howard: Well, as I told you guys when we were prepping this call, I've read all the books on how to do this, you know, the FAIR books and the how to measure anything in cybersecurity books and all of that. And all those are fantastic primers for how to think about this. But I kept waiting for the chapter, you know, the last chapter in the book and says, OK, now that we taught you all this, here's how to do it. And they're not there, OK. They're not in those books. But when I read your papers, I said, oh, that's how you do it. That's it. That's the first step. So congratulations to you guys.
Wade Baker: Amazing. Thank you.
Rick Howard: I think it's right. Yeah, it's amazing stuff.
Rick Howard: That was Wade Baker, a co-founder and data storyteller at Cyentia, and David Severski, one of the data scientists there. We'll be right back with a wrap-up of this episode.
Rick Howard: As I said at the top of the show, the two reports from Cyentia that we discussed are the manifestation of exactly what I've been talking about with superforecasting, the Bayesian math rule, Fermi estimates and outside-in risk calculations. Do yourself a favor, and read them yourself. If you're anything like me, you'll be amazed. And that's a wrap, not just for this episode but for the entire season - Season 10. During this season, we blew by our 100th episode, covered another tool from the MITRE ATT&CK folks called Attack Flow, talked about the FinTech ecosystem and had a detailed discussion about two zero-trust tactics - privileged access management and crisis planning. We finished up with a mini four-episode series of forecasting cyber-risk. I'm exhausted. I need a nap. But don't you worry. We'll not be gone for too long. Season 11 starts in just over a month. We already have the army of CyberWire interns hard at work in the sanctum. In fact, it's getting pretty crowded down here. We may have to open a new wing. Hey, hey, hey. You three, get out of the snack cabinet, and get back to work. Sheesh, you have to watch them like a hawk. Sorry. Sorry about that.
Rick Howard: In the meantime, as always, if you agree or disagree with anything I have said for this episode, Season 10 or for the show in general, hit me up on LinkedIn or Twitter, and we can continue the conversation there. Or if you prefer email, drop a line to email@example.com. That's firstname.lastname@example.org. And if you have any questions you would like us to answer here at "CSO Perspectives," send a note to the same email address, and we will try to address them in the show. So see you in a bit. The CyberWire "CSO Perspectives" is edited by John Petrik and executive produced by Peter Kilpe. Our theme song is by Blue Dot Sessions, remixed by the insanely talented Elliott Peltzman, who also does the show's mixing, sound design and original score. And I am Rick Howard. Thanks for listening.