Special Editions 11.10.24
Ep 79 | 11.10.24

Solution Spotlight: Rebuilding trust in the wake of tech calamities.

Transcript

Dave Bittner: Hello and thank you for joining us for this N2K CyberWire special edition. In today's Solution Spotlight, N2K's Simone Petrella interviews Alex Stamos, CISO at SentinelOne. They got together at the ISC2 Security Congress 2024 to discuss lessons learned in 2024 and what it could mean for 2025. [ Music ]

Simone Petrella: So we're here at the ISC2 Security Congress for 2024, and I know you are going to be chatting with the audience both in person and virtually here in a bit, but one of the things I wanted to start with was, you know, 2024, where the landscape kind of started, and what, in your opinion, were some of the more significant breaches or attacks of 2024 that are shaping the way that we think about the cybersecurity industry?

Alex Stamos: Yeah, we've had kind of a crazy year. So the keynote today is I'm pulling three incidents out. Not all kind of traditional breaches, you know, one of them is, but three incidents I think really shaped the cybersecurity landscape, and I'm pulling different lessons out. So those three things are I'm talking about the Cyber Safety Review Board's report of the Chinese intrusion into Microsoft and the follow-on Russian intrusion into Microsoft, but especially the lessons to learn of what happened with China and Microsoft, which actually happened last year, but the report came out this year and has a lot of lessons for us. The multiple security incidents that came out of the Snowflake multiple breaches -- not of Snowflake themselves, but other customers -- and then the massive CrowdStrike outage, which has had real massive repercussions for the security industry and for CISOs that deploy security products.

Simone Petrella: Well, I think it's a great segue because as a CISO, now on vendor side, but also having been within the corporate side as well, what are some of the things that you think you're taking away as a CISO when you think about those events?

Alex Stamos: Yeah. So I think there's an order. So the, you know, the Microsoft one -- I mean that, you know -- so the -- I recommend all CISOs to read, if you haven't yet, the Cyber Safety Review Board's report about Microsoft. Now, you know, the technical specifics are very specific to Microsoft, right? These are bugs that are specific to how did Microsoft build their authentication system for Office Online and how were their keys stored and stolen by the Chinese and then eventually used to read the email of people who work for the US government. It was eventually -- you know, this is not a breach that was discovered by Microsoft. It was discovered by folks who work for the government and then told Microsoft that it happened. But the lessons that everybody can learn, even though the bugs are specific to Microsoft, are a couple. One, half-finished security projects will kill you, right? If you look at, like, step-by-step of what happened inside of Microsoft, almost everything in there, Microsoft knew about, and they're working on it. They just weren't done yet, right? And, you know, one of the things I'm going to do, you know, for a little audience participation in the keynote today is I'm going to have everybody raise their hand if they don't have partially finished projects on their risk register, right? And I expect nobody to raise their hand. You know, this is just true for any CISO, is we have things that we know are weaknesses that we've been working on and perhaps for years, right? It's -- sometimes it's easy to get to 80% done, 90% done. It's like a Windows progress bar. You get to 99%, but actually finishing, turning, you know, turning off that last server, getting rid of that last key is impossible because you have some dangling dependency.

Simone Petrella: Right.

Alex Stamos: And, you know, one of the lessons there is, like, the attackers don't care if you're 99% done. If that key works, if that server is up, they'll use it, right? So that's one of the lessons I think that's really important there, is like, sometimes you have to push through that last 1% because that residual risk is so big. I'm sure Microsoft wishes that they had pushed through whatever that last little bit was that kept them from turning off that old 2016 encryption key. It would have saved a lot of pain for them.

Simone Petrella: It sounds like the adage, the kind of, "If everything's a priority, then nothing is a priority."

Alex Stamos: Yeah, exactly. Another lesson there is, like, we've built really flat homogenous networks, right? You know, cloud computing is great in a lot of ways. But what's happened is the -- you know, the biggest beneficiary has been Wall Street, right? Is that Wall Street has forced CIOs to kind of squeeze out all of the excess cost of running IT. And so now you have IT budgets at public companies. They've gotten rid of all the fat, and now you have a small number of people providing services to a huge number of internal customers. And the ratios of the number of sysadmins or DevOps engineers versus the number of containers or end systems is spectacular, thousands and thousands of machines per admin. And that's great until a bad guy gets their hands on one of those systems, right? And so, like, one of the things we'll be talking about in the keynote is, like, friction is not necessarily a bad thing, especially at the administrative level, is that we got to embrace friction a little bit more. You know, Microsoft in this situation built keys that worked across every single one of their customers. And so if they had built a little less of a modular system, they would have had natural firebreaks in there, and it would have cost them more. It would have been a little more difficult in some ways. But it also meant -- would have meant that it would not have been so easy for the Chinese government to penetrate their systems. And, again, they're specific to Microsoft, but you see the same pattern at every company, of, well, why not just make everything flat and easy because it's so much easier and simpler for us. And I think, like, that's a natural progression of where cloud has taken IT architectures. But the reality is, is we just got to -- we got to see that there is a natural benefit to friction, especially at the administrative level.

Simone Petrella: I know you also are going to talk about Snowflake, but do you think that that's a friction that we also should be embracing as a cybersecurity community and industry, too? Because your third example is CrowdStrike, and that's an example where it behooves a frictionless environment to have one primary, you know, provider, but when it's tied to something that it's so fundamental to what we actually rely on --

Alex Stamos: Yeah, I mean, that's a great example of -- you know, the fact that it is very likely for a company to have one EDR product means that if it breaks -- if it either fails because it misses something, it misses it everywhere, and if it breaks, it breaks all your systems at once. Now, you know, CrowdStrike in particular made specific, you know, they made specific architectural decisions that were extremely risky, and I think, you know, they certainly are not going to make the same mistake again, and I think most companies would not make that mistake. But you still could see failures from products where you can have -- you know, every EDR product is at some kind of conflict or something, and certainly they all miss things, right? And I do think that has raised up the question for people like, "Hey, should we, you know, maybe go 50/50 with security products?" Certainly, a number of companies have decided, "Great, our primary and our business continuity sites are going to run different security features." I know, like, one of those airlines that was involved. They had, like, an operation center that was this beautiful operation center that had, you know, rows and rows of computers where these professionals work very, very tirelessly to, you know, move airplanes around and move crews around and, like, deal with, "Oh, no, there's a hurricane coming, so we've got to reroute everything." And, you know, they work incredibly hard to do that, and they had CrowdStrike on all the machines, and then they have an identical operation center 30 miles away, and it has its own generators and its own power grid, but they were also running CrowdStrike. So it doesn't matter that everything was physically separate. Within seconds of this entire -- this entire building bluescreening, the second machine, you know, operation center bluescreened. They will not make that mistake again, right? 
That second operation center is going to have different security products, different firewalls, different switches. Now, you can't get rid of Windows. Microsoft has a monopoly there, but what you can do is you can run on a different Azure tenant. You have a different Intune tenant. You can run N-1 patching for Windows. And so I think this is, again, where having non-homogenous networks of embracing friction, of having your primary and your BCP site be quite different from an IT perspective. It's a big pain. This is where system integrators might come in handy, where you end up paying a system integrator to run your BCP site for you, and to make it as different as possible is going to be worthwhile.
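[ Editor's illustration: the point about primary and BCP sites failing together can be made concrete with a small check that compares what each site runs, layer by layer. This is only a sketch under stated assumptions: the layer names, vendor labels, and the shared_components helper are hypothetical and are not taken from the conversation or from any real product. ]

```python
# Hypothetical inventories for two sites. Layer names and values are
# illustrative placeholders, not real deployments.
primary_site = {
    "edr": "Vendor A",
    "firewall": "Vendor C",
    "os": "Windows",
    "os_patch_level": "N",
    "mdm_tenant": "tenant-1",
}
bcp_site = {
    "edr": "Vendor A",          # same EDR as primary: a shared failure mode
    "firewall": "Vendor D",
    "os": "Windows",            # hard to avoid, per the discussion
    "os_patch_level": "N-1",    # deliberately one patch cycle behind
    "mdm_tenant": "tenant-2",
}


def shared_components(site_a, site_b):
    """Return the layers where both sites run the same thing.

    Anything returned here can, in principle, take both sites down at once.
    """
    return {
        layer: value
        for layer, value in site_a.items()
        if site_b.get(layer) == value
    }


if __name__ == "__main__":
    for layer, value in shared_components(primary_site, bcp_site).items():
        print(f"shared {layer}: {value}")
```

[ In this made-up inventory the check flags the EDR and the operating system as common to both sites, which is the situation described for the two operation centers: physically separate, but sharing the one component that failed. ]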

Dave Bittner: We'll be right back. [ Music ]

Simone Petrella: It's hard for me to think about this conversation and not think about the impact that this kind of has on the workforce for 2024, 2025. So ISC2 is -- you know, has released their workforce study. It's kind of the first year that things have stagnated from a cybersecurity professionals globally standpoint. It's actually been decreasing a bit in the United States. Are we getting to the point that a lot of these friction points should also kind of be a reminder to get back to the principles of we also need to spend enough time having redundancy in the humans that we actually have performing this work?

Alex Stamos: Right. Yeah. No, I think so. And I think the CrowdStrike outage proved that. I mean, that's one of the things all these people learned, was it's great to have one admin per 10,000 boxes until you have to reboot 10,000 machines. Like, you know, some of those airlines, you know, weeks later, I was still seeing blue screens in airports, and it's because they don't have the people to go out there with USB keys.

Simone Petrella: Yeah. We'll leave the burnout conversation for another day.

Alex Stamos: Yes. Right. But it's -- I mean, it is a legitimate issue of, like, when things go wrong, yes, one to 10,000 is a ratio that totally works when everything's working perfectly. And when it hits the fan, you know, not having that kind of slack space is a problem. And I do think we have cut too quickly. People have, I think, made assumptions around automation and hyper-automation and orchestration systems and such that aren't necessarily accurate. And I do see this all the time with companies that in -- you know, before I took the CISO role at SentinelOne, I was supervising the DFIR team and working with companies from a consulting perspective. And I would deal with breaches all the time, where they did not just lack the right security people. They didn't have the IT capacity to deal with a breach, right? It's like, oh, we've got to rebuild laptops. We don't have the people. We've got to rebuild our, you know, our Oracle database and our production systems. We don't have the people because we barely have enough people to keep things ticking over normally because we've cut to the bone. And so I do think -- and you will pay out the nose when you call PwC or Deloitte on a Friday evening at 6 pm to help you recover from a ransomware incident. They will charge you the maximum amount possible, and in the end, the CFO will not see that savings over the five-year period that they thought they would get from cutting all those IT folks. So I do think CIOs need to be thinking, looking big picture of what it's like when you go down to having 95% of the people necessary to run during normal operations, because over a five-year period, nobody just has normal operations. Something bad will happen every six months, and you need to have the slack space to be able to handle that.

Simone Petrella: Yeah. Hard to keep that long-term perspective in mind sometimes. So when you're trying to justify your budget in front of the CFO, who's like, "Well, it's been two years. Nothing's happened." So --

Alex Stamos: You're like, "Oh, you're going to penalize me for doing a good job?" Yes. Yeah. Yeah. Or, "I mean, I can make something happen," right? Like, it's definitely not the kind of -- yeah.

Simone Petrella: It'll be interesting to see what happens when you do that.

Alex Stamos: Yes. Exactly.

Simone Petrella: Okay. What changes do you anticipate in the cybersecurity field as we look towards 2025 as a result of some of the challenges we did face this year? Do you see anything changing as a result? Are we going to make headway on some of the barriers we've had?

Alex Stamos: Yeah. So, I mean, I think for security vendors like ourselves, there's a lot more questions being asked about how are we not blowing things up. So, you know, I -- one of the things I talk about in the keynote -- I actually throw up a screenshot from a still from "The Bridge on the River Kwai," which is a screenshot I actually use in class -- I teach at Stanford on Fridays -- and my students don't know what that picture is, right? So it's great. There's a lot of -- there's a lot of more -- there's -- I don't have the only gray hair in this audience.

Simone Petrella: Right. Yeah. Right. There's -- yeah. There's generational kind of commentary that we have.

Alex Stamos: So it's good. So there's people here who know what the movie is, right?

Simone Petrella: Yeah. People will know.

Alex Stamos: Yeah. And people who listen to the podcast know about that movie, so I don't have to explain that. You know, this is a picture of the bridge, and so it's like Sir Alec Guinness and his very sweaty khakis in front of the bridge and talk about, like, you know, CIOs built this beautiful bridge of architecture -- of IT architecture, so it's been incredibly reliable. And then security teams, our job is we rig this bridge with C4 and we blow up the bridge in case -- you know, the moment we see an enemy train coming over it, right? Like what we do as security teams is innately destructive. I mean, you just listen to the language we use, right? Like we block things. We isolate. We kill processes. We build systems that break the normal flow of IT to stop bad guys from doing things. And that's fine. I mean, that's what it's supposed to be. But I think post CrowdStrike, what's happened is CIOs have been like, "Wait a second. I build this beautiful, super redundant system and all these clouds and all these availability zones. And then I give SOC Analyst 2 this huge red button that says, 'Destroy all enterprise value,'" right?

Simone Petrella: Yeah.

Alex Stamos: Why do I do that? And so I think one of the things that changes is that security vendors and security teams themselves now have to justify to the CIO and the CEO and boards, why do we have this power? And I think that's actually a good thing. It's a good thing for vendors to say, "Okay, well, yes, we're actually much more careful than CrowdStrike in how we architect our kernel module. We're much more careful on how we test. We're much more careful how we deploy." That was always true, but now we have to document it. So that's good. We're documenting that better. We're proving that better to folks. But it's also then we have to build our product to help teams operationalize that better. So I think this is one of the things that you're going to start to see security products in '25 and '26 and going forward, is it's going to be a lot easier to build a product so that SOC Analyst 2 can do their job without having the "destroy enterprise value" button. Because traditionally, it's been you get onboarded in one of these products, and right next to "do your job normally" is the "kill everything" button. And it's not super easy to build things to be secure by default. It's not super easy to build it so that there's two keys to launch the nuclear missile, right? And those are the kinds of things that companies have built, but it had to be extra. You had to build a bunch of frameworks to do that and such, and that should become the default. And I think it should become the default in IT in a lot of ways, not just on the security side.

Simone Petrella: Do you think that that's a lesson that also will be applied on the corporate side where they're evaluating vendors and actually having to make decisions?

Alex Stamos: Yes, I hope so. I hope -- like, what happens is corporate teams think about, "Okay, what is our workflow here? How are we going to -- " because, you know, like I said, EDR -- you know, security products break companies. CrowdStrike is the only people that break the entire world, but security products break companies all the time. It's almost never the product's fault. It's almost always somebody inside the company uses the product to shoot the company in the foot, and then they blame the company. This is not -- no offense to any SentinelOne customers who are listening to this. I'm not talking about you. You're not the ones who I know blamed us because you did something. It's not you. I'm talking about somebody else, right?

Simone Petrella: Clearly not in consulting where I used to come from, where it was like, "No, actually, how do we tell you the problem is you?"

Alex Stamos: Oh, yes. Right. Exactly. Yes. Yes. But like, it's possible I have been on phone calls where I'm like, "You know, okay, you want to blame us. You're clearly paying us for our job, is to take the heat from you guys, but you're the guys who pushed the button that actually did this." And that happens all the time. And so I think, like, companies need to think through, okay, what is our normal flow here of a piece of malware comes down, we -- you know, it is communicating up to an IP address. We're going to decide that that IP address is malicious. How do we decide that that IP address really is the command and control server and that is not the corporate proxy server, or the corporate DNS server, which happens, and that once you block that corporate proxy server, you cut off all the computers in the network from the corporate proxy server and you break the entire network, right? Those are kinds of process things that aren't appropriately thought out. They have to be thought out, and then products like ours and other security products need to support that and make that easy for that kind of flow to be supported in the company so that like somebody says, "I want to block this and then it goes to their manager," right? Or, you know, with AI now, it gets smart enough to be like, "It looks like you want to block the corporate proxy server." Your Clippy pops up and says -- I think Clippy itself is probably copyrighted, but we can have, like, our own Clippy, right? Like, you know --
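[ Editor's illustration: the workflow described here, where a proposed block is first checked against known corporate infrastructure and then routed to a second person, can be sketched in a few lines. Everything in this sketch is an assumption for illustration: the PROTECTED addresses, the BlockRequest type, and the evaluate function are hypothetical and do not describe SentinelOne's product or any other vendor's API. ]

```python
from dataclasses import dataclass, field
from ipaddress import ip_address

# Hypothetical list of infrastructure that should never be blocked outright.
PROTECTED = {
    ip_address("10.0.0.53"): "corporate DNS server",
    ip_address("10.0.0.80"): "corporate proxy server",
}


@dataclass
class BlockRequest:
    target: str                      # IP address the analyst wants to block
    requested_by: str                # who asked for the block
    approvals: set = field(default_factory=set)  # who has signed off so far


def evaluate(request, required_approvals=1):
    """Return a (decision, reason) pair for a proposed IP block."""
    addr = ip_address(request.target)

    # Guardrail 1: refuse to block known corporate infrastructure.
    if addr in PROTECTED:
        return "refuse", f"{addr} is the {PROTECTED[addr]}"

    # Guardrail 2: the "two keys" rule. The requester's own approval
    # does not count toward the required sign-offs.
    independent = request.approvals - {request.requested_by}
    if len(independent) < required_approvals:
        return "pending", "waiting for approval from a second person"

    return "block", "approved by " + ", ".join(sorted(independent))


if __name__ == "__main__":
    print(evaluate(BlockRequest("10.0.0.80", "analyst2")))
    print(evaluate(BlockRequest("203.0.113.7", "analyst2")))
    print(evaluate(BlockRequest("203.0.113.7", "analyst2", {"manager"})))
```

[ The design choice in this sketch is that the safe path is the default one: a single analyst can propose a block but cannot complete it alone, and a request aimed at the proxy or DNS server is refused before anyone is asked to approve it. ]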

Simone Petrella: I guess you're the first person in many, many years who has actually referred to Clippy in a positive way.

Alex Stamos: Yeah, well, like, your positive security Clippy, like, pops up and says, "It looks like you're trying to destroy the entire enterprise. You know, maybe I can help you by saying, like, 'Yeah, you shouldn't do that.'" And so I do think there are going to be some positive changes there. And I do think Gen AI has some real positive opportunities here to speed up defensive cycles. Right now, it's being used in positive ways to make queries faster, right? And so, like, for us, we call it Purple, where you can -- instead of -- you could always ask, "Show me all the laptops that downloaded a new piece of software from a Russian IP address," right? You could always ask that, but you'd have to write this huge query with a bunch of quotation marks and you'd have to know exactly what you're doing. It'll take you 20 minutes, right? Now you just write that in English and you hit Enter and it does it for you. And that's great. But taking that data and doing something with it's a whole nother step. So we've gotten that first part down. And I think that that's the next phase, too, is then turning of, like, "Okay. Great. Now you give me that list, isolate all those computers," and being able to type in in English, "Isolate all the computers you just gave me a list for," and then making that implement in three or four minutes would be incredibly powerful. And that is something I'm excited about because that turns what used to be a multi-hour project during which -- during those multiple hours, bad guys were totally active going east-west. They know that they're in a fight with you. They're putting more back doors in place. They're creating more ways for them to maintain persistence. And so if you can turn that from a multi-hour process into a couple of minutes, then that gives defenders the advantage again.

Simone Petrella: Well, I think it's a great ending point to make because it's a little bit more of a boost to get us from those 88 to 89 to 99% completed projects, maybe, you know, more towards 100 so we don't have those often -- like awful risk registers. Alex, thank you so much for taking the time. I think it's going to be a fantastic talk and appreciate you sharing your knowledge with everyone here in the ISC2 community. Thank you so much, Alex.

Alex Stamos: Thank you. [ Music ]

Dave Bittner: That's N2K's Simone Petrella speaking with Alex Stamos from SentinelOne. We appreciate Alex taking the time to speak with us, and we appreciate you listening to our show. Thanks. [ Music ]