Threat Vector 11.7.24
Ep 42 | 11.7.24

War Room Best Practices

Transcript

Michal Goldstein: A war room is usually not a fun day for all of us security people. A war room is when all hell breaks loose. There's a major threat, whether that's a vulnerability, a threat actor, anything that is really impacting at an industry level. [ Music ]

David Moulton: Welcome to "Threat Vector," the Palo Alto Networks podcast, where we discuss pressing cybersecurity threats and resilience, and uncover insights into the latest industry trends. I'm your host, David Moulton, Director of Thought Leadership for Unit 42. Today I'm thrilled to be joined by Kyle Wilhoit, Director of Threat Research for Unit 42, and Michal Goldstein, Director of Security Architecture and Research for Cortex. In this episode, we're exploring Kyle and Michal's session on mastering preparation and response tactics for significant security events, the evolving threat landscape, and the importance of having a strategic war room to enhance your organization's threat intelligence and response capabilities. We recorded this conversation several weeks ago, and I'm just now able to share it with you. Kyle, Michal, thanks for being on "Threat Vector" today. I've been excited to talk to you about the research that you recently presented.

Kyle Wilhoit: Thank you, David. I'm really excited to be here. Appreciate the time and nice to chat with you.

Michal Goldstein: Thanks for having me. I'm happy to be here.

David Moulton: You both recently co-presented a talk called "Navigating the Threat Landscape: War Room Best Practices for the Next Major Threat." Can you give our listeners a snapshot of your talk?

Kyle Wilhoit: So our talk basically revolves around the concepts surrounding war rooms, emerging threats, how to effectively establish those war rooms, key considerations to take into account. And then we also dip into the threat landscape where we kind of mesh together, you know, key considerations from the threat landscape and how to kind of approach that, you know, from a war room perspective and a response perspective.

David Moulton: Michal, can you give me a definition of a war room? What is that?

Michal Goldstein: A war room is usually not a fun day for all of us security people. A war room is when all hell breaks loose. There's a major threat, whether that's a vulnerability, a threat actor, anything that is really impacting at an industry level. It could be a zero-day vulnerability, it could be a nation-state threat actor with a new campaign that has been uncovered. And that requires almost all organizations, or at least a significant number of organizations, to put everything aside and focus on investigating that threat and understanding whether or not they're compromised.

David Moulton: Kyle, when a new widespread security threat emerges, what are the critical first steps you would take to assess its scope and potential impact to an organization?

Kyle Wilhoit: Yeah, so from my perspective, David, which, as a reminder, is looking at it from the researcher's perspective, there are several key components that I oftentimes analyze to understand the scope, the breadth, and the depth of the vulnerability that's being exploited. The first thing I look at is whether there is in-the-wild exploitation of that vulnerability. Meaning, are we seeing exploitation of the vulnerability across the internet? Are we seeing it on a customer basis? Where are we seeing it? The second piece I look at is how easy the exploit is to run. Is it a remote code exploit? Is it easy to run remotely, where you don't need to be on site and it doesn't require a lot of technical knowledge to execute? I also look at the customer impact, meaning specifically, are we having customers that are impacted? Can we take the actual exploit, the proof-of-concept code or something along those lines, analyze it, break it down, and then look across our customer base to understand whether we have customers impacted? I look at the number of impacted servers or services sitting across our customer base, alongside analyzing the number of vulnerable servers or services sitting on the internet where possible. So we oftentimes will go out and look in telemetry data to see: is this vulnerability going to grow in terms of in-the-wild exploitation, what is the capability, and what is the possibility of this threat growing? Knowing the number of impacted or vulnerable services and servers living on the internet gives us the capability to know how large this problem could effectively grow. So those are some of the main components that I typically try to assess when I'm looking at potential impact to an organization, because that tells you relative speed, that tells you the capability of the exploit to get into the environment, and that tells you what the trajectory could look like in the future. Is this going to become more widespread? So that's my perspective from the researcher's angle.
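
[ Editor's note: a minimal sketch, not from the episode, of how a team might fold Kyle's criteria into a rough triage score for prioritizing a war room. The field names, weights, and numbers are illustrative assumptions, not a Unit 42 methodology. ]

```python
from dataclasses import dataclass

# Hypothetical, simplified triage record for an emerging vulnerability.
@dataclass
class VulnTriage:
    exploited_in_wild: bool      # are we seeing active exploitation?
    remotely_exploitable: bool   # remote code execution, no on-site access needed
    low_skill_required: bool     # public PoC, little technical knowledge needed
    impacted_customers: int      # confirmed impacted customers in telemetry
    exposed_services: int        # vulnerable servers/services visible on the internet

def triage_score(v: VulnTriage) -> int:
    """Rough, illustrative severity score; the weights are arbitrary examples."""
    score = 0
    score += 40 if v.exploited_in_wild else 0
    score += 20 if v.remotely_exploitable else 0
    score += 10 if v.low_skill_required else 0
    score += min(v.impacted_customers, 20)        # cap the customer-impact weight
    score += min(v.exposed_services // 1000, 10)  # growth potential from exposure
    return score

print(triage_score(VulnTriage(True, True, True, 12, 250_000)))  # -> 92
```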

David Moulton: Michal, let me take it over to you. How should an org approach response when the threat's severity is initially misjudged?

Michal Goldstein: So from an organizational perspective, even before understanding if something is misjudged or not, it is important to understand the impact of the vulnerability. So similarly to what Kyle explained, but focusing more on the organizational perspective, you need to understand if you're running any vulnerable systems or applications, right? So that's your first concern. And then whether or not this can be patched. And then the same parameters and aspects that Kyle talked about: the CVSS score, whether it's exploited in the wild, what the impact is across other organizations, and so forth. So even before understanding if you're misjudging or not, you need that initial understanding of potential impact. And then as you go through your investigation and answer what is probably the most important question, whether or not you're even compromised, you can go into the investigation steps, which are also important, right? And that's where the misjudgment would also come in, because if you don't understand the scope correctly, your actions would not be proportionate. So some of the actions that you will take include patching if you can, and handling any compromised assets. And then from a misjudging perspective, if you have not found all the affected systems, you need to go through that investigation step again and find those additional assets. So there's the initial assessment, and there's additional research that takes place. You then find additional information; sometimes the severity was mis-assessed, right? So any type of misjudgment can then be course-corrected according to any changes and any updates to what is known about a specific vulnerability, and you need to make sure that you take actions accordingly. [ Music ]
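
[ Editor's note: a minimal sketch, not from the episode, of the first organizational check Michal describes, comparing an asset inventory against an advisory's list of affected versions. The inventory format, software names, and version lists are invented for illustration. ]

```python
# Hypothetical advisory data and asset inventory for the example.
AFFECTED = {"exampled": ["2.14.0", "2.14.1", "2.15.0"]}

inventory = [
    {"host": "web-01", "software": "exampled", "version": "2.15.0"},
    {"host": "web-02", "software": "exampled", "version": "2.16.1"},
    {"host": "db-01",  "software": "otherd",   "version": "5.1.2"},
]

# Find assets running a version listed as affected by the advisory.
vulnerable = [
    asset for asset in inventory
    if asset["version"] in AFFECTED.get(asset["software"], [])
]

for asset in vulnerable:
    print(f"PATCH NEEDED: {asset['host']} runs {asset['software']} {asset['version']}")
```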

David Moulton: Kyle, let's shift into threat intelligence collection. What are the most effective methods and tools for real-time threat intelligence collection during a security crisis?

Kyle Wilhoit: Yeah, so I think realistically one of the primary things that can help during a security crisis is the same thing that can help a threat intelligence function that exists organically in organizations. And that's a centralized threat intelligence repository, something akin to what you would historically call a TIP, a threat intelligence platform. This allows a threat intelligence analyst to track and collect observables, indicators, and context in one specific system, so it not only serves as a collection mechanism for intelligence data, but it can also serve as an analysis engine that lets you perform analysis and build context around the data that you're collecting. This also ultimately allows you, say you have an incident responder that unearths a malicious executable that was dropped into your environment, having a central repository of that intelligence data lets the intelligence analysts and the incident responders track that file, look for pivot points to say, "Hey, there might be related activity to this," and cluster that related activity via tagging and other context methods. So I think that's the first thing: using that central threat intelligence repository or TIP to track and store the indicator values. I think a second key point is leveraging automation as much as possible, right? Meaning when an incident responder or threat intelligence analyst is going out and analyzing a piece of infrastructure, there is a litany of different things that you can run on that indicator at a base level, meaning triaging that threat. And that could be everything from getting passive DNS results to looking at the hosting history of that IP address or that infrastructure. Ultimately, using that enrichment to automatically pull some of that analysis for you is going to speed up the analysis and incident response that you're performing. And then finally, I think utilizing subscriptions that convey context in some way, not just consuming IOC feeds. You're looking for context, which is basically what enables faster and more informed decision-making, especially during an incident or during an emerging threat situation. So that's something to consider: when you're purchasing feeds or evaluating those types of things, look for feeds that apply context, not just indicator values. So I think those are three tools and things to consider, at least from my perspective, surrounding a security crisis.
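
[ Editor's note: a minimal sketch of the kind of automated baseline enrichment Kyle describes. The lookup functions below are placeholders for whatever passive DNS and hosting-history services your TIP integrates with; no real vendor API is being named or implied. ]

```python
def passive_dns(indicator: str) -> list[str]:
    """Placeholder: return historical resolutions for a domain or IP
    from your platform's passive DNS integration."""
    return []

def hosting_history(indicator: str) -> list[dict]:
    """Placeholder: return prior hosting/ASN records for an IP
    from your platform's infrastructure-history integration."""
    return []

def enrich_indicator(indicator: str) -> dict:
    """Run baseline enrichment automatically so the analyst starts with context."""
    return {
        "indicator": indicator,
        "passive_dns": passive_dns(indicator),
        "hosting_history": hosting_history(indicator),
    }

# During an incident, a responder could enrich every new observable as it arrives.
context = enrich_indicator("203.0.113.7")  # documentation-range IP, example only
```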

David Moulton: Michal, let me follow up with you. What are some of the best practices that you've seen organizations use when they're collecting data during one of these crises?

Michal Goldstein: So one of the main best practices, I would say, is keeping the same processes that you use when you're not in wartime, right? Because during wartime, you don't have time to create new processes, work through new tools, or incorporate them into your environment. What's important is the preparation, and that to me is one of the most important best practices. So it's everything that you do ahead of time to then have that credible threat intelligence in your threat library, as Kyle mentioned earlier.

David Moulton: And would you put next to preparation practice, running your playbook, running your runbook, making sure that you understand what it's going to feel like when it is a live crisis rather than experiencing that for the first time?

Michal Goldstein: Exactly. Kyle kind of walked us through multiple elements here, and this would be one, right? So having your automations already pre-built, obviously not developing and running them for the first time during the crisis, but also having those tools ahead of time, right? There are so many resources that you can use, so many feeds out there that would give you your threat intelligence. Even that on its own is something that you need to prepare and properly set up ahead of time. You want to make sure that you're getting credible information, right, so you have your validation processes in place, and that the information you're basing decisions on and using throughout your crisis and war room operation is actually operational.

David Moulton: Michal, what role does automation play in enhancing the speed and precision of your threat response efforts and maybe what type of automation tools are going to be the most beneficial?

Michal Goldstein: If you have your go-to playbooks, it is easy to utilize them in a war room situation. And by that I mean that you have fully tested them and have a lot of confidence that those playbooks do exactly what you'd expect them to do. Those playbooks will save you a lot of time, and therefore incorporating them into your war room would be quite easy. You just drag and drop, right? If we talk about our SOAR tool, you have the notion of sub-playbooks. These are usually the building-block sub-playbooks that do things like block indicators, isolate the endpoint, and so forth. Those playbooks can be very easily incorporated into almost every war room playbook. Now, obviously, the logic for the broader response to a specific threat might change based on the specifics of whatever incident, but those building blocks would be very useful in any case. Then in terms of the adjustment, you might have built your playbook in a more generic way to, for example, fetch all the indicators of compromise from a given blog. The blog where the researchers publish their research could change, right? So the URL would be different; every time it might be a different vendor, it might be Unit 42, it might be a different URL. You provide the URL, but you trust the automation to extract, for example, all the indicators of compromise from it, and then pass those on to later automations that would investigate and use those indicators of compromise for remediation purposes as well, and so forth.
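
[ Editor's note: a minimal sketch of the generic IOC-extraction step Michal describes, where the research blog URL is the only input that changes. The placeholder URL, the simple regexes, and the use of the requests library are illustrative assumptions, not the actual Cortex playbook logic. ]

```python
import re
import requests  # third-party HTTP client, used here for brevity

# Deliberately simple patterns; production extraction would need hardening
# (defanged indicators, false-positive filtering, more IOC types, etc.).
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")

def extract_iocs(url: str) -> dict:
    """Fetch a research post and pull out candidate indicators of compromise."""
    text = requests.get(url, timeout=30).text
    return {
        "ips": sorted(set(IP_RE.findall(text))),
        "sha256": sorted(set(SHA256_RE.findall(text))),
    }

# The URL changes from one war room to the next; downstream sub-playbooks
# (block indicators, hunt, isolate) consume the extracted output.
iocs = extract_iocs("https://example.com/new-threat-research")  # placeholder URL
```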

David Moulton: Kyle, in your research, how has the threat landscape evolved over the last few years, particularly in terms of the sophistication and frequency of attacks?

Kyle Wilhoit: Yeah, so one thing is actually quite prominent, which is the sophistication in scams and the usage of generative AI specifically. There are really two key areas that I've been honing in on in research recently, one of which is deepfake audio and video generation, and the other is "know your customer" bypass techniques using face-swapping technology. And I would say that these scams are increasing by a large magnitude. I don't have exact numbers behind that, but what I can say is that my family was personally targeted with a deepfake audio scam just recently, where they used a combination of different techniques, one of which involved calling my daughter, getting a snippet of her voice, and then playing that back using a tool or software suite to basically clone her voice and make my wife believe that it was in fact my daughter speaking. So this is becoming extraordinarily prominent, and I think from a financial perspective, it's only going to continue to increase and be more impactful. So I think the usage of generative AI, generally speaking, from a cybercriminal's perspective as well as from a nation-state-backed attacker's perspective, is only going to increase as this technology continues to become more available and more abundant. Additionally, from a more sophisticated attacker's perspective, I would say we've seen a pretty distinct increase in sophistication and variance in command and control operations. So the idea of exploiting or using Discord or Telegram as command and control infrastructure for advanced malware families, as examples, is something that we're seeing become more prominent and more prevalent. And it makes it more difficult from an intelligence analyst's perspective to go back and attribute who is behind that attack. It's a little bit harder for us to determine attribution where we need to, and those types of things. So it muddies the waters in terms of analysis and attribution. And then I guess the third big trend that, broadly speaking, I've noticed again revolves around sophistication. In this case, I would say it's the targeting and the really wide exploitation of varying types of technologies. So, for instance, we're seeing actors go out and exploit IoT devices and use those compromised IoT devices as command and control infrastructure. We're seeing the same with VPN technology, as an example, being heavily targeted over the past 12 months. So we're seeing a lot of exploitation of varying types of technologies, ultimately really impacting a litany of different customers across the board. So I'd say those are the three real big trends that jump out in terms of sophistication and the frequency of attacks in my mind from a researcher's perspective. [ Music ]

David Moulton: Michal, take us through the proactive measures that organizations should take to stay ahead of some of these emerging threats, and then how do you foresee those threats evolving in the near future?

Michal Goldstein: So threats will continue to evolve, right? Kyle covered this. We know that attackers are doing their part in getting better. We need to do that as well from an organizational perspective. And this goes back to my point on preparation: the more we put in place ahead of time, the better, right? We need to cover our bases, and I do think these are the most basic processes that we should have in place in terms of threat intelligence. We need to make sure that anything we collect is then actionable and ready to use when needed throughout an investigation, whether that's a war room or just any incident that a SOC team would go through. So: making sure that the intelligence we collect is validated, that we can trust it, that the sources we collect from are validated as well, that we have that information whenever we need it, that the context is available and rich enough, and that every incident we go through has that information already built in for our analysts to save them time. That is also where automation comes into play, and all of these are things that we can do ahead of time and more proactively.

David Moulton: Kyle, can you outline the essential components and best practices for establishing and running an effective war room during a major security incident?

Kyle Wilhoit: Yeah, absolutely. So from my perspective, there are several key considerations that need to be addressed out of the gate. The first is collaboration, meaning how are you going to send files to and from different teams across the organization? That could be impacted organizations or impacted teams, or it could be non-impacted teams. But how are you going to collaborate? How are you going to send data back and forth to each other, etc.? I think communication also needs to be paid close attention to, and what I mean is establishing in-band and out-of-band communications. So, for instance, using Slack and Teams as in-band communication, but having backups and having the capability to communicate out-of-band via a third party, such as Signal, something along those lines. Realistically, you need backup communications, because I've often been caught in scenarios where your primary form of communication goes down and you need a way to still collectively communicate about the incident that's going on. So having that piece is essential. And then the third piece, which I already spoke about, is having that central repository for storing those observables and indicator values, because you need a way to distribute that intelligence data out to the rest of the teams. So from my focus, I'm looking at consolidating data, normalizing that data by removing duplicate values, etc., and then sending that data out to key consumers or key stakeholders across the organization. In that same vein, there are several aspects that I also pay attention to, which are the methods for detecting this major security event or incident, and how to prevent that activity across the environment. That could be through a litany of different defensive technologies. That's also looking at products, meaning from my perspective, what is it that our products can do? And then a fourth consideration is having a structured PR outreach and legal framework in place so you can leverage those teams, because in the event of a large-scale security incident, it's likely you're going to need to involve a PR team and a legal team to make sure that everything is being handled properly and that you're doing due diligence, etc. So those are some of the essential components that I think about specifically whenever we start talking about war rooms and major security incidents.
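
[ Editor's note: a minimal sketch of the consolidate/normalize/deduplicate step Kyle mentions before indicator data is distributed to stakeholders. The refanging rules and the sample team lists are assumptions made for the example. ]

```python
def normalize(indicator: str) -> str:
    """Lowercase, strip whitespace, and refang common defanged notations."""
    return (
        indicator.strip().lower()
        .replace("[.]", ".")
        .replace("hxxps://", "https://")
        .replace("hxxp://", "http://")
    )

def consolidate(*feeds: list[str]) -> list[str]:
    """Merge indicator lists from multiple teams into one deduplicated, sorted set."""
    merged = {normalize(i) for feed in feeds for i in feed}
    return sorted(merged)

# Two teams report overlapping observables in slightly different formats.
ir_team = ["EVIL[.]example", "hxxp://203.0.113.7/payload.bin"]
intel_team = ["evil.example", "http://203.0.113.7/payload.bin"]
print(consolidate(ir_team, intel_team))
# -> ['evil.example', 'http://203.0.113.7/payload.bin']
```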

David Moulton: Michal, how should organizations measure the effectiveness of the war room operations? And do you have a couple of KPIs that you think are the most telling?

Michal Goldstein: There are definitely some key KPIs. Before I get to them, I do believe we can talk about the essential components from an organizational perspective as well, as those tie into the KPIs. So first, and perhaps the most important and sometimes the hardest part, is to get to an agreed-upon process for wartime. Especially once you start talking about more complex processes for the organization, it is key to have that process in place, not just a PDF sitting someplace, but rather a process that everyone vetted and agreed to. Then from a toolset perspective, Kyle covered that from a protection perspective, but for a customer it is important as well to make sure that they look at all the tools they have deployed in their network and that they are using those tools as part of their war room process, right? So, all the coverage that they can get, from the network, cloud, and endpoint, as well as the actual tools that they can then use for remediation and investigation. So having those predefined tools, as well as the predefined actions, which are even more important once you start talking about automating this workflow in a playbook, having those predefined tools and go-to actions to execute, is also key. This also translates to getting access, right? So coverage both in terms of having visibility into all those tools within your SOC, but also having access: all the permissions that you need, all the API keys, getting all of that ahead of time. And then really making sure that you follow the full incident response lifecycle, so you go through all steps of your investigation and response, as well as recovery, to make sure that you cover all bases. That leads me to another aspect that Kyle talked about, and that's people: making sure that you have a list of all the people that need to be involved, right, all the stakeholders. What are the communication processes? How do you coordinate with different teams that might be using different tools? Having that predefined list would be helpful. And then lastly, there's also the need to have a repeatable process, right? That would be easier within a playbook, but to my point on preparation, which we talked about earlier, threat research is ongoing. You might start with an assumption about a certain situation or a certain severity of the threat; as research continues, you want to make sure that you correct that misjudgment if it happens, and then you course-correct your processes. So if, for example, you need to raise the severity of an incident because a vulnerability is now actually exploited in the wild, you have your way to do so, as well as to collect any additional indicators of compromise that are published as research on that specific threat continues. So from that list of components, I think organizations have a lot of things to look into and make sure they incorporate in their processes. Now in terms of KPIs, there are multiple KPIs to look at, and to me, the most important one is how fast you know whether you're actually compromised. Right? That is the question that you want to have an answer for before your leadership comes to you and demands an answer. The faster you can get to that answer, the better. So that is the first KPI. The second KPI that everybody talks about is mean time to resolution: as soon as you know that you are in fact compromised, how much time do you need in order to respond to that threat? And then lastly, there's a KPI that is perhaps the hardest to measure, but to me it is also important: what is the level of chaos, right? Obviously if you have a lot of chaos, you would also see that in the first two KPIs I've mentioned; it will take longer to get answers to those. But from a chaos perspective, how exhausted is your team? How much friction do you have in order to get to those first two answers and the first two KPIs? That's also something to consider. [ Music ]
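
[ Editor's note: a minimal sketch, not from the episode, of how the two measurable KPIs Michal calls out could be computed from incident records: time to a confident "are we compromised?" answer, and mean time to resolution. The record fields and timestamps are invented for illustration. ]

```python
from datetime import datetime, timedelta

# Hypothetical incident records with the three timestamps the KPIs need.
incidents = [
    {"opened": datetime(2024, 7, 1, 8, 0),
     "compromise_confirmed": datetime(2024, 7, 1, 11, 30),
     "resolved": datetime(2024, 7, 3, 9, 0)},
    {"opened": datetime(2024, 7, 10, 14, 0),
     "compromise_confirmed": datetime(2024, 7, 10, 16, 0),
     "resolved": datetime(2024, 7, 11, 10, 0)},
]

def mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)

time_to_answer = mean([i["compromise_confirmed"] - i["opened"] for i in incidents])
mttr = mean([i["resolved"] - i["opened"] for i in incidents])

print(f"Mean time to 'are we compromised?' answer: {time_to_answer}")
print(f"Mean time to resolution: {mttr}")
```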

David Moulton: Michal, I want to go back to something that Kyle was talking about, which is the comms and having those different channels. Can you talk about how orgs should handle the challenges of coordinating all of those communications across maybe different geographies, different countries, different time zones with a dispersed team, and with maybe external partners?

Michal Goldstein: So there's no one-size-fits-all, right? That would depend on the specific organization in terms of tools, but there are multiple ways to get everyone together. There are built-in tools. There are tools such as instant messaging to make sure that you're communicating. But what I think is most important is to make sure that you coordinate this ahead of time, right? So to my point on managing your people and communication plan, this is something you just need to agree on. If everyone's using Slack, you agree that we all get into one channel, everybody turns on their notifications for that channel, and that's what they follow, right? Maybe someone expects a text message because it's a weekend, and that is also something to consider, especially when we talk about war room situations: how do you reach someone on their weekend? We all know that oftentimes this does happen on weekends, holidays, and so forth. So how do you also make sure that when people don't respond to the regular means of communication, whether that's an email, a ticket, or a Slack message, you can reach them over text or WhatsApp? To your point, every country also has its own preferred technology there. I don't think anyone would care whether it's text or something else; what's important is to make sure that you decide on that ahead of time and set the expectation that this is the means of communication that we'll be using. And, no less important, that you have the right people, right? It has happened before that someone was not invited to the channel, and then they don't know about it. So having that list, along with the corresponding technology to use to communicate with each person, and making sure you're reaching everybody that needs to be involved, that is key.

David Moulton: I know in a previous job, we ran a test and nobody had the Webex key. We didn't have the password distributed, so everyone was in the lobby and nobody was in the room. It felt like a simple thing to fix, but it certainly would have been a problem had that been a real-life fire event. Let's get into post-incident analysis. And, Michal, I want to keep it with you. Can you describe some of the best practices for that post-incident analysis process in detail for me?

Michal Goldstein: So to me, lessons learned is perhaps one of the most critical components. Unfortunately, as we all know, it's kind of a known fact that oftentimes organizations invest in security after a major breach. And lessons learned is a component and a best practice that you should apply in any investigation that you perform. So from a best practice perspective, I think the most important thing is to look at what we can do better, and not just in terms of one small thing, but rather from three perspectives. One is the process level: what can be improved in the process? What changes can we make, especially when we talk about the playbook, right? So any logical tasks that might be slightly different or that you need to adjust, that is a great opportunity to rethink. From a people perspective, it's: do you have the right people? And then the tools, right? So if you were missing any tools, or if you had to go in and do some of the work manually to continue the investigation, how can you incorporate more automation to save that time next time as well? And there are more specific examples, right? So when you think about the process, what was the trigger? Especially when we talk about a war room, how did you know that there was a major threat going on? And how can you better prepare for that next time, to make sure you're starting your war room as soon as possible after the initial publication of that threat? And then there's also the misjudgment. I think that's very important as well, right? If you found yourself in a situation where you didn't assess the threat correctly, how can you do that better next time? And then lastly, if you were missing anything, such as permissions, credentials, keys, tools, anything at all, how do you ensure that you get access and have that ready next time?

David Moulton: Kyle, can you talk to our audience about some of the things that you think security teams can do to best refine their threat response strategies and their analysis process?

Kyle Wilhoit: Yeah, so I think there are a few things. First, establishing a war room guideline and policy. And what I mean is, there needs to be structure to the way a war room is run, the way incident response is conducted, the way response strategies are executed. So there needs to be a guideline and policy in place for that. Inside of that, consider having some sort of war room leader to help steer the communication of the team, to help assign tasks, to help direct communication where it needs to go, etc. I think establishing the clear and concise communication channels that I listed earlier, between primary and secondary sources like Slack and Teams versus something like Signal, is important. Redundancy in communications, as I mentioned before, is big. I think making sure that you implement an immediate post-incident review, or what you might consider a hot wash, is also key, meaning after the incident or after the event has been resolved, gather all stakeholders and responders for a quick debrief. It's just a discussion meant to capture immediate reactions, thoughts, and observations about the incident, etc. I think root cause analysis is also big, and that should be included in the policy as well. But realistically, making sure there are guidelines firmly established that outline who the leader of this war room event is, or who can help coordinate it, and having it all documented, is essential. Because during an event, during a security incident, you really want to have documentation in place so you can follow a prefabricated format of what to do. So I think that's a big one from my perspective. [ Music ]

David Moulton: Kyle, Michal, thank you so much for coming on "Threat Vector" today. I appreciate you sharing your insights and helping our listeners understand war room best practices.

Kyle Wilhoit: Yeah, thanks for having us, David. This was really nice, really enjoyed it, and I hope we get to speak more in the future, thank you.

Michal Goldstein: Thanks, David, this has been great.

David Moulton: And for our listeners out there, I will go ahead and put a link in our show notes to the talk that was given at Black Hat in August. That's it for today. If you like what you've heard, please subscribe wherever you listen and leave us a review on Apple Podcasts or Spotify. Your reviews and feedback really do help us understand what you want to hear about. I want to thank our executive producer, Michael Heller, our content and production teams, which include Kenne Miller, Joe Bettencourt, and Virginia Tran. Elliott Peltzman edits the show and mixes the audio. We'll be back next week. Until then, stay secure, stay vigilant. Goodbye for now. [ Music ]