Threat Vector 1.16.25
Ep 50 | 1.16.25

Crisis in the Kitchen: Unraveling a Malware Incident

Transcript

Navid Asgharzadeh: Something to keep in mind is we're responding to an environment that we are totally blind in but we have responsibility for because it's under the BP umbrella. So now we're responding to something and we need to make it safe, but you're -- basically, you got blinders on and you need to go in and respond. [ Music ]

David Moulton: Welcome to Threat Vector, the Palo Alto Networks podcast where we discuss pressing cybersecurity threats and resilience and uncover insights into the latest industry trends. I'm your host, David Moulton, Director of Thought Leadership for Unit 42. In this episode, we'll dive into a malware incident that tested BP's cybersecurity team in unexpected ways. We're going to hear from the team members who were on the front lines of tackling the complex threat that emerged in their environment. I'm excited to introduce our guests for today's episode. Patrick Wright is a forensic and incident response lead who played a critical role in identifying and mitigating the malware threat. And Navid Asgharzadeh, manager of the CERT team that coordinated the detailed forensic analysis and response efforts. So without any further delay, let's dive into Crisis in the Kitchen: Unraveling a Malware Incident. [ Music ] I'm here with the incident response team at BP. The team has shared with me a few of their favorite stories and agreed to share today's story on Threat Vector. We will hear from the team about the incident and then get into a conversation about what the team learned, and how and if they changed their approach to security and risk at BP. This story mentions an old but not forgotten foe, so stick around to hear how sometimes the nasty bits on the internet fade but never really die. As I mentioned in the intro, we're going to hear from Matt, Navid, and Patrick. And I'd like Patrick to kick us off by setting the stage. Patrick, BP is, at least in my mind, known for its gas stations. Can you let our audience in on why we're focused on the kitchen today?

Patrick Wright: Sure. So I think most people will associate BP with gas stations. Our retail sites are what people really know us for. That was very relevant in this case. And what actually happened to kind of set the stage is, probably several years ago, our retail networks were kind of dispersed in different, you know, management spheres. In the last few years, we've tried to bring all that in to where they're globally managed. And during one of those cases a couple of years ago, our kind of global retail firewall ops team was trying to bring in all of the global firewalls. And one of the things they noticed as they were onboarding them was a trend of a lot of suspicious outbound SMB traffic from a single IP sitting on a retail network at a Southeast Asia retail station we had. So they raised it with our -- you know, cybersecurity team, said, "Hey, we see this kind of odd traffic. We just started onboarding this firewall. Not sure how long it's been going on, but, you know, at least for the last week or so that we have logs for right now." It had been going on for at least that long. So they raised it to us as kind of just suspicious activity to help investigate. And so they gave us a snippet of logs. I think I took maybe 30 seconds looking at them and knew immediately this was probably not legitimate retail traffic. To give you an idea, it was a single IP just essentially spraying the entire internet with SMB NetBIOS-type traffic. And I think in about a 10-minute span, it was hitting 500,000 plus IPs on the internet just randomly all over the place, every country. So it didn't really mesh with what we expected, and obviously our retail network guys weren't expecting this to happen either. One of the things about the site that we found out, when the network guys helped put us in touch with, you know, the kind of local site crew or at least the local management for that region, is that the actual system that all this traffic was coming from is what's known as a Kitchen Management System. And for folks that have been to McDonald's or any drive-through at all, if you ever look in the back, you see these monitors up that kind of have these boxes with the orders and they kind of move your orders through these boxes on a screen. That's actually a Kitchen Management System. And I know this because I'd actually asked the crews locally, one of the general managers, "Could you, like, send me some pictures of what this thing looks like?" And it's a little computer that sits under a computer monitor, wall-mounted right up next to the drive-through. And this is what they use to move orders. This site particularly, they did a drive-through for coffee, for a coffee business there. And it's one of the busiest -- what I found out was one of the busiest retail stations for coffee specifically in this particular region. So it does a lot of business, a lot of money passing through, a lot of customers, and they get backed up a lot. If there's any small issue with the site, then it kind of backs up the drive-through. One of the other things that was kind of more disturbing about this was that not only do you have the Kitchen Management System, but it has to, at some point, interface with point of sale, right? And we found out that this system that's, like, spraying the internet with SMB directly connects to the local POS where they're running credit cards and all sorts of stuff.
So that kind of raised the alarm for us immediately, and we knew that we probably needed to take some sort of action. Initially, what we had them do was just disconnect it -- actually, it was the network team. Since we were working with the network team to begin with, and since we weren't really able to get a lot of communication going with the local site IT support, the people that kind of support these retail stations, we made the decision at the time to have them disconnect this one system at the firewall port. So they just toggled the firewall port for this KMS system down, so it couldn't get out to the internet. What this actually ended up causing was, well, obviously, an impact to the local business. Because now they didn't have their KMS system to move orders through, they had to fall back to manual, where they're writing everything on paper, running orders that way, passing them back to the kitchen to, you know, get the coffee orders made. We were warned about that in advance by the network guys that were helping us, you know, track it down locally: "Hey, you can't bring this site down [inaudible 00:06:32]. We're going to have, you know, lines backed up around the corner." And we're like, "Well, we need to know what's going on here. And if you leave this up and running -- you've told us that it's connected to a POS system -- that's bad news until we know what's going on."
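
The pattern Patrick describes -- a single source address fanning out to hundreds of thousands of distinct destinations over SMB and NetBIOS ports within minutes -- is the kind of thing that can be surfaced directly from firewall logs. The Python sketch below shows one way such a check might look; the CSV layout, field names, window size, and threshold are illustrative assumptions, not BP's or Palo Alto Networks' actual tooling.

```python
"""Minimal sketch: flag internal sources "spraying" SMB/NetBIOS at many
distinct destinations within a short window. Log format is assumed to be a
CSV with timestamp, src_ip, dst_ip, and dst_port columns."""
import csv
from collections import defaultdict
from datetime import datetime, timedelta

SMB_PORTS = {445, 139, 137}        # SMB and NetBIOS
WINDOW = timedelta(minutes=10)     # the traffic in this case hit 500,000+ IPs in ~10 minutes
THRESHOLD = 1000                   # distinct destinations that warrant escalation (arbitrary)

def find_smb_sprayers(log_path: str) -> dict:
    """Return {source_ip: peak distinct SMB destinations seen in any single window}."""
    events = defaultdict(list)  # src_ip -> [(timestamp, dst_ip), ...]
    with open(log_path, newline="") as fh:
        for row in csv.DictReader(fh):
            if int(row["dst_port"]) not in SMB_PORTS:
                continue
            ts = datetime.fromisoformat(row["timestamp"])  # assumes ISO-8601 timestamps
            events[row["src_ip"]].append((ts, row["dst_ip"]))

    suspects = {}
    for src, hits in events.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # slide the window forward so it never spans more than WINDOW
            while hits[end][0] - hits[start][0] > WINDOW:
                start += 1
            distinct = len({dst for _, dst in hits[start:end + 1]})
            if distinct >= THRESHOLD:
                suspects[src] = max(distinct, suspects.get(src, 0))
    return suspects

if __name__ == "__main__":
    # "retail_fw_denied_outbound.csv" is a placeholder export name
    for src, count in find_smb_sprayers("retail_fw_denied_outbound.csv").items():
        print(f"{src}: {count} distinct SMB destinations within a 10-minute window")
```

Anything above the threshold is clearly not normal retail traffic, which is essentially the 30-second judgment Patrick describes making by eye.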

David Moulton: So you get this information and you start to analyze it and you see this SMB traffic spraying just like crazy. How do you go through prioritizing your actions in that moment? Was it the POS system that really took priority or is there a different process or set of actions that you'd look at?

Patrick Wright: Well, especially in an environment like this or any environment where we don't have visibility, our first questions are to the business. What is the system? How does it -- what does it interface with? What does the local network look like? So that's typically our first line of questions, and that's more for us to assess: do we have a critical situation here, or is this something where we're okay and it's not as serious as we thought initially? As soon as the POS system came into the picture, we immediately, you know, had to take some actions. One bit of good news -- one of the things that the network guys could tell us is that there was only some limited communication between the POS system and the KMS system. So they don't necessarily talk over SMB. They don't have SMB open. So whatever it was that was doing this SMB spray was unlikely to impact the POS stuff. And it's kind of a one-way street as far as communication with the KMS system. All it does is pass the orders so the POS system knows, you know, what you're bringing the order up as. So it's not like the POS system is going to communicate back. There's no bi-directional communication where there could be admin activity or anything, so if there is malware on this KMS system, it kind of limits the blast radius, so to speak. So that's why, at the end of the day, we wanted to pull the plug on the KMS and we felt okay with the POS system being kept up.
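
Patrick's point about the one-way relationship between the KMS and the POS is essentially a blast-radius question: is there any allowed SMB path between the two boxes? A quick sanity check against a firewall policy export might look like the sketch below; the file name, column names, and host addresses are placeholders for illustration, not BP's actual environment, and it deliberately ignores subnet ranges to stay short.

```python
"""Sketch of a blast-radius check: does any allowed firewall rule permit
SMB/NetBIOS between the KMS host and the POS host?"""
import csv

SMB_PORTS = {445, 139, 137}
KMS_HOST = "10.20.30.40"   # placeholder address for the Kitchen Management System
POS_HOST = "10.20.30.50"   # placeholder address for the point-of-sale terminal

def smb_paths_between(rules_csv: str, host_a: str, host_b: str) -> list:
    """Return allowed rules that would let SMB traffic flow between the two hosts."""
    risky = []
    with open(rules_csv, newline="") as fh:
        for rule in csv.DictReader(fh):
            if rule["action"].lower() != "allow":
                continue
            if int(rule["dst_port"]) not in SMB_PORTS:
                continue
            if {rule["src_ip"], rule["dst_ip"]} == {host_a, host_b}:
                risky.append(rule)
    return risky

if __name__ == "__main__":
    hits = smb_paths_between("retail_fw_rules.csv", KMS_HOST, POS_HOST)
    if hits:
        print("Allowed SMB path exists between KMS and POS; blast radius is larger than hoped:")
        for rule in hits:
            print(rule)
    else:
        print("No allowed SMB/NetBIOS path between the KMS and the POS terminal.")
```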

Navid Asgharzadeh: A point I wanted to make, David, is something interesting that you guys will see with the IR work that we do at BP: if you're noticing a theme here, a lot of the time, we don't have the visibility that we actually would like to have, or that you often find in, I guess, smaller companies or companies that aren't as globally spread out as BP. Like Patrick just said, we're asking them, "Well, what does this touch? What does that system look like? Where does it sit? What does it do?" These are things that we typically should be able to reach out, do some checks on the network, etc., and be able to see, whether it's NetFlow, firewall logs, or EDRs that we have on the systems. So we are responding -- something to keep in mind is we're responding to an environment we are totally blind in but we have responsibility for because it's under the BP umbrella. So now we're responding to something and we need to make it safe, but you're -- basically, you got blinders on and you need to go in and respond and do it appropriately. So it's pretty challenging.

David Moulton: So when you saw that the SMB traffic was just spraying like crazy, was that the big red flag that you ran into?

Patrick Wright: Yeah, absolutely. That was the initial red flag that led to, we have an incident here. It's just now we have to figure out how bad it is. So absolutely, yeah, that was the first red flag. And then the POS system being somehow attached to this network was the second red flag.

David Moulton: All right. So you got hands full of flags that are red and you've got -- I'm guessing this KMS system, this Kitchen Management System that you're talking about, that's not something that you all built. So now you've got to figure out what that technology is, maybe what vulnerabilities it has, what's causing it to behave in this unexpected way. Can you talk about that a little bit?

Patrick Wright: This Kitchen Management System is made up of some outdated -- like, they don't do regular patching on these sorts of systems. You can kind of think of it like an ICS or an OT environment in some respects because, similar to those, you have this kind of sensitive hardware that requires certain versions of software. Well, in this case, I think it was a Windows 7 PC that this was running on. So something we're not unfamiliar with. But again, it's going to be outdated by this point in time. Microsoft's no longer supporting Windows 7. We're all, you know, talking about Windows 11 at this point, where we're going. So that was the first thing that kind of came up, is, yeah, it's Windows 7, which, yay, we can do Windows real easily. But then again, it kind of makes it scary that this thing is not being patched. And, you know, something somehow got onto it that, you know, has infected it with malware.

David Moulton: You figure out you've got an older OS, maybe it hasn't been patched or it's definitely not been patched. And when you're faced with that, you talked about kind of going into a situation blind. What does the process look like to figure out what the actual problem is? What does the team do from an investigation standpoint?

Patrick Wright: I mean, one of the initial things, we don't really have a lot of visibility, except -- what we found out is the way this thing is managed. Similar to the last episode, it's managed remotely by a third party. The only good news is they do it through a secured VPN. So this is a secured VPN tunnel from a lab they operate out of. So it's not like it's just open to the internet. So that kind of rules that out as a, you know, entry vector for a threat. But, again, we have, you know, remote connectivity only. The only other people we have that would have access to the system is going to be the, you know, local managers and the folks in the coffee shop and the gas station, right? And they're not going to be the best to help us pull forensic data or tell us what's going on in the system. In fact, it doesn't even really have a monitor and keyboard set up where they can, you know, console in and check it out locally. So, again, we're kind of restricted to remote support to, you know, get what we need. [ Music ]

David Moulton: So you get this traffic locked down, but you've got your partner that's allowing you to get in and get the evidence off of -- or run the investigation on this Kitchen Management System. What did you guys find?

Navid Asgharzadeh: Patrick, let me ask you this. What do we need first?

Patrick Wright: Absolutely, memory first. And then anything that's, like, in order of volatility and, you know, where we're going to get the most bang for our buck when it comes to remote collections.

Navid Asgharzadeh: So the interesting part, and the point of that, Dave, is that, as Patrick mentioned, how are you -- so now you need to figure out how you are going to get that memory down, making sure that, one, no one's rebooted the system or brought it down. How do we get it through this third party? Then how do we get it to us? And then how do we get the other triage data that we need from the disk itself?

Patrick Wright: So the big challenge here is, again, we have to open up some sort of line of communication with the third party that kind of manages this all for us. It consists of two guys that essentially, as part of this third party, kind of manage the builds and networking and everything locally, remotely for this entire geographical area. It just so happens that this particular site was so custom, it was actually one of the larger sites and it had some customization because of the coffee business, that it wasn't being up -- it was kind of last to be slated to be upgraded. So here we are swinging in and going, "Hey, we need you to stop what you're doing and help us get some forensic evidence from these things." And now they're getting complaints from this local site of, "Hey, we can't get our coffee orders out. We're backed up." It's a big mess. But when we were able to kind of get them to help us, again, we were giving them those tools to say, "Hey, we just need you to run this tool on the system to get memory, get us the triage artifacts, and get it across, you know, to your side back in the labs." And that process took about a week to get everything run and all the artifacts back to us.
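
Because the memory image and triage package had to pass through several hands -- the on-box collection tool, the third party's lab, and then back to the BP team -- integrity needs to be pinned down at the point of collection. The sketch below shows one hedged way to do that: hash each artifact and write a small manifest that can be re-verified after every hop. The file names and manifest format are assumptions for illustration; the transcript does not describe the team's actual collection tooling.

```python
"""Sketch: build an integrity manifest for remotely collected forensic artifacts."""
import hashlib
import json
from pathlib import Path

# Rough order of volatility: memory first, then disk-based triage artifacts.
ARTIFACTS = [
    "kms_memory.raw",   # hypothetical full memory image, captured before any reboot
    "kms_triage.zip",   # hypothetical bundle of registry hives, event logs, prefetch, etc.
]

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large memory images don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(artifact_names, out_path: str = "collection_manifest.json") -> dict:
    """Hash every artifact that exists and record its size alongside the digest."""
    manifest = {
        name: {"sha256": sha256sum(Path(name)), "bytes": Path(name).stat().st_size}
        for name in artifact_names
        if Path(name).exists()
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest

if __name__ == "__main__":
    print(json.dumps(write_manifest(ARTIFACTS), indent=2))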

David Moulton: So you get the volatile information, you get the memory, you mentioned a couple of different things that you wanted to get right away in your logic. And that's when you start the investigation, maybe not with a hundred percent look at it, but with the most critical stuff for an investigation. What did you guys find?

Patrick Wright: Memory was kind of key to this entire thing because it didn't take us long to find the actual evidence of, you know, outbound network traffic, or at least the SMB/NetBIOS traffic, in the memory sample. One of the problems with malware specifically, and it happened in this case as well, is because of the way it was showing up, there wasn't really a clear way to identify the source process. It showed, hey, here's all these 40-something IPs that it's -- at the time we took the memory sample, it was going outbound over all these different IPs over SMB. Problem was, the parent process was listed as negative one.

Navid Asgharzadeh: So we're looking to basically vector off of the memory to get onto the hard drive, right, to see what's actually -- what's that executable that's running on the drive itself that's causing this. Where is it hiding? And if it's running, it's going to be in memory. Patrick was looking at it, running it through Volatility, comes up as negative one. It's an impossibility. So we decide, let's run strings.

Patrick Wright: Probably within about another hour, we were able to nail down that there were string matches in a very specific block of memory for a process. Fast forward a bit, once we knew the process, we were able to pull that out of memory and look at it. And sure enough, that's when we figured out, oh, this is NotPetya. That's kind of strange.

David Moulton: I can't imagine that's what you were expecting to find.

Navid Asgharzadeh: No, not for that long. That was definitely a shocker, which led us to another shocking conclusion.

Patrick Wright: So the next thing was taking this information back to our poor, you know, two guys that have been trying to do this migration. We just finally got them to give us this, you know, forensic data. Now we're coming back to them with the punchline, so to speak, and telling them, "Hey, look, we have evidence that this is some, you know, some ransomware that has been on the system since," and we gave them the specific timestamp. We said, "How long has this actually been on site?" And it took them a while to get back to us. But once they got back to us, they said, "Oh, that was actually when it was in our lab." So, yeah -- because we didn't deploy this to the site until like six months after that. So I'm like, okay, "Well, I hate to be the bearer of bad news, but you probably had this malware in your lab," and I guess the obvious next question is, how many other machines in that lab have been infected with this malware and shipped out? Luckily, it didn't look like we had any further impact, you know, at least for us.
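
The conclusion that the infection originated in the vendor's lab comes from a simple timeline comparison: the earliest malware artifact on the box versus the date the unit shipped to the site. A toy version of that reasoning, with placeholder dates standing in for the real timestamps, might look like this:

```python
"""Sketch: did the infection predate deployment? Dates below are placeholders."""
from datetime import datetime

earliest_malware_artifact = datetime(2021, 3, 2)   # hypothetical first-seen timestamp from forensics
site_deployment_date = datetime(2021, 9, 15)       # hypothetical date the unit left the lab

if earliest_malware_artifact < site_deployment_date:
    gap = site_deployment_date - earliest_malware_artifact
    print(f"Malware artifacts predate deployment by {gap.days} days: "
          "the system was most likely infected while still in the vendor's lab.")
else:
    print("Infection occurred after deployment; focus on on-site exposure instead.")
```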

David Moulton: Was the line of business there at the station that's selling coffee now by hand, were they pressuring you to put the system back online?

Patrick Wright: They were, but they were also being very careful because they were cognizant that there's a reason. One of the things that we have fostered within BP, not just at a management level, is that users and folks that are out, you know, doing the work and making the money are very cognizant of the importance of cybersecurity. So they know whenever we get involved, we're not just there to stop their business. That's not why we're involved. So they did work with us. They did mention, you know, that you could tell that there was a bit of stress on their part because they're the ones getting the phone calls from the site, but they also were really careful about pressuring us too much because they knew that we're there to do a job, that we're there to protect BP. That's the culture.

David Moulton: Yeah, great culture and an understanding that when you do show up and make that call, it's in the best interest of the company in spite of customers not having enough caffeine, you know, in this case. And so it sounds like, in this case, you had a ransomware, but it was stuck in the kitchen, couldn't get in -- couldn't get out. Have you ever had to contain something like that in another environment? Or is this sort of a unique one that has updated your best practices? Looking forward just in case.

Patrick Wright: We've been fortunate. We don't deal with ransomware incidents very often. We have had a few cases where it's happened. Again, it's always outside of our visibility. We've had them in places like an asset at a wind farm, where a server that was internet-facing just, you know, got popped and actually had multiple strains of ransomware on it. That's the most prominent one I can think of. We've had very few in the enterprise environment early on, when ransomware was kind of kicking up probably about seven or eight years ago. We had a few cases here and there where our tooling would stop it. But luckily, we just -- we lucked out, right? We had the right tooling and the right response playbooks in place to prevent it from getting out of hand. [ Music ]

David Moulton: Guys, I want to shift to the question I always ask, which is, what's the most important thing that a listener should remember from the conversation? And, Patrick, you've taken us through most of this incident, so maybe I'll start with you.

Patrick Wright: I guess the most important thing to remember is -- and I kind of go back to something that one of our vice presidents always says -- that regardless of how good your tooling is and how good you think your monitoring is, there's going to be a compromise out there. There is always something. You need to be aware that you probably have something compromised; you just have to have the visibility to see it. You don't know what you can't see.

David Moulton: Navid, how about you? What's the most important lesson that you'd hope a listener would take away from this conversation?

Navid Asgharzadeh: The lesson you should take away from any security conversation -- and I might get booed for this -- really is that security awareness at the individual level is very important. Regardless of the tooling that we have in place, millions upon millions of dollars in EDR, firewalls, etc., it all comes down to the individual and making the right decisions. I'm a big believer in that. If, let's say, you're talking to a room of 10 people and you make just one person more security aware -- not clicking on the wrong things, taking a second look at some activity that's occurring -- you've reduced that attack surface by 10%, one out of 10, just by talking. Be more aware. Be more suspicious. Be a little bit more paranoid in your life. I know that sounds --

David Moulton: [inaudible] bit more paranoid.

Navid Asgharzadeh: Be a little bit more paranoid. Never hurt.

David Moulton: The big takeaway that I have is that you guys are inheriting a lot of systems, don't necessarily have everything perfectly mapped, and, you know, in your view, you don't have the visibility that you're looking for and yet are still able to succeed in responding and shutting down some of these incidents. And I think that's a testament to skill. I think that's a testament to having some of the right experiences and policies and playbooks in place and a little bit of luck. You know, sometimes luck favors the prepared. So I want to thank you for spending so much time with me today to talk about some of the incidents that you guys get into, sharing some of the insights and the learnings with our listeners, and having that candid conversation you talked about in the first episode, sharing intel. I actually think that sharing stories like these, lessons learned like these are important for our industry. So, Navid, Patrick, thank you very much and I appreciate you guys coming in and sharing with us today.

Patrick Wright: Thank you very much, David.

Navid Asgharzadeh: Thanks for having us. [ Music ]

David Moulton: That wraps up our deep dive into the Crisis in the Kitchen: Unraveling a Malware Incident. Patrick and Navid, thank you so much for sharing your experience and your insights today. Your quick thinking, teamwork, and innovative solutions highlight the high stakes and critical nature of the work that you do at BP to protect it from not only evolving cyber threats but also some that have never truly gone away. From identifying that initial threat to coordinating the response, it's clear that the decisions and expertise of the BP incident response team are crucial to safeguarding not only the company's operations but also the broader energy infrastructure on which we all depend. I don't think the stakes could be any higher. That's it for today. If you like what you've heard, please subscribe wherever you listen and leave us a review on Apple Podcasts or Spotify. Those reviews and feedback really do help us understand what you want to hear about. I want to thank our executive producer, Michael Heller; our content and production teams, which include Kenne Miller, Joe Bettencourt, and Virginia Tran. Elliott Peltzman edits the show and mixes the audio. We'll be back next week. Until then, stay secure, stay vigilant. Goodbye for now. [ Music ]