Security Unlocked 1.20.21
Ep 11 | 1.20.21

Under the Hood: Ensuring Firmware Integrity

Transcript

Nic Fillingham: Hello, and welcome to Security Unlocked, a new podcast from Microsoft where we unlock insights from the latest in news and research from across Microsoft security, engineering, and operations teams. I'm Nic Fillingham.

Natalia Godyla: And I'm Natalia Godyla. In each episode, we'll discuss the latest stories from Microsoft Security, deep dive into the newest threat intel, research, and data science.

Nic Fillingham: And profile some of the fascinating people working on artificial intelligence in Microsoft Security. If you enjoy the podcast, have a request for a topic you'd like covered, or have some feedback on how we can make the podcast better-

Natalia Godyla: Please contact us at securityunlocked@microsoft.com or via Microsoft Security on Twitter. We'd love to hear from you.

Natalia Godyla: Welcome to the latest episode of Security Unlocked. Welcome back to civilization, Nic. I'm hearing that Seattle had a pretty bad windstorm. Glad you're okay.

Nic Fillingham: Thank you. Yes, we did. We were out of power and internet for the best part of two days. That was fun. But yes, we're back online. We have power. We have internet. We're back in the 21st century. How about you, Natalia? Any insane weather events up in the northeast? You guys get ice storms and cats and dogs and locusts falling from the sky, don't you?

Natalia Godyla: None this weekend though. I did almost freeze going camping and I had a close call with an attack over the weekend.

Nic Fillingham: Oh my gosh, that sounds crazy. What happened?

Natalia Godyla: I mean, it happened in Outward. I feel like I probably should have started with that. But Outward, the game.

Nic Fillingham: Oh, okay. Phew. I feel like you would have mentioned that to me in advance of recording this podcast had you actually been attacked in real life. What's this game? What's the game you're playing?

Natalia Godyla: It's an RPG game where you try to quest through this... Gosh, I don't remember a single name of any of the locations, the cities, or the mountains. I'm not paying attention. I'm really focused on the battles that you have to fight.

Nic Fillingham: What are you battling? Can you give something away or is it a spoiler? Is it humans? Is it animals? Is it zombies? Is it aliens?

Natalia Godyla: It's a mix. There are bandit camps and then there are troglodyte caves. I think I've taken on a whole lot of the troglodytes at this point though. So I don't know if they're still in existence.

Nic Fillingham: Let's take 30 seconds to look up Outward. You said troglodyte, and I really feel like troglodyte is an established word that means something. Oh, okay. So troglodyte is from the Greek, troglodytae, which literally means cave goers. Is that right? Do they live in caves?

Natalia Godyla: They do live in caves.

Nic Fillingham: Oh, there you go. Okay.

Natalia Godyla: This game must have done its research.

Nic Fillingham: They're cave goers, but they're also your enemies. Is that right?

Natalia Godyla: Yes, but I guess in theory, I brought it upon myself. I mean, I kind of wanted to loot the cave.

Nic Fillingham: So you actually went into their territory and were like, "I'm going to smash this jar and get this green jewel out of it." And they were like, "Hey."

Natalia Godyla: Yeah. I mean, that's a moral gray area because they saw me and immediately attacked but it was their cave.

Nic Fillingham: So you're the bad guy. Nice. All right. We're going to play this. We're going to play Outward. Wonder if we can get all of the Security Unlocked peeps into a single game. That'd be fun.

Natalia Godyla: Oh, yes. And I think with that, we can intro our guests. Yeah, there's no connection point here.

Nic Fillingham: Speaking of cave-

Natalia Godyla: Looting.

Nic Fillingham: Looting? No.

Natalia Godyla: How do you stop looting from happening?

Nic Fillingham: Oh, got it. I got it. If only those troglodytes had better security, people like Natalia Godyla wouldn't just come wandering in to ransack the place looking for leather and iron ore to craft rudimentary weapons. Speaking of better security, today on Security Unlocked, we talk with Nazmus Sakib who is going to spend a bit of time talking to us about firmware and the challenges associated with ensuring firmware integrity and the integrity of device security all up starting with firmware. This is going to be the first of three conversations that we'll have over a number of episodes where we better understand the security of devices from the firmware up. And then after that segment, Natalia, who do we speak with?

Natalia Godyla: After that, we speak with Bhavna Soman who is a senior security research lead at Microsoft. And she shares how she got into security, which was really a role that she played in her family. She was the de facto support for network and security issues like antivirus. And as she continued in that role, she got more and more curious and tried to understand what technicians were changing or why something might be affecting her computer. And that role and responsibility just made her that much more interested in the security space and eventually led her here to Microsoft where she works on understanding and getting insights from the data that we have in order to better inform our defender products. Onto the podcast.

Nic Fillingham: Onto the pod.

Nic Fillingham: Welcome to the Security Unlocked Podcast, Nazmus Sakib or Sakib as we'll call you in this podcast. Thank you so much for joining us.

Nazmus Sakib: Thanks, Nic. Thanks Natalia for having me. It's a pleasure to be on here.

Nic Fillingham: So two things. We love to start up these deep dives with an introduction. Would you mind explaining? I introduced you as Nazmus Sakib, which is your name. We're going to call you Sakib. Just anything you want to sort of say about that, but also, what do you do at Microsoft? Tell us about your role, the team you're in, the team's mission? What is your day-to-day like?

Nazmus Sakib: Yeah. I'm Nazmus Sakib. I go by Sakib. It's usually a sign on the team that you've met me where I get to clarify that growing up, everyone just called me by my last name. I'm originally from Bangladesh and Sakib is just more common as a first name in Bangladesh, which is what most people... My family ended up calling me. There's a famous cricketer by the name of Shakib Al Hasan who some listeners may be familiar with, but this is my first foray into fame.

Nic Fillingham: I am familiar with famous Bangladeshi cricketers. Thank you very much.

Nazmus Sakib: He's finally back after an unfortunate ban, but I think it's great to have him back on the team. Super excited for the prospects of the Tigers.

Nic Fillingham: Do you play cricket? We're going to do this. We're going to take the little party.

Nazmus Sakib: Yeah. Let's go fully down that rabbit hole. So I played a lot when I was younger. I've been in America mostly since 2008, which is when I first came for college. But prior to that, like most kids in Bangladesh, I think, we played cricket. And I grew up in Dhaka, which is the capital. So it was all improvised for the longest time. We had a little space on our roof. So it was like this flat essentially. And so it was probably about maybe 10 feet by 10 feet or not even. And so me and my cousins would be a team of like two or three kids and we'd split it up. Someone would bat, someone would bowl. You'd make up the rules in terms of how the runs would work. And same thing if you found a little space in a back alley, or in any small sort of field or space that you'd get, you'd find a way to make it a cricket field. So good memories from back there. So it was kind of informal, but a lot of fun, especially now that the years have sort of gone on and I'm in a much different place where you just don't do that. It's pretty cool memories.

Nic Fillingham: Bring us back to your role here at Microsoft and sort of what you do. Can we think of a good cricketing segue? Is there any famous cricketers that have moved into the cybersecurity field? What's a hard left turn?

Nazmus Sakib: I think Satya is obviously-

Nic Fillingham: Oh, yes, Satya loves cricket. He's a big cricket fan.

Nazmus Sakib: Satya loves cricket, yeah. So I guess he's the most famous former cricketer turned tech luminary that I can think of.

Natalia Godyla: 10 points for the connection there.

Nazmus Sakib: So yes. It is a well-worn path, cricket to Microsoft. And I'm just one more traveler on that road. But my day-to-day, I've been at Microsoft for a little over eight years now, actually right out of college. I work as a PM in one of the many security teams at Microsoft. My team currently is in the Azure Edge and platform team. Our team is responsible for the operating systems that we ship as part of Microsoft, and also the operating systems that our customers use on platforms like Azure. So our team has been responsible for building the security that goes into Windows for a long time. I've been a part of that team since I started at Microsoft.

Nazmus Sakib: And then with the way to serve our customers on Azure, we want to meet them where they're at. And we have a lot of Linux customers on Azure as well. And so increasingly, our team is not just doing Windows work. We're also investing in Linux security technologies to help ensure that if you're a customer coming into Microsoft, if you're using Azure, whether it's on Windows or Linux, we're really bringing that platform, that operating system expertise to help secure whatever it is that you're trying to do.

Nic Fillingham: Awesome, thank you. I'm really excited for this conversation we're about to have. It's going to be one of sort of three. I won't call them introductory, but it's certainly a little trinity of conversations over the next few months where we're going to talk about firmware. We're going to talk about firmware integrity, the challenges of that, and how you go about ensuring and securing firmware integrity. We're going to follow that up in a future episode talking about the Microsoft Pluton announcement. I'm sure that'll come up at some point in our conversation today. You're joining us today, Sakib, to help us sort of come back to basics a little bit. Can you help orient us in this world of BIOS, UEFI firmware, all the various sort of synonyms for this stuff? We're going to talk about firmware. Let's talk about what is firmware. Let's talk about these acronyms. If you would, just sort of re-educate us so we can start the conversation.

Nazmus Sakib: Right. So the easy way to think about firmware is it's the first piece of code that runs on your hardware, right? So it's easy to sort of visualize that when you have a device, it's a desktop, or a PC, or a phone, any kind of computing device, you have the actual hardware, right? You've got the CPU, the motherboard, the power button that you use to turn the whole thing on, you have the hardware. The firmware is really essentially software that's typically baked in to the hardware. So it ships typically as part of the hardware. There's usually some read-only memory chip that's dedicated to storing that firmware just so that when a customer hits the power on button, the hardware knows how to turn everything on essentially. It's the firmware, that piece of software that actually goes and coordinates how devices are being made available to all the other things that run after the firmware, which is the operating system, and then the applications that you use on top of the OS.

Nazmus Sakib: So if you were to think about from the point that you turn on a device to the point where you're using an application, whether it's your browser, whether it's Teams or Zoom because it's COVID, usually a very simple workflow for that is you're turning on the hardware. The firmware is the first piece of software that runs on the hardware platform. It bootstraps the operating system. So it could be Windows, it could be Linux. And then after that, once you have the operating system running, you can run applications like your browser, Teams, Zoom on top of that operating system platform.

Nazmus Sakib: So the second part of your question, what is BIOS or UEFI? They're essentially flavors of firmware. BIOS has been around for the longest time, I think, in many ways with the history of the IBM PC. The BIOS was what you'd call essentially the firmware that ran on an IBM PC platform. A few years ago now, I think, essentially, the industry got together to revamp the firmware standards. So it's both a specification and an implementation of that specification. So UEFI, you can think about it as the modern BIOS, but because historically, people called firmware BIOS for the longest time, they're almost essentially synonyms. But typically, BIOS and UEFI both refer to the firmware that runs on any particular platform. And in general, they're perhaps used synonymously if we're speaking loosely. But most modern systems today use some implementation of the UEFI specification as the platform firmware.

Natalia Godyla: Can you provide some security context around firmware? What does the threat landscape look like for BIOS or the broader term firmware? What's been the history of attacks? What's more or less prevalent for firmwares compared to applications that are at risk?

Nazmus Sakib: Right, right. So much work has gone in to so many different parts of the technology stack, right? You think about the work that we've done at Microsoft and across the industry around things like antivirus solutions. You look at modern platforms like Microsoft ATP, Advanced Threat Protection, where you have just a view of the health of your operating system across many devices that's customized for your enterprise. All of those things, in many ways, have already made it harder and are increasingly making it harder for attackers to do things that they would have maybe gotten away with in the past for attacks in the operating system.

Nazmus Sakib: And so naturally, when you make one thing harder, you incentivize attackers to go elsewhere, right? And so what we saw as a trend, and one of the places where this was really sort of evident to us in a way that felt it wasn't just us looking at it, it was also externally reported, is if you look at NIST, which is the American standards body, essentially, the National Institute of Standards and Technology, I think, I'll have to go verify that, but they actually maintain the National Vulnerability Database. So if you think about vulnerabilities that get reported, you see in the news and they often have some numbers associated with it. That's actually all the numbers in the National Vulnerability Database.

Nazmus Sakib: And so one of the things that you saw in the research that's being done in the industry, this is where all the security researchers report issues. It's like the aggregate. This is how the industry keeps track of all the vulnerabilities that are happening across all technologies. There was a large spike in firmware. If you just go to the NIST website and you type in firmware, it went from a handful of firmware vulnerabilities being reported in, I think, 2016/2017 to hundreds being reported in the last year or two. And so a huge spike beyond exponential. And that really is because we're making it harder to do the things that perhaps attackers would be able to do in the past in the operating system. And so people are naturally moving elsewhere. And so they're gravitating towards firmware as an avenue. So that's one reason.

Nazmus Sakib: The other reason is coming back to what I was talking about in terms of how a platform boots. Firmware, because it's the first thing that runs on your hardware, because it needs to, just by its very nature, set up your hardware in the right configurations, it actually bootstraps a lot of the security on your system. Right? And so it's almost like a double whammy. Attackers are moving to a place where a lot of the problems that have been solved in the operating system from a security perspective, they're trying to work around those protections. And then in firmware, they actually see that you have this highly privileged environment. Firmware typically has, usually when it starts up, almost unrestricted access to all the hardware and the data that's on your hardware. And so that's really where we're seeing this trend where attackers are... the security research is suggesting that attackers are going to be moving there.

Nazmus Sakib: And one very recent practical example of a threat where these trends are bearing out is just, I think, last week, there was a report that TrickBot which is almost like a modular malware that's being used in a lot of other ransomware attacks, it's actually added firmware capabilities. So it's using other longstanding well-known vulnerabilities in the operating system, but because of the trends I've just described, we're seeing TrickBot add new firmware attack capabilities as well.

Nic Fillingham: Sakib, do we know when firmware attacks begin? Is there a defining moment in time when firmware became an actual viable target? Or has it sort of always been there and it's just recently evolved?

Nazmus Sakib: It's always been there. I mean, firmware has always run with high privileges in a way that may make it difficult for operating system software, including security tools, to tell what's going on in firmware. It's easy for firmware malware to hide what it's doing. But if I were to think of a tipping point, if you will, a couple years ago, we saw at least one example of what's typically associated with a particular nation state threat actor. There were targeted attacks a couple years ago that were using a firmware vulnerability. So in some ways, that was a very clear signal that not only is the security research headed that way, but there's at least that first example. It's almost like the canary in the coal mine, if you will, where we saw an example of an attack that tried to do exactly what I described: for a very targeted attack, use firmware to circumvent a lot of the security tools, and find a way to persist.

Nazmus Sakib: And with developments like what I talked about for TrickBot, which is generally often used by many different actors trying to orchestrate different ransomware attacks like Ryuk and Conti, we expect to see that trend sort of increase. And so if I were to think about that first tipping point where attacks start to become real, the LoJax attack is, I think, what it's typically referred to as maybe the one I can think of where it really sort of became not just a trend we're seeing in the research, but a really practical attack.

Nazmus Sakib: By its very nature, firmware is complex. There are tens of thousands or millions of lines of code running if you think about all the firmware that runs on your system. So if you just think about the basic security principle of trying to reduce your attack surface, trying to have least privilege, what you really want to be able to get to is that your trust is not necessarily fully dependent on all the firmware being written totally correctly and totally secure and not vulnerable to an attack. Ideally, you want to not trust that huge infrastructure. You want to be able to place that trust in a smaller set of things. And that's sort of the journey that we've been on recently with our OEM partners as well with secured-core PCs, to do that evolution. UEFI Secure Boot doesn't go away. It's still an important technology. But we want to be able to start layering on additional capabilities that can start to protect important security properties or security capabilities even from firmware compromise, as that's really where the trends are going from an attacker perspective.

Natalia Godyla: So your team has done a lot of great work around secured-core PCs. What would it take for an attacker to actually break into one? Is it possible? What do they have to overcome?

Nic Fillingham: Without obviously giving away some operational security here, but just like in Bizarro fictional land with infinite compute power and physical access to the device, what are the monumental challenges that would need to be overcome?

Nazmus Sakib: There are a couple places that I think are interesting that we're definitely thinking about. Security is not a static thing. It's always dynamic. We do something and then so do attackers. And so if you think about... It comes back to maybe the foundation analogy. We are building a lot of our security promises on things like the TPM. We want to be able to securely record the firmware that's running so that we can actually tell that it's the firmware that we expected. Right? So that's an area that we're thinking hard about and it's part of the motivation for Pluton. I'll leave it up to you all to interrogate Peter around what the effects are, but I think that's one place where a lot of our security promise is built around that.

Nazmus Sakib: We spend a lot of time thinking about TPM attacks. And it's a big part of the motivation for why we're adding another choice to the Windows ecosystem around using Pluton, is just being able to continue to raise that bar against attackers. So I'll leave it to you, Nic and Natalia, to interrogate Peter as to how Pluton will help with the security of future Windows systems.
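
Note: The idea Sakib describes, securely recording the firmware that runs so it can later be compared with what was expected, is commonly implemented as a hash chain in a TPM register (a PCR). The Python sketch below only illustrates that "extend" operation in software; it is not Microsoft's implementation. The component byte strings and the `measure_boot` helper are hypothetical, and it assumes a SHA-256 register that resets to all zeros.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 32  # length in bytes of a SHA-256 digest


def extend(pcr: bytes, component: bytes) -> bytes:
    """Fold one boot component into the register: new = H(old || H(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot(components: Iterable[bytes]) -> bytes:
    """Measure each boot stage in order, starting from an all-zero register."""
    pcr = bytes(PCR_SIZE)  # assumption: the register resets to zeros
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical boot stages, in the order they run.
expected = measure_boot([b"firmware v1", b"bootloader v1", b"os kernel v1"])
tampered = measure_boot([b"firmware v1 + implant", b"bootloader v1", b"os kernel v1"])

# Any change to an early stage changes the final value.
print(expected != tampered)
```

Because each value depends on every measurement before it, a verifier that knows the expected final value can tell that something in the chain changed, even though it cannot tell from the value alone which stage it was.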

Nic Fillingham: We'll absolutely do that. So Sakib, thank you so much for your time. As always, we will have some notes. We'll have some links in the follow-up show notes. And I'm not sure we've actually offered this to listeners before, but if you do have questions about securing firmware, anything that Sakib talked about, contact us on the Twitters. You can send us an email, securityunlocked@microsoft.com, and we'll do our best to point you in the right direction. Thank you much, Sakib.

Nazmus Sakib: Yeah, no. Definitely thank you for having me on here. It was just a great conversation. I enjoyed it. And I second what you just said. We'd love to hear from listeners around things that we can do a better job of communicating or feedback folks have on how well we're doing in terms of meeting their needs.

Nic Fillingham: Sakib, thanks so much for your time, mate.

Natalia Godyla: And now, let's meet an expert from the Microsoft security team to learn more about the diverse backgrounds and experiences of the humans creating AI and tech at Microsoft. Today, we have Bhavna Soman on the episode. Thank you for joining us.

Bhavna Soman: Thanks for having me, Natalia and Nic. I'm very excited to be here right now.

Natalia Godyla: We're excited to have you. So love for our audience to get to know you a little bit more. What is your role at Microsoft? What does your day-to-day look like?

Bhavna Soman: Yeah, absolutely. So my official title is senior security research lead. But like it often happens in big organizations, it kind of doesn't accurately reflect what I do. I lead a team of security researchers and data scientists who use machine learning and AI to fight threats on the Microsoft Defender platform. And that kind of reflects my own background as well, which has been checkered with experience in security research and machine learning. So to me, that's a very good fit even though I can't get them to include all of it in my title.

Nic Fillingham: Bhavna, we've spoken to a few of your colleagues on the podcast already: Holly Stewart, Jeff McDonald recently, Karen Lavi. How would you describe what you do? What is different about your role and your team compared to maybe Jeff's team or Karen's team, et cetera, et cetera?

Bhavna Soman: Yeah, absolutely. So the focus for my team is on using AI and ML on building intelligence and context for our enterprise customers. So when you look at how you want to apply machine learning in data science, I think it all really boils down to how can you reduce the dependency on human beings who have the security expertise? How can you bring in AI to help enterprise customers better defend themselves in this field that has a scarcity of talent, to be honest? And so what they do is look for clean or malware files. Whereas my team is focused on providing, for example, information about emerging campaigns or information about, what are the attacks that are linked to each other and form one incident so that an organization can address them together as a whole and therefore get efficiencies from that analyst as well?

Bhavna Soman: So these are just a couple of examples of what I mean when I say like we provide the intelligence. So I think someone put it very succinctly a few weeks ago where Jeff's team finds the badness, Karen's team finds the goodness, and I kind of bring it all together and give it meaning.

Natalia Godyla: That's awesome. I love that definition. Nailed it. And stepping back for a moment, I'd love to hear about what brought you to Microsoft and what brought you to security research. As you mentioned, you had a journey that included machine learning and security research. So how did both of those come into your career path?

Bhavna Soman: So I was always excited by security, even from a very young age when we had our first laptop, which was like way, way back. I think it either had Windows 95 or 98. So it was really old. And in those days, you'd get infected by stuff all the time. So for my family, it used to be my job to kind of figure out exactly where was the registry key in which this thing had saved its autorun tactic or persistence tactic. And at that time, I didn't know what any of these were called or anything. But that's how I first got into it. And then I decided that I really loved this sort of adversarial aspect of security. It really brings an excitement to the whole thing for me.

Bhavna Soman: My path did not take me directly to security still. My undergraduate studies were in mechanical engineering. So thankfully, I got a fair bit of math and also programming classes in, but I was chasing different things at that time. But after a while of working in that space, I was actually doing pipeline design for this company that constructs oil refineries, which was a very soul-sucking job for me. Yeah. I didn't like it at all. I did that for two years after college, and it just was not for me. So I was like, "Okay, I really love computers. I have to go in that direction." So I started to build software tools for that company. And then that gave me sort of this way to dip my toes in. And then I realized that, okay, this is definitely something I love doing. So I decided to go for a master's.

Bhavna Soman: And then when I was choosing my area of focus for my master's, I was like, "Yes, security has to be it." So I went to Georgia Tech to do my master's and I specialized in security. So that gave me a great sort of grounding and all of the basic skills, a great background in the industry. And Atlanta has a very good infosec community too. So I had the chance to get plugged into that. Yeah. I really loved going there. And after my education there, I worked for this startup out of Georgia Tech, which incidentally specialized in using machine learning for network security. So that's where I think I got introduced to, hey, machine learning and artificial intelligence can have something to say about this.

Bhavna Soman: The more I stayed in the security industry, this problem of how it's all a whack-a-mole where a few people are chasing thousands and millions of different variants of the same attack. It really impressed on me that this is not something I can do manually. I can reverse 10, 15 samples. I can't do a thousand. So that's where the power of AI and machine learning really struck me. So I think that's where I started going deeper and deeper into that.

Nic Fillingham: I wanted to come back to something that you touched on about being the family... What did you say? When a virus came on the computer, you would be the one that would be in charge of getting it off? Is that correct?

Bhavna Soman: Yeah. Yeah. So at that time, I think they weren't super severe viruses. They weren't doing human-operated ransomware stuff. For instance, they'd show you annoying pop-ups or they would change your search engine all the time. And they were doing very annoying things like that. I took on the task of investigating, how exactly is this thing coming back even though I deleted it? And then I started to discover the hidden mode in Windows and I started to discover all of these registry keys and regedit. It kind of went deeper and deeper and deeper from there.

Nic Fillingham: Got it. Were these in the days where you could just install as many toolbars as you wanted inside your browser to the point where you could no longer see a web page? Are we going back that far?

Bhavna Soman: Yeah, yeah. It was one of those days where... And also, Google was not really a thing. I remember Yahoo chat rooms used to be the big thing.

Nic Fillingham: AltaVista, baby. AltaVista.

Bhavna Soman: So fun times. There was a simpler world for sure.

Nic Fillingham: Bhavna, how long have you been at Microsoft now?

Bhavna Soman: It's been three and a half years now.

Nic Fillingham: Got it. And the first role that you came into at Microsoft, was that in the team that you're in or was that in a different group?

Bhavna Soman: It was still with Microsoft Defender, but I was doing slightly different stuff. I was focused more on just pure security research and not as much on the machine learning and AI aspect.

Nic Fillingham: Three and a half years ago, what were you focused on? And how has that sort of potentially evolved? How has that changed today? Were you still focused on the same types of attacks? They've just sort of evolved in sophistication. Or was it a completely different world three and a half years ago?

Bhavna Soman: So when I first came to Microsoft, I was coming fresh off of Intel. At Intel, my focus had been on threat intelligence. Again, this was back when threat intelligence was just starting to become a thing. So I joined Intel before that. And at that time, they needed a threat intelligence platform where you can gather all of the TI information from all these feeds: internal, external, et cetera. So I built that first platform, plugging it into all the internal/external data feeds, organizing the data, and then having that pumped into the various prevention and detection systems. So that's what I was doing primarily at Intel. So when I came here at first, I was still in that mindset, and I was still trying to apply intelligence to improve protection. So I was doing a lot of hunting on VirusTotal, trying to find out where our biggest gaps were, and trying to plug those.

Bhavna Soman: But very quickly, that pivoted to using machine learning for security, focused on non-PE files. So very heavily focused on the document files that we very often see come in as email attachments, and then they will lead the user to download something actually bad like, again, an Emotet or Dridex or something. So it was very focused on those macro files and other non-PE files. JavaScript was a big one at that time. So writing classifiers to differentiate between malicious JavaScript and the benign kind. Those were some of my first projects here.

Natalia Godyla: So you said a couple of times that the draw of machine learning for you is the potential for scale, the potential for helping to fill that skills gap. So as you're shifting into roles where machine learning is playing a bigger and bigger part, what are the achievements that you're focused on? What would you like to try to automate better so that humans can shift to other tasks?

Bhavna Soman: So there is one problem, which is very close to my heart. And that is the problem of the core threat intelligence business. So Microsoft Defender has a really big threat intelligence team. And this was something... I was part of the threat intelligence team at Intel as well. And all through my time working with these teams, it's been obvious that threat intelligence is very manually driven right now, right? It has to be a human that is reading files or PDFs or white papers. And then this human is, again, observing traffic data whether by hunting or through the attacks that they are remediating or something like that. So this human is then kind of assimilating all of these insights that they have about these attackers. And then they put it out somewhere. Like maybe they will communicate it to their customers saying, "Hey, this is what you need to be careful about." They may write a white paper or they may do detections as a result of that. So this is a very human thing.

Bhavna Soman: And when I look at artificial intelligence and machine learning, to me, using large amounts of data to extract a few critical insights, to me, this is a very good use case for machine learning and AI. So this is a problem that I have been working on for a really long time. My first attempt at this was while I was at Intel, and I did this kind of cross-team project with a team that was in Argentina at that time to work on a method that could use question answering techniques from machine learning to answer questions about attackers. So if I had a question about, "Okay, what is the tool that this attacker uses? Or what is the victim vertical for this attacker?" Can I use question answering techniques and train on the corpus of data available about these attackers and have an AI-based system give an answer?

Bhavna Soman: So I've been attacking this problem for many years. My first attempt while I was at Intel was not very successful. But a couple of years ago, I gave it another shot. And this research ended up being... I presented this at Black Hat last year where I was talking about how we can use some new techniques that had come out since then around word embeddings, natural language processing, and domain specific named entity extraction to do similar stuff. So I think I've been making progress on that problem. And now I'm working on a project with University of California, Berkeley on this security AI RFP where now they're expanding some of this work into the security knowledge graph where their aspiration is even bigger. Yes, we grab all of this data from a variety of different data sources. Yes, we do named entity extraction. But what else can we do on top of that? Can we automatically build, for example, YARA signatures based on this? Can we use multiple data sources to achieve consistency internally within this graph?

Bhavna Soman: So that's where we're seeing AI and machine learning take threat intelligence and help it become a little bit less manual and, again, less dependent on manual expertise.

Natalia Godyla: What challenges are you facing with achieving some of the goals you've outlined? I'm assuming compute is always something that's in the back of your mind. What else would be a barrier to potentially achieving some of these successes? Or what are you tackling right now to reach your goals?

Bhavna Soman: That's a great question. Compute is a big one because on one hand, we have large amounts of data. But on the other hand, to process all of that in a deep learning style would take huge amounts of compute that would make our product run very inefficiently on our clients and in organizations' machines. So usually, that's not feasible, which is why one of our big focuses is to find efficiency in whatever techniques we're using so that the model can be lightweight and yet perform with similar degrees of precision and recall.

Bhavna Soman: Another big challenge we face is good labels or ground truth. Just because the spectrum of badness is so huge, on one end, you have these adware things or grayware things whose whole goal might be to show advertisements or cause pop-ups. And on the other end, you have APT threats. So in this wide spectrum, we have to find good labels for a large enough set for each particular category so that we can accurately classify threats and inform users about that. That's been a very interesting problem too. Going back to the threat intelligence space, one really huge challenge is that the field is continuously evolving. A particular thing might be used for human operated ransomware on day one, but on day 30, it's hosting some random adware or some software bundle or something. So within that span, even in shorter spans, the situation really changes. The intelligence you have really changes. So all of your machine learning systems have to be constantly getting the latest information and adapting to that. So those are some of the big challenges we face in this field that we're trying to work around.

Nic Fillingham: Bhavna, one of the questions we like to ask on the podcast is, what from your personal life, whether it's a hobby, whether it's something growing up as a kid, whether it's education or previous job, do you bring forward into your current job that could be considered maybe unorthodox? You teased very early on that maybe you play D&D. Is that true?

Bhavna Soman: Yeah. I play video games or board games. I'm into all of that.

Nic Fillingham: Is that a passion for you? Do you find yourself bringing any game theory or the way that you would approach a D&D encounter into your day job?

Bhavna Soman: I think my biggest influence is books and language. I have been into books as far as I can remember. That was my favorite birthday gift when I was a kid. I'd just drag my parents to the bookshop and buy a bunch of stuff. And the peculiar way in which humans use language and give meaning to it, to me, that is a source of endless fascination. Which is why one of the favorite authors for me is Patrick Rothfuss and his book, The Name of the Wind. I think that book really talks about... It's a fantasy book. So it kind of goes into like if you know the name of a thing, then you have some control over it. It's a philosophical point, but also it says something about language. And in my mind somehow, all of that comes together and that really leads me into, how do machines interpret language? What does it mean for a machine to understand language? And when we're building all these natural language processing models, what exactly are we doing? And then what exactly are we missing from what human communication actually entails?

Bhavna Soman: Which is why I'm kind of always drawn into this threat intelligence field because I'm like, "This is really where the importance of language and communication becomes connected to security." So that's kind of this one thing for me that I really, really love. In fact, one of the really cute examples that's always stuck with me is when you do a beginner course on natural language processing, you always kind of get this example. It's called crash blossoms. There was apparently a headline in the newspaper a long time ago where the headline said, "Violinist in Japan Airlines Crash Blossoms." And obviously, the headline meant to say that this violinist who was involved in this air crash a while back is now doing well. But when an NLP based system is trying to process it, it is like, "What is crash blossoms?" And I love that problem because it kind of emphasizes very clearly how machines are different from human beings, and yet how we're trying to bring the two closer for our own benefit.

Natalia Godyla: I feel like one of the other unique points about language is just the evolution of slang. So I'll be curious to see how NLP processes and consumes slang because that is such a cultural moment. It depends on the cohorts of people that you surround yourself with, the social context.

Bhavna Soman: Yeah, that's a great point. You talked about slang specifically where the meaning of a particular word or phrase can be different based on even the environment or the forum in which it is used. Certain terms, if you use them in an industry specific way, will mean something very different than in the general sense. And we come across that in security so much, right? We have all these actor names like Scary Panda or Crawling Spider. And if you think of using like a traditional NLP model on all of this data, you're like, "This is not going to make sense because you're talking about a specific entity, an actor, not an animal." So we do have those kinds of challenges in our domain. And I love diving deep into that.

Nic Fillingham: So I have another sort of random question. I was possibly laying the ground for this with my previous question about, what from your hobbies do you sort of bring forward into your work? Your avatar, your photo in the Microsoft GAL in our sort of identity system is Megamind. Is that right?

Bhavna Soman: That is absolutely right. I think that really ties into my sort of chaotic neutral rogue character because Megamind is a really good example of that, right? Supposed to be a villain but is a hero, but also is a villain in some ways still. This was actually a prank. We had Microsoft Month of Give last month. So your teammates could donate some money and force you to change your profile picture. So that's what I got.

Nic Fillingham: Did you choose Megamind or Megamind was thrust upon you?

Bhavna Soman: I chose Megamind. I was like, "Okay, this is the most appropriate for me."

Nic Fillingham: Oh, so you do resonate with the Megamind character on some level?

Bhavna Soman: I do. Yeah. I think so. And also, it's a really good movie that kind of has not had its time in the limelight for a while.

Nic Fillingham: I don't know if I've seen it. I think my kids have seen it. That's sort of why I know it because I think I've sort of had to approve them watching the movie, but I don't think I've seen it. It's good, is it?

Bhavna Soman: It is amazing. You should definitely watch it. It's a very cute movie.

Natalia Godyla: I think we have our homework, Nic. I haven't seen it either.

Nic Fillingham: Bhavna, before we let you go, is there anything you would like to plug? Any sort of organizations you're a part of? Any communities, groups? Anything you'd like to say out there to aspiring students of machine learning who either want to get into the field or just want to get better at machine learning?

Bhavna Soman: I would love to. So the organization that I want to talk about is not associated with machine learning only. It's associated with security all up. So I am part of a group of women called BlackHoodies. And we are committed to increasing the participation of women in hard technical areas, which sometimes don't see as much participation from minorities. We are an across-the-globe, across-many-companies group. The only criteria, I think, is that you are a woman, whatever your definition of that is, and it's always free. We hold classes at multiple conferences across the world where we'll do things like reverse engineering, Windows, ARM, web hacking, tools like Ghidra, all of that. We have all these trainings that are completely free. And now that we are in the pandemic, we're doing some of these remotely. So please follow us on Twitter. And if you're interested in joining one of these trainings, it's super easy. And we really, really welcome anyone who wants to learn about this stuff.

Nic Fillingham: As you were talking, I searched Black Hoodie on Bing and just got a thousand results for buying a black hoodie. What is the URL for the community group? I think I may have just accidentally purchased a black hoodie. I've got Amazon, what is it, one click buy. I went a little too quick. I was trying to pay attention to the recording window for the podcast and then searching for what this was. Anyway.

Bhavna Soman: I hope it fits. So the website is blackhoodie.re. And we talk about all of the latest events or workshops that are happening there. Usually, when Microsoft holds BlueHat, we'll do a bunch of trainings at BlueHat as well. I do the beginners reverse engineering for x86 as part of that. But right now, we don't have in-person conferences, so we're doing virtual stuff.

Natalia Godyla: That's great, Bhavna. I think one of our previous guests has also shared BlackHoodies. So thank you for highlighting it. It sounds like a great organization. And to our audience, please check it out. Thank you, Bhavna, for being on the show with us today.

Bhavna Soman: Thanks for having me. It was super fun.

Natalia Godyla: Well, we had a great time unlocking insights into security from research to artificial intelligence. Keep an eye out for our next episode.

Nic Fillingham: And don't forget to tweet us @msftsecurity or email us at securityunlocked@microsoft.com with topics you'd like to hear on a future episode. Until then, stay safe,

Natalia Godyla: Stay secure.