Security Unlocked 4.14.21
Ep 23 | 4.14.21

Inside Insider Risk

Transcript

Nic Fillingham: Hello and welcome to Security Unlocked, a new podcast from Microsoft where we unlock insights from the latest in news and research from across Microsoft security engineering and operations teams. I'm Nic Fillingham.

Natalia Godyla: And I'm Natalia Godyla. In each episode, we'll discuss the latest stories from Microsoft Security. Deep dive into the newest threat intel, research and data science.

Nic Fillingham: And profile some of the fascinating people working on artificial intelligence in Microsoft Security.

Natalia Godyla: And now, let's unlock the pod.

Natalia Godyla: Hello Nic, welcome to today's episode, how's it going with you?

Nic Fillingham: Hello Natalia, I'm very well, thank you, I hope you're well, and uh, welcome to listeners, to episode 23, of the Security Unlocked podcast. On the pod today, we have Rob McCann, applied researcher here at Microsoft, working on insider risk management, which is us taking the Security Unlocked podcast into- to new territory. We're in the compliance space, now.

Natalia Godyla: We are, and so we're definitely interested in feedback. Drop us a note at securityunlocked@microsoft.com to let us know whether these topics interest you, or whether there's another avenue you'd like us to go down in compliance. Also always accepting memes.

Nic Fillingham: Cat memes, sort of more specifically.

Natalia Godyla: (laughing)

Nic Fillingham: All memes? Or just cat memes?

Natalia Godyla: Cat memes, llama memes, al-

Nic Fillingham: Alpaca-

Natalia Godyla: ... paca memes.

Nic Fillingham: ... memes. Yeah. Alpaca. Yeah, this is a really interesting uh, topic, so insider risk, and insider risk management is the ability for security teams, for IT teams, for HR to use AI and machine learning, and other sort of automation based tools, to identify when an employee, or when someone inside your organization, might be accidentally doing something that is going to create risk for the company, or doing something intentionally, uh, where they have, you know, nefarious or sort of malicious intent.

Nic Fillingham: So, it really- really great conversation we had with- with Rob about what is insider risk, what are the different types of insider risk, how is uh, AI and ML being used to go tackle it?

Natalia Godyla: Yeah, there's an incredible amount of work happening to understand the context, because so many of these circumstances require data from different departments, uh, uniquely different departments, like HR, to try to understand, well is- is somebody about to leave the company, and if so, how is that related to the volume of data that they just downloaded? And with that, on to the pod.

Nic Fillingham: On with the pod.

Nic Fillingham: Welcome to the Security Unlocked podcast, Rob McCann, thank you so much for your time.

Rob McCann: Thank you for having me.

Nic Fillingham: Rob, we'd love to start with a quick intro. Who are you, what do you do? What's your day to day look like at Microsoft, what kind of products or technology do you touch? Give us a- give us an intro, please.

Rob McCann: Well, I've been at Microsoft for about 15 years, I am a- I've been an applied researcher the entire time. So, what that means is, I get to bounce around various products and solve technical challenges. That's the official thing, what it actually means is, whatever my boss needs done, that's a technical hurdle, uh, they just throw it my way, and I have to try to work on that. So, applied scientist.

Nic Fillingham: Applied scientist, versus what's a- what's a different type of scientist, so what- what's the parallel to applied science, in this sense?

Rob McCann: So, applied researcher is sort of a dream job. So, when I initially started, there's sort of the academic-style researcher, where it's very much uh, your production is to produce papers and new ideas that sort of in a vacuum look good, and get those out to the scientific community. I love doing that kind of stuff. I don't so much like just writing papers. And so, an applied researcher, what we gotta do, is we gotta sort of be this conduit.

Rob McCann: We get to solve things that are closer to the product, and sort of deliver those into the product. So we get very real, tangible impact, but then we're also very much a bridge. So, part of our responsibility is to keep, you know, fingers on what's going on in the abstract research world and try to foster, basically, a large innovation pipe. So, I freaking love this job. Uh, it's exactly what I like to do. I like to solve hard technical problems, and then I like to ship stuff. I'm a very um ... I need tangible stuff. So I love it.

Nic Fillingham: And what are you working on at the moment, what's the scope of your role, what's your bailiwick? (laughing)

Rob McCann: My bailiwick is uh, right now I'm very much focused on IRM, which is insider risk management, and so what we've been doing over the last year or so ... insider risk management GA'd in February of 2020, I want to say. So, Ignite today is a very festive sort of one year anniversary type thing for that, within the compliance solutions. So, over this last year, what we've done a lot of is sort of uh, build a team of researchers to try to tackle these challenges that are in insider risk, uh, and sort of bring the science to this brand new product. So, a lot of what I'm doing on a daily basis is, on the one hand, solve some technical things and get it out there, and on the other hand, build a team to strengthen the muscle, the research muscle.

Natalia Godyla: So, let's talk a little bit more about insider risk management. Can you describe how insider risk differs from external risk, and more specifically, some of the risks associated with internal users?

Rob McCann: It's uh, there's some overlap. But it's a lot different than external attack. So, first of all, it's very hard, not saying that external attack is not hard, I- I work with a lot of those people as well. But insiders are already in, right? And they already have permissions to do stuff, and they're already doing things in there. So there's not like, you have a- a ... some perimeter that you can just camp on, and try to get people when they're coming in the front door.

Rob McCann: So that makes it hard. Uh, another thing that makes it hard is the variety of risks. So, different customers have different definitions of risk. So, risk might be um, we might want to protect our data, so we don't want data exfiltrated out of the company. We might want trade secrets, so we don't want people to even see stuff that they shouldn't see. We don't want workplace harassment, uh, we don't want sabotage. We don't want people to come in, and implant stuff into our code that's gonna cause problems later. It's a very broad space of potential risks, and so that makes it challenging as well.

Rob McCann: And then I would say the third thing that makes it very challenging is, what I said, different customers want- have different definitions of risk. So it's not like ... like, I like the contrast to malware detection. So, we have these external security people that are trying to do all this sophisticated machine learning, to have a classifier that can recognize incoming bad code. Right? And sort of when they get that, like, the whole industry is like, "Yes, we agree, that's bad code, put it in VirusTotal, or wherever the world wants to communicate about bad code." And it's sort of all mutually agreed upon, that this thing is bad.

Rob McCann: Insider risk is very different. It's um, you know, this customer wants to monitor these things, and they define risk a certain way. Uh, this customer cares about these things, and they want to define risk a certain way. There is a heightened level of customer preference that has to be brought into the- the intelligence, to- to detect these risks.

Natalia Godyla: And what does detecting one of those risks look like? So, fraud, or insider trading, can you walk through what a workflow would look like, to detect and remediate an insider attack?

Rob McCann: Yeah, definitely. So- so, first of all, since it's such a broad landscape of potential damage, I guess you would say, first thing the product has to do is collect signals from a lot of different places. We have to collect signals about people logging in. You have to collect signals about people uploading and downloading files from a- from OneDrive, you have to ... you have to see what people are sharing on Teams, what people are ec- you know, emailing externally. If you want the harassment angle, you gotta- you know, you gotta have a harassment detector on communications.

Rob McCann: So the first thing is just this huge like, data aggregation problem of this very broad set of signals. So that's one, which in my mind is a- is a very strong advantage of Microsoft to do this, because we have a lot of sources of signals, across all of our products. So, aggregating the data, and then you need to have some detectors that can swim through that, uh, and try to figure out, you know, this thing right here doesn't quite look right. I don't know necessarily that it's bad, but the customer says they care about these kind of things, so I need to surface that to the customer.

Rob McCann: So, uh, techniques that we use there a lot are anomaly detection. Uh, so a lot of unsupervised type of learning, just to look for strangeness. And then once we surface that to the- the customer, they have to triage it, right? And they have to look at that and make a decision, did I really- do I really want to take action on this thing? Right? And so, along with just the verdict, like, it's probability 98% that this thing is strange, you also have to have all this explanation and context. So you have to say, why do I think this thing is strange?

Rob McCann: And then you have to pull in all these things, so like, it's strange because they- they moved a bunch of sensitive data around, that- in ways they usually didn't, but then you also need to bring in other context about the user. This is very user-centric. So you have to say things like, "And by the way, this person is getting ready to leave the company." That's a huge piece of context to help them be able to make a decision on this. And then once the customer decides they want to make a decision, then the product, you know, facilitates uh, different workflows that you might do from that. So, escalating a case to legal, or to HR, there are several remediation actions that the customer can choose from.
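To make that detect-then-explain flow concrete, here is a minimal sketch in Python: an off-the-shelf anomaly detector scores a day of activity against a user's history, and the alert only surfaces once it's paired with user context. The feature names, the context flag, and the thresholding are all illustrative assumptions; the episode doesn't describe the product's actual signals or models.

```python
# A sketch only: feature names, context flag, and thresholding are invented.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Toy per-day activity features for one user: [files_downloaded, mb_shared_externally]
history = rng.normal(loc=[20.0, 5.0], scale=[5.0, 2.0], size=(500, 2))
today = np.array([[400.0, 250.0]])  # a sudden burst of downloads and sharing

detector = IsolationForest(random_state=0).fit(history)
score = detector.score_samples(today)[0]  # lower = more anomalous

# An anomaly alone isn't an incident: pair it with user context before surfacing.
context = {"resignation_submitted": True}  # e.g., fed by a hypothetical HR connector
threshold = np.quantile(detector.score_samples(history), 0.01)  # bottom 1% of normal days
if score < threshold and context["resignation_submitted"]:
    print(f"surface to analyst: anomaly score {score:.2f}, user is preparing to leave")
```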

Nic Fillingham: On this podcast, we've spoken with a bunch of data scientists and sort of machine learning folks who have talked about the challenge of building external detections using ML, and from what you've just explained, it sounds like you probably have some, some pretty unique challenges here to give the flexibility to customers, to be able to define what risk means to them. Does that mean that you have to have a customized model built from scratch for every customer? Or can you have a sort of a global model to help with that anomaly detection that then just sort of gets customized slightly on top based on, on preferences? I, I guess my question is, how do you utilize a tool like machine learning in a solution like this that does require so much sort of customization and, and modification by the, by the customer?

Rob McCann: That's, that's a fantastic question. So, what you tried to do, you scored on that one.

Nic Fillingham: (laughs).

Rob McCann: You try to do both, right? So, customers don't wanna start from scratch with any solution and build everything from the ground up, but they want customizability. So, what you try to do, I always think of it as smart defaults, right? So, you try to have some basic models that sort of do things that maybe the industry agrees is suspicious type, right? And you expose a few high-level knobs. Like, do you care about printing? Or do you care about copying to USB? Or do you want to focus this on people that are leaving the company? Like some very high level knobs.

Rob McCann: But you don't expose the knobs down to the level of the anomaly detection algorithm and how it's defining distance and all the features it's using to define normal behavior, but you have to design your algorithm to be able to respect those higher level choices that the u- that the user made. And then as far as the smart default, what you try to do as you pr- you try to present a product where out of the box, like it's gonna detect some things that most people agree are sort of risky, and you probably wanna take a look at, but you just give the, you offer the ability to customize as, as people wanna tweak it and say, nah, that's too much. I don't like that. Or printing, it's no big deal for us. We do it. We're printing house, right?
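A rough sketch of that "smart defaults plus a few high-level knobs" idea, with hypothetical knob names (watch_printing, watch_usb_copy, departing_users_only); the real IRM policy settings are more involved than this:

```python
# Hypothetical knob names; real IRM policy settings differ.
from dataclasses import dataclass

@dataclass
class PolicyKnobs:
    watch_printing: bool = True        # smart default: on
    watch_usb_copy: bool = True        # smart default: on
    departing_users_only: bool = False

def event_in_scope(event: dict, knobs: PolicyKnobs) -> bool:
    """Apply the customer's high-level choices before any scoring runs."""
    if event["type"] == "print" and not knobs.watch_printing:
        return False
    if event["type"] == "usb_copy" and not knobs.watch_usb_copy:
        return False
    if knobs.departing_users_only and not event.get("user_leaving", False):
        return False
    return True

# The "we're a printing house" customer just turns one knob off:
knobs = PolicyKnobs(watch_printing=False)
print(event_in_scope({"type": "print"}, knobs))  # False: printing is out of scope
```

The design point is that the customer only touches coarse, business-level switches, and the detection underneath has to respect them without exposing its own internals.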

Nic Fillingham: Does a solution like this, is it geared towards much larger organizations, because they would therefore have more signal to allow you to build a high fidelity model and see where there are anomalies? So, for example, could the science of the insider risk management work for a small, you know, couple hundred person organization? Or is it sort of geared to much, much larger entities, sort of more of the size of a, of a Microsoft, where there are tens of thousands of employees and therefore there's tens of thousands of types of signal and sort of volume of signal?

Rob McCann: Well, you've talked to enough scientists. I look at your guys's guest list. I mean, you know the answer, right, more data is better, right? But it's not limiting. So, of course, if you have tons and tons of employees in a rich sorta like dichotomy of roles in the company, and you have all this structure around a large company, if you have all that, we can leverage it to do very powerful things. But if you just have a few hundred employees, you can still go in there and you can still say, okay, your typical employees, they have this kind of activity. Weird, the one guy out of 100 that's about ready to leave suddenly did something strange. Uh, you can still do that, right? So, you, you got to make it work for the, the whole spectrum. But more data is always better, man. Um, more signals, more data, bring it on. Let's go. Give me some computers. Let's get this done.
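For that small-org case, even a crude peer-group comparison goes a long way; no large-scale model is required. A toy sketch, with made-up numbers:

```python
# Toy numbers throughout; the point is that the statistics can stay simple.
import statistics

peer_daily_downloads = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]  # typical employees
suspect_downloads = 120  # the one person who's about ready to leave

mu = statistics.mean(peer_daily_downloads)
sigma = statistics.stdev(peer_daily_downloads)
z = (suspect_downloads - mu) / sigma
print(f"z-score: {z:.1f}")  # wildly outside the peer group's normal range
```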

Natalia Godyla: Spoken like a true applied scientist. So, I know that you mentioned that there's a customized component in insider risk management, but when you look across all of the different customers, are you seeing any commonalities? Are there clear indicators of insider threats that most people would recognize across organizations, like seeing somebody exfiltrate X volume of data, or a certain combination of indicators happening at once? I'm assuming those are probably feeding your smart defaults?

Rob McCann: Correct. So, there's actually a lot of effort to go there. So, I s- I said that we're sort of a bridge between external academic type research and product research. So, that's actually a large focus, and it happened in external security too: you get industry to sort of agree on these threat matrices, and what are the sort of agreed-upon stages of attack, or risk in this case. So, yeah, there are things that everybody sort of agrees like, uh, this is fishy. Like, let's make this, let's make this a priority. So, that, like you said, it feeds into the smart defaults. At the same time we're trying to, you know, we don't think we know everything. So, we're working with external experts. I mean, you saw past podcasts, we talked to Carnegie Mellon, uh, we talked to MITRE, we talked to these sort of industry experts to try to make this community framework or, uh, language and the smart defaults. Uh, and then we try to take what we can do on top of that.

Nic Fillingham: So, Rob, a couple of times now, you've, you've talked about this scenario where an employee's potentially gearing up to leave the, the company. And in this hypothetical situation, this is an employee that may be looking to, uh, exfiltrate some, some data on their way out or something, something that falls inside the scope of, of identifying and managing, uh, insider risk. I wonder, how do you determine when a user is potentially getting ready to leave the company? Is that, do you need sort of more manual signals from like an HR system because an employee might've been placed on a, on a, on a review, in a review program or review period? Or, uh, are you actually building technology into the solution to try and see behaviors, and then those behaviors in a particular sort of, uh, collection in a particular shape lead you to believe that it could be someone getting ready to leave the company? Or is it both or something else?

Rob McCann: So, quick question, Nic, what are you doing after this podcast?

Nic Fillingham: Yeah.

Rob McCann: Do you want a job? Because it feels like you're reading some of my notes here (laughter). Uh, we, uh-

Nic Fillingham: If you can just wait while I download these 50 gigs of files first-

Rob McCann: (laughs).

Nic Fillingham: ... from this SharePoint that, that I don't normally go to, and then I sort of print everything and then I can talk to you about a job. No, I'm being silly.

Rob McCann: No, I mean, I mean, you hit the nail on the head there. It's, uh, there are manual signals. This is the same case with say asset labels, like file labels, uh, highly sensitive stuff and not sensitive stuff. So, in both cases, like we want the clear signals. When the customers use our plugins or a compliance solution to tell us that, you know, here's an HR event that's about ready to happen. Like the person's leaving or this file's important. We are definitely gonna take that and we're gonna use it. But that's sort of like the scientists wanna go further. Like what about the stuff they're not labeling? Does that mean they just haven't got around to it? Or does that mean that it's really not important? Or like you just said, like, this guy is starting to email recruiters a lot, this is like, is he getting ready to leave? So, there's definitely behavioral type detection and inference that, uh, we're working on behind the scenes to try to augment what the users are already telling us explicitly.
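A toy sketch of that "explicit signals first, inferred signals to fill the gaps" idea; the weights, the cap, and the signal names are all invented for illustration:

```python
# Invented weights and signal names, for illustration only.
def departure_likelihood(explicit_hr_event: bool,
                         recruiter_emails_this_week: int,
                         baseline_recruiter_emails: float) -> float:
    """Combine explicit and inferred evidence that someone is leaving."""
    if explicit_hr_event:
        return 1.0  # the customer told us directly via an HR connector; trust it
    # Otherwise infer from behavior, e.g. a spike in recruiter correspondence.
    spike = recruiter_emails_this_week / max(baseline_recruiter_emails, 1.0)
    return min(spike / 10.0, 0.9)  # cap inferred evidence below certainty

# No HR record yet, but a 25x jump in recruiter email:
print(departure_likelihood(False, recruiter_emails_this_week=25,
                           baseline_recruiter_emails=1.0))  # 0.9
```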

Natalia Godyla: So, what's the reality of insider risk management programs? How mature is this practice? Are folks paying attention to insider risk? Is there a gap here or is there still education that needs to happen?

Rob McCann: Yeah. So, there have been people working on this a lot longer than I have, but I do have to say that things are escalating quickly. I mean, especially with the modern workforce, right? The perimeter is destroyed and everybody's at home and it's easier to do damage, right? And risk is everywhere, but some, you know, cold, hard numbers: the number of incidents is going up. I think Gartner just came out and said that in the last two years, the number of incidents has gone up by about half. So, incidents are happening more, probably, maybe 'cause of the way we work now. The amount of money that companies are spending to address this problem is going up. I think Gartner's number was, uh, the average went up several million over the last couple of years. Um, they just sort of released an insider risk survey and more people are concerned about it. So, all the metrics are pointing up and it just makes sense with the way the world is right now.

Nic Fillingham: Where did sort of insider risk start? What's sort of the, the beginning of this solution... what did the sort of incubation technology look like? Where did it start? Uh, are you able to talk to that?

Rob McCann: I mean, sure. A little bit. So, this was before me, so a lot of this came out of, uh, DSRE, which is our, our sort of internal security team at Microsoft, babysitting our own network. So, they had to develop tools to address these very real issues, and the guys that I did a podcast with before, Talhah Mir and, and Raman, they, um, they sorta, you know, brought this out and started making it a proper product, to take all these technologies that we were using in-house and try to help turn them into a product to help other people. So, it sort of organically grew out of just necessity, uh, in-house. But as far as industry, like, uh, Carnegie Mellon's CERT National Insider Threat Center, I think they've been, uh, studying this problem for over a decade.

Nic Fillingham: And as a solution, as a technical solution, did it start with like, sort of basic heuristics and just looking for like hard coded flags and logs, or did it actually start out as a sort of a data science problem and, you know, the sort of basic models that have gotten more sophisticated over time?

Rob McCann: Yeah. So, it did start, start out with some data science at the beginning as well. Uh, so of course you always have the heuristics. We do that in external attack too. Heuristics are very precise, they, uh, allow us to write down things that are very specific, and they're a very, very important part of the arsenal. A lot of people diss on heuristics, but they're a very im- very important part of that, that thing. But it also has, it started out with some data science in it, you know, the anomaly detection is a big one. Um, and so there were already some models that they brought right from, uh, in-house to detect when stuff was suspicious.

Natalia Godyla: So, what's the future of IRM look like? What are you working on next?

Rob McCann: Well, I mean, we could, you could go several ways. You know, there could be broadness of different types of risk. The thing that I enjoy the most is sort of the more sophisticated ways of doing newer algorithms, maybe for existing charters, or maybe broader charters.

Rob McCann: Uh, one thing that, I- I'm very interested in lately is the sort of interplay between supervised learning and, and anomaly detection. So you can think of as, uh, semi-supervised. That's a thing that we've actually been playing with at Microsoft for, for a long time.

Rob McCann: I've had this awesome, awesome journey here. I've, I've always been on teams that were sorta, like ... It's kinda like I've been an ML evangelist. Like, I always get to the teams right when they're starting to do the really cool tech, and then I get to help usher that in. So, I got to do that in the past with spam filtering, when that was important. Remember when Bill Gates promised that we were gonna solve spam in a, in two years or whatever. Those were some of the first ML models we ever did i- in Microsoft products, and even back then we were playing with this intersection of, you know, things look strange, but I know that certain spam looks like this, so how do you combine that sort of strangeness into sort of semi-supervised stuff ...

Rob McCann: That's the stuff that really floats my boat is ho- how do you, how do you take this existing technology that some people think of as very different ... There's unsupervised, there's supervised, uh, there's anomaly detection. How do you take that kinda stuff and get it to actually talk to each other and do something cooler than you could do on one set or the other? That's where I see the future from a technical standpoint behind the scene for smarter detectors, is how we do that kind of stuff.

Rob McCann: Product roadmap, it's related to what we talked about earlier, about the industry agreeing on threat matrices and customers telling us what's the most important to them. That, that stuff's gonna guide, guide the product roadmap. Um, but the technical piece, there's so much interesting work to do.

Natalia Godyla: When you're trying to make a hybrid of those different models, the unsupervised and supervised machine learning models, what are you trying to achieve? What are the benefits of each that you're trying to capture by combining them?

Rob McCann: Oh, it's the story of semi-supervised, right? I have tons and tons of data that can tell me things about the distribution of activity, I just o-, d-, only have labels on a little bit of it. So, how do I leverage the distributions of activity that's unlabeled with the things that I can learn from my few labeled examples? And how do I get those two things to make a better decision than, than either way on its own?

Rob McCann: It's gonna be better than training on just a few things in a supervised fashion, 'cause you don't have a lot of data with labels. So you don't wanna throw away all that distributional information, but if you go over to the distributional information, then you might just detect weirdness. But you never actually get to the target which is risky weirdness, which is two different things.
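Here's a minimal sketch of that setup using scikit-learn's self-training wrapper: a handful of labeled incidents, a mass of unlabeled activity, and the classifier bootstraps itself from both. The data is synthetic and the model choice is an illustrative assumption, not how the product does it.

```python
# Synthetic data; the base model and wrapper are illustrative choices.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(1)
X_risky = rng.normal(loc=3.0, size=(200, 2))
X_benign = rng.normal(loc=0.0, size=(200, 2))
X = np.vstack([X_risky, X_benign])

# Only ten labels; -1 marks the unlabeled mass of activity.
y = np.full(400, -1)
y[:5] = 1        # five confirmed risky incidents
y[200:205] = 0   # five confirmed benign cases

clf = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(clf.predict([[3.2, 2.8], [-0.1, 0.3]]))  # expected: [1 0]
```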

Nic Fillingham: Is the end goal, though, supervised learning, so if you, if you have unsupervised learning with a small set of labels, can you use that small set of labels to create a larger set of labels, and then ultimately get to ... I'm horribly paraphrasing all this here, but, is that sort of the path that you're on?

Rob McCann: So, we're gonna try to make the best out of the labels that we can get, right? But, I don't think you ever throw away the unsupervised side. Because, uh, I mean, this c-, this has come up in the external security stuff, as well, is if you're always only learning how to catch the things that you've already labeled, then you're never gonna really s-, be super good at detecting brand new things that you don't have anything like it. Right?

Rob McCann: So, you have to have the ... It's sorta like the explore-exploit paradigm. You can think of it, at a very high level you can think of supervised as you're exploiting what you already know, and you're finding stuff similar to it. But the explore side is like, "This thing's weird. I don't know what it is, but I wanna show it to a person and see if they can tell me what it is. I wanna see if they like that kinda stuff."

Rob McCann: Uh, that's sorta synergy. That's, that's a powerful thing.

Nic Fillingham: What's the most sophisticated thing that the IRM solution can do? Like, have you been sort of surprised by the types of, sort of, anomalies that can be both detected and then sort of triaged and then flagged, or even have automated actions taken? Is there, is there a particular example that you think is a paramount sort of example of what, what this tech can do?

Rob McCann: Well, it's constantly increasing in complexity. First of all, anybody who's done applied science knows how hard it is to get data together. So when I work with the IRM team, first of all, I'm blown away at the level of the breadth of signals they've managed to put together into a place that we can reason over. That is such a strong thing. So the, their data collection is super strong. And they're always doing more. I mean, these guys are great. If I come up with an idea, and I say, "Hey, if we only had these signals," they'll go make it happen. It is super, super cool.

Rob McCann: As far as sophistication, I mean, you know, we start, we start with heuristics, and then you start doing, like, very obvious anomaly detection, like, "Hey, these, this guy just blew us out of the water by copying all these files." I mean, that's sort of the next level. And then the next level is, uh, "Okay, this guy's not so obvious. He tries to fly under the radar and sort of stay low and slow. But can we detect an aggregate? Over time he's doing a lot of damage." So those more subtle long-term risks. That's actually something we're releasing right now.

Rob McCann: Another very powerful paradigm that we're releasing right now is, not just individual actions, but very precise sequences of actions. So you could think of it, in external security, as a kill chain. Like, "They did this, and then they did this, and then they did this." That can be much more powerful than, "They did all three of those separately and then added together," if you know what I mean.

Rob McCann: So that sort of interesting sequences thing, that's a very powerful thing. And once you sorta got these frameworks up, like, you can get arbitrarily sophisticated under the hood. And so, it's not gonna stop.
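Two toy sketches of the detectors just described: a rolling window that catches "low and slow" movement no single day would flag, and an in-order sequence match that treats a chain of actions as riskier than the same actions in isolation. The action names, window size, and threshold are made up.

```python
# Action names, window size, and threshold are made up for the sketch.
from collections import deque

def low_and_slow(daily_mb, window=30, threshold=500):
    """Flag the day a rolling-window total crosses a threshold,
    even though no single day looks remarkable on its own."""
    recent = deque(maxlen=window)
    for day, mb in enumerate(daily_mb):
        recent.append(mb)
        if sum(recent) > threshold:
            return day
    return None

def matches_chain(actions,
                  chain=("download_sensitive", "rename_file", "upload_personal_cloud")):
    """True if the risky chain occurs in order, kill-chain style."""
    it = iter(actions)
    return all(step in it for step in chain)

print(low_and_slow([20] * 60))  # 25: twenty-six quiet days add up past 500 MB
print(matches_chain(["login", "download_sensitive", "send_email",
                     "rename_file", "upload_personal_cloud"]))  # True
```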

Nic Fillingham: Rob, you talked about working on spam detection and spam filters as previous sort of projects you were working on. I wonder if you could tell us a little bit about that work, and I wonder if there's any connective tissue between what you did back then and, and IRM.

Rob McCann: Yeah, so I've worked on a lot more than spam. So, I got hired to do spam, to do the research around the spam team, but it quickly, uh, it was this newfangled ML stuff that we were doing, and, uh, it started working on lots of different problems, if you can imagine that. And so we started working on spam detection, and, and phish detection. We started working on Microsoft accounts. We would, we would look at how they behave and try to detect when it looks like suddenly they've been compromised, and help people, you know, sort of lock down their accounts and get, and get protection.

Rob McCann: All those things it's been cool to watch. We sorta, we sorta had a little incubation-like science team, and we would put these cool techniques on it and it would start working well, and then they've all sort of branched out into their own very mature products over the years. A- and they're all based very heavily on, uh, the sort of techniques that, that have worked along the way.

Rob McCann: It's amazing how much reuse there is. I mean, I mean, let's boil down what we do to just finding patterns in data that support a business objective. That's the same game, uh, in a lot of different domains. So, yes, of course, there's a lot of overlap.

Nic Fillingham: What was your first role at Microsoft? Have you always been in, in research on applied research?

Rob McCann: I have always been a spoiled brat. I mean, I, I just get to go work on hard problems. Uh, I don't know how I've done it, but they just keep letting me do it, and it's fun. Uh, yeah, I've always been an applied researcher.

Nic Fillingham: And that, you said you joined about 14 years ago?

Rob McCann: Yep. Yep, yep. That was even back before, uh, the sort of cluster machine learning stuff was hot. So we, I mean, we used to, we used to take, uh, lots of SQL Servers and crunch data and get our features that way, and then feed it into some, like, single box, uh, learning algorithms on small samples. And, like, I've got to see this progression to, like, distributed learning over large clusters. In-house first, we used to have a system called Cosmos. I actually got to write some of the first algorithms that did machine learning on that. It was super, super rewarding. And now we have all this stuff that we release to the public, and Azure's this big huge ... It's very, very cool to have seen happen.

Nic Fillingham: Giving the listener maybe a, uh, a reference point for, for your entry into Microsoft-

Rob McCann: (laughs)

Nic Fillingham: ... is there anything you worked on that's either still around, or that people would have known? I think, like, just the internal Cosmos stuff is, is certainly fascinating. I'm just wondering if there's a, if there's a touchstone on the product side.

Rob McCann: Spam filtering for Hotmail. That was my first gig.

Nic Fillingham: Nice! I, I cut my teeth on Hotmail.

Rob McCann: Yeah, yeah-

Nic Fillingham: Yeah, I was a Hotmail guy. I was working on the Hotmail team as we transitioned to Outlook.com.

Rob McCann: Mm-hmm (affirmative).

Nic Fillingham: And I was, uh, down in Palo Alto, I can't even remember. I was somewhere, where- wherever the Silicon Valley campus is-

Rob McCann: SVC-

Nic Fillingham: We were all in, like, a boardroom waiting for the new domain to go live, and we got, like, a 15 minute heads-up. So I'm just Nic@Outlook.com. That's, that's my email address, and I got, I got my wife her first name at Outlook.com.

Nic Fillingham: Were you there for that, Rob? Do you have a, did you get a super secret email address?

Rob McCann: I was not there for the release, but as soon as it was out, I went and grabbed some for my kids. So I w-, I keep my Hotmail one, 'cause I've had it forever, but, uh-

Nic Fillingham: Yeah.

Rob McCann: ... I got all my kids, like, the, the ones they needed. So.

Rob McCann: It's amazing how much stuff came out of the, that, that service right there. So I talked about identity management that we do for Microsoft accounts now. I, that stuff came from trying to protect people, their Hotmail accounts. So we would build models to try to determine, like, "Oh, this guy's suddenly emailing a bunch of people that he doesn't usually," anomaly detection, if you can imagine, right? The-

Nic Fillingham: Yeah-

Rob McCann: ... same thing works.

Rob McCann: All that stuff, and then it sorta grew in, and then Microsoft had a bigger account, and then that team's kinda like, "Hey, you guys are doing this ML to detect account compromise, can you come, like, do some of that over here," and then it grew out to what it is today. A lot of things came from the OML days, it was very fun.

Natalia Godyla: Thinking of the different policies organizations have and the growing awareness of those policies, over time, employees are going to shift their tactics. Like you said there are some who are already doing low and slow activities that are evading detection, so, how do you think this is going to impact the way you try to tackle these challenges, or have you already noticed people try to subvert the policies that are in place?

Rob McCann: Yeah, so that's the, that's the next frontier, which is w-, you know, why I said we started just getting into, like, the low and slow stuff. It's gonna be like all other security, it's gonna be, "These guys are watching this thing, I gotta try something different."

Rob McCann: Actually that's a good motivation for the sort of the high-level approach we're taking, which is tons of signals, so there's not very many activities you could do. You could print, copy to USB, you could upload to something, you could get a third-party app that does the uploading for you. There are not very many avenues you could take that we're not gonna be able to at least see happening.

Rob McCann: So you couple that with some, that mountain of data with some algorithm that can try to pick out, "This is a strange thing, and this is in the context of somebody leaving." It's gonna be an interesting cat-and-mouse, that's for sure.

Natalia Godyla: Do you have any examples of places where you've already had to shift tactics because you're noticing a user try to subvert the existing policies? Or are you still in the exploration phase trying to figure out what really, what this is really going to look like next?

Rob McCann: So, right now I don't think we've had ... We haven't got to the phase yet where we're affecting people a lot. Uh, this is very early product, we're a year in. So, I don't see the reactions yet, but I, I guarantee it's gonna happen. And then we're gonna learn from that, and we're gonna say, "Okay, I have the Explore-exploit going. The Explorer just told me that something strange that I've never seen before happened." We're gonna put some people on that that are experts that figure out what that's gonna be. We're gonna figure out how to bring that into the fold of agreed-upon bad stuff, so we're gonna expand this threat matrix, right, as we go along? And we're gonna keep exploring. And that's the same for every single security product.

Nic Fillingham: Rob, as someone that's been able to sort of come into different teams and, and different solutions and, and help them, as you say, sort of bring more academic or theoretical research into, into product, what techniques are you keeping your eye on? Like, what's, what's coming in the next two or three years, maybe not necessarily for IRM, maybe just in terms of, as machine learning, as sort of AI techniques are evolving and, and, and sort of getting more and more mature, like, what, where are you excited? What are you, what are you looking at?

Rob McCann: So you want the secret sauce, is what you're asking for?

Nic Fillingham: That's exactly what I want. I want the secret sauce.

Rob McCann: (laughs) Um, well, I mean, there's two schools of thought. There's one school of thought which is, "You better keep your finger on the pulse, because the, the new up-and-comers, the whippersnappers, are gonna bring you some really cool, cool stuff." And then there's the other school of thought which is, "Everything they've brought in the last ten years is a slight change of what was there before." It's a cycle, right? Science is refinement of existing ideas.

Rob McCann: So, I'm a very muted person that way, in that I don't latch on to the next latest and greatest big thing. Um, but I do love to see progress. I s-, just see it as more of a multi-faceted gradual rise of mankind's pattern-recognition ability, right?

Rob McCann: Things that excite me are things that deal with ... Like, big data with big labels? Super, super cool stuff happening there. I mean, like, you know, who doesn't like the word deep learning, or have used it-

Nic Fillingham: What's a big label? Is there a small label?

Rob McCann: (laughs) No, I mean lots of labeled data. Like, uh-

Nic Fillingham: Okay.

Rob McCann: ... yes.

Nic Fillingham: Big data sets, lots of labels.

Rob McCann: Yes. That stuff, um, that's exciting. There's a lot of cool stuff we couldn't do two decades ago that are happening right now, and that's very, very powerful.

Rob McCann: But a lot of the business problems in security, especially, 'cause we're trying to always get this new thing that the bad guys are doing that we haven't seen before. It's very scarce label-wise. And so the things that excite me are how you inject domain knowledge, right? I talked about, we want customers to be able to sort of control on some knobs that you, like, focus the thing on what they think's important.

Rob McCann: But it also happens with security analysts, because, there's a lot of very smart people that I get to work with, and they have very broad domain knowledge about what risks look like, and various forms of security. How do you get these machines to listen to them, more than them just being a label machine? How do you embed that domain knowledge into there?

Rob McCann: So there's a lot of cool stuff happening. Uh, in that space, weak learning is one that's very popular. Came out of Stanford, actually. But I'm very la-, I'm very, very excited about what we can do with one-shot, or weak supervision, or very scarce labeled examples. I think that's a very, very powerful paradigm.

Nic Fillingham: Doing more with less.

Rob McCann: That's right.

Rob McCann: And transfer learning, I'm sure you guys have talked to a lot of people about that. That's another one. A lot of things we do in IRM ... Well, in, in lots of security is you try to, like, leverage labeled, uh, supervised classification ... Like, think about HR events.

Rob McCann: So, maybe I could, don't have a m-, a bunch of labeled, "These are IRM incidents" that I can train this big supervised classifier on. But what I can do is I can get a bunch more HR events, and I can learn things, like you said, that predict that an HR event is probably happening, right? And I chose that HR event, because that's correlated with the label I care about, right? So, I can use all that supervised machinery to try to predict that proxy thing, and then I can try to use what it learned to get me to what I really want with maybe less labels.
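A minimal sketch of that proxy-task trick: train on plentiful HR-event labels, then feed what that model learned into a classifier for the label you actually care about, where examples are scarce. The synthetic data and the stacking approach here are just one illustrative way to realize the idea; the product's actual pipeline isn't described.

```python
# Synthetic data; stacking the proxy model's output is one illustrative transfer.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Proxy task: plenty of labeled "did an HR departure event follow?" examples.
X_proxy = rng.normal(size=(1000, 4))
y_proxy = (X_proxy[:, 0] + X_proxy[:, 1] > 1.0).astype(int)
proxy_model = LogisticRegression().fit(X_proxy, y_proxy)

# Target task: only 20 labeled incidents. Use the proxy model's prediction
# as an extra feature, carrying over what it learned about departures.
X_few = rng.normal(size=(20, 4))
y_few = (X_few[:, 0] + X_few[:, 1] + X_few[:, 2] > 0.0).astype(int)
X_aug = np.hstack([X_few, proxy_model.predict_proba(X_few)[:, [1]]])
incident_model = LogisticRegression().fit(X_aug, y_few)
print(incident_model.predict(X_aug[:3]))
```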

Nic Fillingham: Got it. My final IRM question is, from what I know about IRM, it feels like it's about protecting the organization from an employee who may maliciously or accidentally do something they're not meant to do. And we've used the example of an employee getting ready to leave the company.

Nic Fillingham: What about, though, IRM as a tool to spot well-meaning, but, but practices that, that o-, expose the company to risk? So instead of, like, looking for the employee that's about to leave and exfil 50 gigs of cat meme data that they shouldn't, what about, like, just using it to identify, "You know what, this team's just sort of got some sloppy practices here that's sort of opening us up to risk. We can use the IRM tool to go and find the groups that need the, sort of the extra training, and, need to sort of bring them up to scratch." And so it's almost more of a, um, just thinking of it more in sort of a positive reinforcement sense, as opposed to sort of avoiding a negative consequence.

Nic Fillingham: Is that a big function of IRM?

Rob McCann: Yeah, I mean, I, I'm sorry if I didn't, uh, communicate that well, but IRM is definitely intentional and unintentional. In s-, in some of the workflows, one thing you can do when we detect risky activity is just send an email to the, uh, to the employee and say, "Hey, this behavior is risky, change your ways, please," right?

Rob McCann: So, you're right, it's, it can be a coaching tool as well, it's not just, "Data's gonna leave," right? Intentionally.

Nic Fillingham: Got it. You've been very generous. This has been a great conversation. I wondered, before you leave us, do you have anything you would like to plug? Do you have a blog, do you have a Twitter? Is there a- another podcast? Which one were you on, Rob?

Rob McCann: Uncovering Hidden Risk. I would also like to point you guys to, uh, an insider risk blog. I mean, we, we publish a lot on, on what's coming out and where the product is headed, so it's: aka.ms/insiderriskblog. That's a great place to sorta keep abreast of the technologies and, and where we wanna go.

Nic Fillingham: That sounds good. Well, Rob McCann, thank you so much for your time. Uh, this has been a great conversation, um, we'll have to have you back on at some point in the future to learn more about weak learning and the other sort of, uh, cool new techniques you hinted at.

Rob McCann: Yeah. I appreciate it. Thanks for having me.

(music)

Natalia Godyla: Well, we had a great time unlocking insights into security from research to artificial intelligence. Keep an eye out for our next episode.

Nic Fillingham: And don't forget to tweet us @msftsecurity or email us at securityunlocked@microsoft.com with topics you'd like to hear on a future episode.

Nic Fillingham: Until then, stay safe.

Natalia Godyla: Stay secure.