Identity Threats, Tokens, and Tacos
Nic Fillingham: Hello, and welcome to Security Unlocked, a new podcast from Microsoft where we unlock insights from the latest in news and research from across Microsoft security engineering and operations teams. I'm Nic Fillingham.
Natalia Godyla: And I'm Natalia Godyla. In each episode, we'll discuss the latest stories from Microsoft security, deep dive into the newest threat intel, research, and data science.
Nic Fillingham: And profile some of the fascinating people working on Artificial Intelligence in Microsoft security.
Natalia Godyla: And now, let's unlock the pod.
Nic Fillingham: Hello, Natalia. Welcome to episode 20 of Security Unlocked. This is, uh, an interesting episode. People may notice that your voice is absent from the... This interview that we had with Maria Puertos Calvo. How, how you doing? You okay? You feeling better?
Natalia Godyla: I am, thank you. I'm feeling much better, though I am bummed I missed this conversation with Maria. I had so much fun talking with her in episode eight about tackling identity threats with AI. I'm sure this was equally as good. So, give me the scoop. What did you and Maria talk about?
Nic Fillingham: It was a great conversation. So, you know, this is our 20th episode, which is kind of crazy, of Security Unlocked, and we get... We're getting some great feedback from listeners. Please, send us more, we want to hear your thoughts on the... On the podcast. But there've been a number of episodes where people contact us afterwards on Twitter or an email and say, "Hey, that guest was amazing," you know, "I wanna hear more." And Maria was, was definitely one of those guests who we got feedback that they'd love for us to invite them back and learn more about their story. So, Maria is on the podcast today to tell us about her journey into security and then her path to Microsoft. I won't give much away, but I will say that, if you're studying and you're considering a path into cyber security, or you're considering a path into data science, I think you're gonna really enjoy Maria's story, how she sort of walks through her academia and then her time into Microsoft. We talk about koalas and we talk about the perfect taco.
Natalia Godyla: Yeah, to pair with the guac which she covered the first time around. Now tacos. I feel like we're building a meal here. I'm kind of digging the idea of a Security Unlocked recipe book. I, I think we need some kind of mocktail or cocktail to pair with this.
Nic Fillingham: Yeah, I do think two recipes might not be enough to qualify for a recipe book.
Natalia Godyla: Yeah, I mean, I'm feeling ambitious. I think... I think we could get more recipes, fill out a book. But with that, I, I cannot wait to hear Maria's episode. So, on with the pod?
Nic Fillingham: On with the pod.
Nic Fillingham: Maria Puertos Calvo, welcome back to the Security Unlocked podcast. How are you doing?
Maria Puertos Calvo: Hi, I'm doing great, Nic. Thank you so much for having me back. I am super flattered you guys, like, invited me for the second time.
Nic Fillingham: Yeah, well, thank you very much for coming back. The episode that we, we, we first met you on the podcast was episode eight which we called Tackling Identity Threats With AI, which was a really, really popular episode. We got great feedback from listeners and we thought, uh, let's, let's bring you back and hear a bit more about your, your own story, about how you got into security, how you got into identity, how you got into AI. And then sort of how you found your way to Microsoft.
Nic Fillingham: But since we last spoke, I want to get the timeline right. Did you have twins in that period of time or had the twins already happened when we spoke to you in episode eight?
Maria Puertos Calvo: (laughs) No, the twins had already happened. They-
Nic Fillingham: Got it.
Maria Puertos Calvo: I think it's been a few months. But they're, they are nine, nine months old now. Yeah.
Nic Fillingham: Nine months old. And, and the other interesting thing is you're now in Spain.
Maria Puertos Calvo: Yes.
Nic Fillingham: When we spoke to you last, you were in the Redmond area or is that right?
Maria Puertos Calvo: Yes, yes. The... Last time when we, we spoke, I, I was in Seattle. But I was about to make this, like, big trip across the world to come to Spain and, and the reason was, actually, you know, that the twins hadn't met my family. I am originally from Spain, and, and my whole family is, is here. And, you know, because of COVID and everything that happened, they weren't able to travel to the US to see us when they were born. So, my husband and I decided to just, like, you know, do a trip and take them. And, and we're staying here for a few months now.
Nic Fillingham: That's awesome. I've been to Madrid and I've been to... I think I've only been to Madrid actually. Where, where... Are you in that area? What part of Spain are you in?
Maria Puertos Calvo: Yes, yes. I'm in Madrid. I'm in Madrid. I, I'm from Madrid.
Nic Fillingham: Aw- awesome. Beautiful city. I love it. So, obviously, we met you in episode eight, but if you could give us, uh, a little sort of mini reintroduction to who you are, what's your job at Microsoft, what does your... What does your day-to-day look like, that'd be great.
Maria Puertos Calvo: Yeah. So, I am the lead data scientist in the identity security and protection team. We are in charge of making sure that all of the users who use, uh, Microsoft identity services, either Azure Active Directory or Microsoft account, are safe and protected from malicious, you know, uh, cyber criminals. So, so, my team builds the algorithms and detections that are then put into, uh, protections. Like, for example, we build machine learning for risk-based authentication. So, if we... If our models think an authentication is, is probably compromised, then maybe that authentication is challenged with MFA or blocked depending on the configuration of the tenant, et cetera.
Maria Puertos Calvo: So, my team's day-to-day activities are, you know, uh, uh, building new detections using new data sets across Microsoft. We have so much data between, you know, logs and APIs and interactions b- between all of our customers with Microsoft systems. Uh, so, so, we analyze the data and, and we build models, uh, apply AI machine learning to detect those bad activities in the ecosystem. It could be, you know, an account compromise, a sign-in that looks suspicious, but also fraud. Let's say, like, somebody, uh, creates millions of spammy email addresses with Microsoft account, for example to do bad things to the ecosystem, we're also in charge of detecting that.
Nic Fillingham: Got it. So, every time I log in, or every time I authenticate with either my Azure Active Directory account for work or my personal Microsoft account, that authentication, uh, event flows through a set of systems and potentially a set of models that your team owns. And then if they're... And if that authentication is sort of deemed legitimate, I'm on my way to the service that I'm accessing. And if it's deemed not legitimate, it can go for a challenge through MFA or it'll be blocked? Did, did I get that right?
Maria Puertos Calvo: You got that absolutely right.
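[Editor's note: the allow / challenge / block flow recapped above can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft's implementation. The `TenantPolicy` class, the `decide` function, and both threshold values are hypothetical names and numbers chosen for the example; the only thing taken from the conversation is that a model's risk estimate, combined with tenant configuration, leads to one of three outcomes.]

```python
from dataclasses import dataclass


@dataclass
class TenantPolicy:
    # Hypothetical thresholds for illustration; a real tenant would
    # configure its own policy and the values here are invented.
    challenge_at: float = 0.5
    block_at: float = 0.9


def decide(risk_score: float, policy: TenantPolicy) -> str:
    """Map a model's risk score in [0, 1] to an authentication outcome."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= policy.block_at:
        return "block"
    if risk_score >= policy.challenge_at:
        return "challenge_mfa"
    return "allow"


if __name__ == "__main__":
    policy = TenantPolicy()
    for score in (0.05, 0.6, 0.95):
        print(score, decide(score, policy))
```

[Keeping the thresholds in a per-tenant object mirrors the point that the same risk score can lead to a challenge for one organization and a block for another.]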
Nic Fillingham: So, that means... And I think we might've talked about this on the last podcast, but I still... I... As a long-term employee of Microsoft, I still get floored by the, the sheer scale of all this. So, there's... I mean, there's hundreds of millions of Microsoft account users, because that's the consumer service. So, that's gonna be everything from Xbox and Hotmail and Outlook.com and using the Bing website. So, that's, that's literally in the hundreds of millions realm. Is it... Is it a billion or is it... Is it just hundreds of millions?
Maria Puertos Calvo: It depends on how you count them. Uh, if it's per day, it's hundreds of millions, per month I think it's close to a billion. Yes, for... Of users. But the number of authentications overall is much higher, 'cause, you know, the users are authenticating in s- in s- many cases, many, many times a day. A lot of what we evaluate is not only, like, your username and password authentications, there's also the, you know, the modern authe- authentication protocols that have your tokens cached in the application and those come back to request access. So, the... We evaluate those as well.
Maria Puertos Calvo: So, it's, uh... It's actually tens of billions of authentications a day for both the Microsoft account system and the Azure Active Directory system. Azure Active Directory is also a... Really big, uh, it's almost... It's, it's getting really close to Microsoft account in terms of monthly, monthly active users. And actually, this year, with, you know, COVID, and everybody, you know, the... All the schools, uh, going remote and so many people going to work from home, we have seen a huge increase in, in, in monthly active users for Azure Active Directory as well.
Nic Fillingham: And do you treat those two systems separately? Uh, or, or are they essentially the same? It's the same anomaly detection and it's the same sort of models that you'd use to score and determine if a... If an authentication attempt is, is, uh, is legitimate or, or otherwise?
Maria Puertos Calvo: It's, like, theoretically the same. You know, like, we, we use the same methodology. But then there are different... The, the two systems are different. They live in different places with different architectures. The data that is logged i- is different. So, these, these were initially not, you know... I- identity only, uh, took care of those two systems, like, a few years ago, before they w- used to be owned by different teams. So, the architecture underneath is still different. So, we still have to build different models and maintain them differently and, you know, uh, uh, tune them differently. So, so it is more work, but, uh, the, the theory and the idea, the... How we build them is, is very similar.
Nic Fillingham: Are there some sort of trends that have, you know, appeared, having these two massive, massive systems sort of running in parallel but with the same sort of approach? What kind of behaviors or what kind of anomalies do you see detected in one versus the other? Do they sort of function sort of s- similar? Like, similar enough? Or do you see some sort of very different anomalies that appear in one system and, and not another.
Maria Puertos Calvo: They're, interestingly, pretty different. Uh, when we see attack spikes and things like that, they don't always reflect one or the other. I think the, the motivation of the people that attack enterprises and organizations, it's, it's definitely different from the, the hackers that are attacking consumer accounts. I think they're, you know, they're sold in the black market separately, and they're priced separately, you know, and, and differently. And I think they're, they're generally used for different purposes. We see sometimes spikes in correlation, but, but not that much.
Nic Fillingham: Before we sort of, uh, jump in to, to your personal story into security, into Microsoft, into, into data science, is the... You know, these... Talking about these sheer numbers, talking about the hundreds of millions of, of authentications, I think you said, like, tens of billions that are happening every day. Is that a dream for a data scientist to just have such a massive volume of data and signals at your fingertips that you can use to go and build models, train models, refine models? Is that, you know... Is this adage of more signal equals better, does that apply? Or at some point do you now have challenges of too much signal and you're now working on a different set of problems?
Maria Puertos Calvo: That's a great question. It is an absolute dream and it's also a nightmare. (laughs) So, yeah. It is... It... And I'll tell you why for both, right? Like, a... It is a great dream. Like, obviously, you bet... The, the sheer scale of the data, the, you know, the, the fact... There are a lot of things that are easier, because sometimes when you're working with data and statistics, you have to do a lot of things to estimate if,
Maria Puertos Calvo: ... it's like the things that you're computing are statistically significant, right? Like, do I have enough data to prove that this sample is going to be, uh, a reflection of reality, and things like that. With the amount of data that we have, with the amount of users that we have, it's the, we don't have that, we, we don't really have that problem, right? Like we are able to observe, you know, the whole world without having to, to figure out if what we're seeing, you know, it's similar to the whole world or not.
Maria Puertos Calvo: So that's really cool. Also, because we, you know, have so many users, then we also have, you know, we're a big focus for attackers. So, so we can see everything, you know, that happens in, in, in the cybersecurity world and like the adversary world, we can find it in, in our data. And, and that is really interesting. Right. It's, it's really cool.
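[Editor's note: the point about sample size and statistical significance can be made concrete with the textbook margin of error for an estimated proportion. The sketch below is an illustration under stated assumptions: it uses the normal approximation, and the 1% rate and the sample sizes are invented numbers, not figures from the conversation. The function name `margin_of_error` is chosen for this example.]

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a
    proportion p_hat estimated from n independent samples.
    z = 1.96 corresponds to roughly 95% confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


if __name__ == "__main__":
    # Invented example: a behavior seen in 1% of sampled sign-ins.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"n={n:>13,}  margin=+/-{margin_of_error(0.01, n):.7f}")
```

[The margin shrinks with the square root of the sample size, which is why uncertainty about whether a sample reflects reality largely stops being the bottleneck at very large volumes.]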
Nic Fillingham: That sounds fascinating. But let, let, let's table that for a second. 'Cause I'd love to sort of go back in time and I'd love to learn about your journey into security, into sort of computer science, into tech, where did it all start? So you grew up in Madrid, is that right?
Maria Puertos Calvo: Yes. I grew up in Madrid and when I was finishing high school and I was trying to figure out like, what do I do, I just decided to study telecommunication engineering, that's what it's called in Spain, but it's, you know, the, the equivalent US degree is electrical engineering. Because I was actually, you know, really, really interested in math and science and physics. They were like my favorite subjects in high school. I was pretty, really good at it actually.
Maria Puertos Calvo: And, but at the same time, I was like, well, this, you know, an engineering degree sounds like something that I could apply all of this to. And the one that seems like the coolest and the future and like I, I, is electrical engineering. Like I, at that time, computer science was also kind of like my second choice, but I knew that in electrical engineering, I could also learn a lot of computer science.
Maria Puertos Calvo: It w- it has like a curriculum that includes a lot of computer science, but also you learn about communication theory and, you know, things like how do cell phones work? And how does television work? And you can learn about computer vision and image processing and all, all kinds of signal processing. I just found it fascinating.
Maria Puertos Calvo: So, so I, I started that in college and then when I finished college, it was 2010. So it was right in the middle of the great recession, which actually hit Spain really, really, really badly when it came to the, the labor market, the unemployment back then, I think it was something like 25%-
Nic Fillingham: Wow.
Maria Puertos Calvo: ... and people who were getting out of school, even in engineering degrees, which were traditionally degrees that would have, you know, great opportunities. They were not really getting good jobs. People, only consulting firms were hiring them, um, and, and really paying really, really little money. It was actually pretty kind of a shame. So I said, what, what, what should I do? And I, I had been a good student during college, so, and I had a professor that, you know, he, that I had done my kind of thesis with him and his research group.
Maria Puertos Calvo: And he said, "Hey, why don't you just like, continue studying? Like, you can actually go for your PhD and, because you have really good grades, I'm sure you can just get it fully financed. You can get a scholarship that will like finance, you know, four years of PhD. And you know, that way you don't have to pay for your studies, but also you kind of like, you're like a researcher and you have, uh, like money to live." And I was like, well, that sounds like a really good plan.
Nic Fillingham: Sounds good.
Maria Puertos Calvo: Like I actually, yeah. So, so I ended up doing that. And, and, you know, then my master's, this master's wasn't computer science, but it was very pick and choose, right? Like, like you could pick your branch and what classes you took. And so the master's was the first half of the PhD, which was basically getting all your PhD qualifying courses, which also are equivalent to, to doing your master's.
Maria Puertos Calvo: So I picked kind of like the artificial intelligence type branch, which had a lot of, you know, classes on machine learning, and I learned a lot of things that are applied machine learning, like, uh, natural language processing and speech and speaker recognition and biometrics and computer vision. Basically, all kinds of fields of artificial intelligence were in the courses that I took. And, and I really, really fou- found it fascinating. There wasn't, you know, a data science degree back then, like now everybody has a data science degree, but this is like 10 years ago. Uh, at least, you know, in Spain, there wasn't a data science degree.
Maria Puertos Calvo: But this is like the closest thing, uh, that, and that was my first contact with, uh, you know, artificial intelligence and machine learning. And I, I loved it. And, and then I did my master's thesis on, uh, kind of like, uh, biometrics in, in terms of applying statistical models to forensic fingerprints to, to understand if a person can be falsely, let's say, accused of a crime because their fingerprint randomly matches a fingerprint that is found in a crime scene.
Maria Puertos Calvo: So kind of try to figure out like, how likely is that. Because there have been people in the past that have been wrongly convicted, uh, because their fingerprints have been found in a crime scene. And then after the fact they have found the right person and then, you know, like, uh, it's not a very scientific method, what is followed right now. So that, that was a really cool thing too, that then I never did anything related to that in my life, but, but it was a very cool thing to study when I was in, in school.
Nic Fillingham: Well, that, that's fair. I've, I've got some questions about that. That's fascinating. So how did you even stumble upon that as a, as a, as a, as a research focus? Was there a, a particular case you might've read in the, in the news or something like, I, I think I've never heard of people being falsely accused or convicted through having the same fingerprints, I guess, unless you're an identical twin.
Maria Puertos Calvo: Mm-hmm (affirmative). (laughs) Actually, I can tell you because I have identical twins, but also because I studied a lot about fingerprints, that identical twins do not have the same fingerprints.
Nic Fillingham: Wow.
Maria Puertos Calvo: Uh, because fingerprints are formed when you're in the womb. So they're not, they're not like a genetic thing. They happen kind of like, as a random pattern when, when your body is forming in the womb, and they happen, they're different. Uh, so, so humans have unique fingerprints and that's true, but the problem with the, the fingerprint recognition is that, it's very partial, and is very imperfect because the, the latent, it's called the latent fingerprint, the one that is found in a crime scene is then recovered, you know, using like some powder, and it's kind of like, you, you just found some, you know, sweaty thing on a surface, and then you have to lift that from there. Right.
Maria Puertos Calvo: And, and that has imperfections in, and it only, it's not going to be like a full fingerprint. You're going to have a partial fingerprint. And then, then you, basically, the way the matching works is using this like little poin- points and, and bifurcations of the ridges that exist in your fingerprint. And, and then, you know, looking at the, the location and direction of those, then they're matched with other fingerprints to understand if they're the same one or not. But the, because you don't have the full picture, it is possible that you make a mistake.
Maria Puertos Calvo: The one case that it's been kind of really, really famous actually happened with the Madrid bombings that happened in 2004, where, you know, they, they blew up, uh, some trains and, and a couple of hundred people died. Then they, they actually found a fingerprint in one of the, I don't remember, like in the crime scene and it actually matched in the FBI fingerprint database. It matched the fingerprint of a lawyer from Portland, Oregon, I believe it's what it was. And then he was initially, you know, uh, I don't know if he ended up being convicted, but, but you know, it wasn't-
Nic Fillingham: He was a suspect.
Maria Puertos Calvo: ... it was a really famous case. Yes. I think he was initially convicted. And then, but then he was not after they found the right person and they, they actually found that yeah, both fingerprints, like the, the guy whose fingerprint it really was, and this other guy's, their fingerprints both matched the crime scene fingerprint, but that's only because it was only a piece of it. Right. You, you don't put your finger, like, you don't roll it left to right, like when you arrive at the airport, right, where they make you roll your finger and they have the whole thing. It's, you're maybe just, you know, the, the, the criminal fingerprint is, is very small.
Nic Fillingham: Was that a big part of the, the research was trying to understand how much of a fingerprint is necessary for a sort of statistically relevant or sort of accurate determination that it belongs to, to the, to the right person?
Maria Puertos Calvo: Yeah. So the results of the research, they had some outcome around, like, depending on how many of those points that are used for identification, which are called minutiae, depending on how, how many of those are available, it changes the probability of a random match with a random person, basically. So the more points you have, the less likely it is that will happen.
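[Editor's note: the relationship described here, more minutiae available means a lower chance of a random match, can be illustrated with a deliberately crude toy model. This is not the statistical model from Maria's thesis. It assumes, purely for illustration, that each minutia in a partial print coincides with an unrelated person's print independently with the same probability `p_single`, and the value 0.3 is invented. The function names are also chosen for this example.]

```python
import math


def random_match_probability(n_minutiae: int, p_single: float = 0.3) -> float:
    """Toy model: probability that ALL n minutiae of a partial print
    coincide with one unrelated person's print, assuming independence
    and an invented per-minutia coincidence probability p_single."""
    if n_minutiae < 1:
        raise ValueError("need at least one minutia")
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must be strictly between 0 and 1")
    return p_single ** n_minutiae


def prob_any_match_in_database(n_minutiae: int, database_size: int,
                               p_single: float = 0.3) -> float:
    """Probability that at least one of database_size unrelated prints
    matches by chance: 1 - (1 - q)^N, computed in a numerically stable
    way with log1p/expm1 so tiny q values are not rounded away."""
    q = random_match_probability(n_minutiae, p_single)
    return -math.expm1(database_size * math.log1p(-q))


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, random_match_probability(n),
              prob_any_match_in_database(n, 1_000_000))
```

[Even in this simplified form, the second function shows why a small partial print searched against a very large database can produce a coincidental hit.]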
Nic Fillingham: The one thing, like, as, as we're talking about this, that I sort of half remember from maybe being a kid, I don't know, growing up in Australia is don't koalas have fingerprints that are the same as humans. Did I make that up? Do you know anything about this?
Maria Puertos Calvo: (laughs) I'm sure, I have no idea. (laughs) I have never heard such a thing.
Nic Fillingham: I have a-
Maria Puertos Calvo: Now I wanna know.
Nic Fillingham: ...I'm gonna have to look this up.
Maria Puertos Calvo: Yeah.
Nic Fillingham: I have a feeling that koa- koalas, (laughs) have fingerprints that are either very close to or indistinguishable from, from humans. I'm gonna look this one up.
Maria Puertos Calvo: I wonder if like a koala could ever be wrongly convicted of a crime.
Nic Fillingham: Right, right. So like, if I want to go rob a bank in Australia, all I need to do is like, bring a koala with me and leave the koala in the bank after I've successfully exited the bank with all the gold bars in my backpack. And then the police would show up and they arrest the koala and they'd get the fingerprints and they go, well, it must be the koala.
Maria Puertos Calvo: Exactly.
Nic Fillingham: This is a foolproof plan.
Maria Puertos Calvo: (laughs)
Nic Fillingham: I'm glad I discussed this with you on the podcast. Thank you, Maria, for validating my poses.
Maria Puertos Calvo: Now, now you can't publish this.
Nic Fillingham: Oh, we talked about fingerprints. Oh, crumbs you're right. Yeah. Okay. All right. We have to edit this out of the, (laughs) out of there quick.
Maria Puertos Calvo: (laughs)
Nic Fillingham: Um, okay. I didn't realize we had talked so much about fingerprints. That's my fault, but I found that fascinating. Thank you. So what happens next? Do you then go to Microsoft? Do you come straight out of your education at university in Madrid, straight to Microsoft?
Maria Puertos Calvo: Kind of and no. So what happens next is that while I, I finished the master's part of this PhD, and at this time I'm actually dating my now husband, and he's an American, uh, working in Washington D.C. as an electrical engineer. So I, you know, I finished my master's and I say, well, I kind of wanna go be in the US, uh, so I can be with him. And, you know, I have this scholarship that actually lets me go do research abroad and you know, like kind of pays for it. So
Maria Puertos Calvo: ... I find, um, another research group in the University of Maryland, College Park, which is really, really close to, to DC. And, and I go there to do research for, uh, six months. So, I spent six months there also doing research. Uh, also using, uh, machine learning for, for a different area, around iris recognition. And, you know, the six months went by and I was like, "Well, I want to stay a little longer," like, "I, you know, I really like living here," and I extended that, like, another six months. I... And at that point, you know, I wasn't really allowed to do that with my scholarship, so I just asked my professor to, you know, finance me for that time. And, and, uh, and at that time, I decided, like, you know, I, I actually don't think I wanna, like, pursue this whole PhD thing.
Maria Puertos Calvo: So, so I stayed six more months working for him, and then I decided I, I, I'm not a really big fan of academia. I went into research in, in grad school in Spain mostly because there weren't other opportunities. I was super, you know, glad I did 'cause I, I love all the research and the knowledge that I gained with all... You know, with my master's where I learned everything about Artificial Intelligence. But at this point, I really, really wanted to go into industry. Uh, so I applied to a lot of jobs in a lot of different companies. You know, figuring out, like, my background is in biometrics and machine learning. Things like that. Data science is not a word that had ever come to my mind that I was or could be, but I was more, like, interested in, like, you know, maybe software roles related to companies that did things that I had a similar background in.
Maria Puertos Calvo: For like a few months, I was looking in... I, I didn't even get calls. And I had no work experience other than, you know, I had been through college and grad school. So, I had... You know, and, and I was from Spain and from a Spanish university, and there was really nothing in my resume that was, like, oh, this is like the person we need to call. So, nobody called me. (laughs) And, and then one day, uh, I, I received a LinkedIn message from a Microsoft recruiter. And she says, "Hey, I have... I'm interested in talking to you about, uh, well, Microsoft." So I said, "Oh, my God. That sounds amazing." So, she calls me and we talk about it, and she's like, "Yeah, there's like this team at Microsoft that is like run mostly by data scientists and what they do is they help prevent fraud, abuse, and compromise for a lot of Microsoft online services."
Maria Puertos Calvo: So, they, they basically use data and machine learning to do things like stopping spam for Outlook.com, doing, like, family safety like finding, like, things on the web that, that should be, like, not for children. They were also doing, like, phishing detection on the browser. Um, like phishing URL detection on the browser and a co- compromise detection for Microsoft Account. And so I was like, "Sure, that sounds amazing." You know? "I would love to be in the process." And I was actually lying because I did not want to move to Seattle. (laughs) Like, at that time, I was so hopeful that I will find a job at, you know, somewhere in DC on the east coast, which is like closer to Spain and where, where we lived in. But at the same time, you know, Microsoft calls and you don't say no mostly when nobody else is calling you.
Maria Puertos Calvo: Um, so, so I said, "Sure, let's, you know, I, uh... The, the least I can do is, like, see how the interview goes." So, I did the phone screen and then I... They, they flew me to Seattle and I had seven interviews and a lunch inter- and a lunch kind of casual interview. So, it was like an eight hour interview. It was from 9:00 to 5:00. And, you know, everything sounded great, the role sounded great. Um, the, the team were... The things that they were doing sounded super interesting. And, to my surprise, the next day when I'm at the airport waiting for my flight to, to go back to DC, the recruiter calls me and says, "Hey, you, you know, you passed the interview and we're gonna make you an offer. You'll have an offer in the... In the mail tomorrow." I was like, "Oh, my God." (laughs) "What?" Like, I could not... This... It's crazy to me that this was, like, only seven years ago, it... But yeah.
Nic Fillingham: Oh, this is seven... So, this was 2014, 2013?
Maria Puertos Calvo: Uh, actually, when I did the interview, it was... It was more, more... It was longer. It was 2012.
Nic Fillingham: 2012. Got it.
Maria Puertos Calvo: And then I... And then I started at Microsoft in 2013.
Nic Fillingham: Got it.
Maria Puertos Calvo: I started as a... I think at that time, they called us analysts. But it was funny because the, the team was very proud of the, the fact that they were one of the first teams doing, like, real data science at Microsoft. But there were too many teams at Microsoft calling themselves data scientists and basically only doing, like, analytics and dashboards and things like that. So, because of that, the team that I was in was really proud, and they didn't want to call themselves data scientists, so they... I don't know. We called ourselves, like, analysts PMs, and then we went from that to decision scientists, uh, which I never understood the, the name. (laughs) Uh, but yeah. So, that's how I started.
Nic Fillingham: Okay, so, so that first role was in... I heard you say Outlook.com. So, were you in the sort of consumer email pipeline team? Is that sort of where that, that sat?
Maria Puertos Calvo: Yeah. Yeah, so, uh, the team was actually called safety platform. It doesn't exist anymore, but it was a team that provided the abuse, fraud, and, and, like, malicious detections for other teams that were... At the time, it was called the Windows Live division.
Nic Fillingham: Yes.
Maria Puertos Calvo: So, all the... All the teams that were part of that division, they were like the browser, right? Like, Internet Explorer, Hotmail, which was afterward named Outlook.com. And Microsoft Account, which is the consumer ecosystem, were all part of that. And our team, basically, helped them with detections and machine learning for their, their abusers and fraudsters and, and, you know, hackers that, that could affect their customers. So, my first role was actually in the spam team, anti-spam team. I was on outbound, outbound spam detection. So, uh, we would build models to detect when users send spam from Outlook.com accounts out so we could stop that mail basically.
Nic Fillingham: And I'd love to know, like, the models that you were building and training and refining then to detect outbound spam, and then the kinds of sort of machine learning technology that you're, you're deploying today. Is there any similarity? Or are they just worlds apart? I mean, we are talking seven years and, you know, seven years in technology may as well be, like, a century. But, you know, is there common threads, is there common learnings from back there, or is everything just changed?
Maria Puertos Calvo: Yes, both. Like, there, there are, obviously, common threads. You know, the world has evolved, but what really has evolved is the, the, the underlying infrastructure and tools available for people to deploy machine learning models. Like, back then, we... The production machine learning models that were running either in, like, authentication systems, either in off- you know, offline in the background after the fact, or, or even for the... For the mail. The Microsoft developers have to go and, like, code the actual... Let's say that you use, like, I don't know, logistic regression, which is a very typical, easy, uh, machine learning algorithm, right? They had to, like, code that. They had to, you know... There wasn't like a... Like, library that they could call that they would say, "Okay, apply logistic regression to, to this data with these parameters.
Maria Puertos Calvo: Back then, it was, like... People had to code their own machine learning algorithms from, like, the math that backs them, right? So, that actually... Made things so much, you know, harder. There weren't, like, the tools to actually, like, do, like, data manipulation, visualization, modeling, tuning, the way that we have so many things today. So, that, you know, made things kind of hard. Nothing was... Nothing was, like, easy to use for the data scientists. It... There was a lot of work around, you know, how do you... Like, manual labor. It was like, "Okay, I'm gonna, like, run the model with these parameters," and then, like, you know, b- based on the results, you would change that and tweak it a little bit.
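[Editor's note: To give a sense of what coding logistic regression "from the math" can involve, here is a minimal sketch in plain Python using batch gradient descent on the log loss. It is an illustration only, not the code the team used. The function names, the two features, and the toy data are all hypothetical.]

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Probability that feature vector x belongs to the positive class.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical toy data: [scaled send rate, share of never-before-contacted recipients]
rows = [[0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = spam sender (made-up labels)

weights, bias = train_logistic_regression(rows, labels)
print(predict_proba(weights, bias, [0.95, 0.9]))  # high send rate, many new recipients
print(predict_proba(weights, bias, [0.1, 0.1]))   # low send rate, few new recipients
```

[With a modern library the training loop above would typically shrink to a single call, which is the contrast being described here.]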
Maria Puertos Calvo: Today, you have programs that do that for you. And, and then show you all the results in, like, a super cool graph that tells you, uh, you know, like, these are the exact parameters you need to use for maximizing this one, uh, you know, output. Like, if you want to maximize accuracy or precision or recall. That, that is just, like, so much easier.
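[Editor's note: The automated search described here can be sketched as a simple sweep: try each candidate setting, score it, and keep the one that maximizes the metric you care about. The example below sweeps a decision threshold over model scores. The scores, labels, candidate values, and function names are all hypothetical, and real tuning tools search over many more parameters than one threshold.]

```python
def precision_recall(scores, labels, threshold):
    # Flag every item whose score is at or above the threshold.
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision, recall):
    # Harmonic mean of precision and recall.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def best_threshold(scores, labels, candidates, metric):
    # metric takes (precision, recall) and returns the number to maximize.
    # On ties, the first (lowest) candidate wins.
    return max(
        candidates,
        key=lambda t: metric(*precision_recall(scores, labels, t)),
    )


# Hypothetical model scores and ground-truth labels (1 = bad).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print(best_threshold(scores, labels, candidates, f1))
print(best_threshold(scores, labels, candidates, lambda p, r: p))
print(best_threshold(scores, labels, candidates, lambda p, r: r))
```

[Different metrics pick different thresholds on the same data, which is why the choice of what to maximize matters.]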
Nic Fillingham: That sounds really fascinating. So, Maria, you now... You now run a team. And I, I would love to sort of get your thoughts on what makes a great data scientist and, and what do you look for when you're hiring into, into your team or into sort of your, your broader organization under, uh, under identity. What perspectives and experience and skills are you trying to sort of add in and how do you find it?
Maria Puertos Calvo: Oh, what a great question. Uh, something that I'm actually... That's... The, the answer to that is something I'm refining every day. The, you know, the more, uh, experience I get and the more people I hire. I, I feel like it's always a learning process. It's like, what works and what doesn't. You know, I try to be open-minded and not try to hire everybody to be like me. So, that's... I'm trying to learn from all the people that I hire that are good. Like, what are their, you know... What's, like, special about them that I should try to look for in other people that I hire. But I would say, like, some common threads, I think, it's like... Really good communication skills.
Maria Puertos Calvo: Like, o- obviously the basics of, you know, being... Having s- a strong background in statistical modeling and machine learning is key. Uh, but many people these days have that. Domain knowledge is really important in our team because when you apply data science to cyber security, there are a lot of things that make the job really hard. One of them is the, the data is... What's called really imbalanced, because most of the interactions with, with the system, most of the data represents good activities, and the bad activities are very few and hard to find. They're like maybe less than 1%. So, that makes it harder in general to, to, to get those detections.
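[Editor's note: A small worked example shows why imbalance makes detection hard to evaluate. Assume, purely for illustration, 1,000 events of which 10 (1%) are bad. A "detector" that calls everything good scores 99% accuracy while catching nothing, so accuracy alone is misleading. The counts and the second detector's results below are invented.]

```python
def evaluate(predictions, labels):
    """Return (accuracy, precision, recall) for binary labels, where 1 = bad."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical data: 10 bad events followed by 990 good ones.
labels = [1] * 10 + [0] * 990

# Baseline that never flags anything.
always_good = [0] * 1000

# Hypothetical detector: catches 8 of the 10 bad events, with 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

baseline_metrics = evaluate(always_good, labels)
detector_metrics = evaluate(detector, labels)
print("always good:", baseline_metrics)
print("detector:   ", detector_metrics)
```

[The detector has lower accuracy than the do-nothing baseline even though it finds most of the bad activity, which is why precision and recall are usually the more informative numbers in this kind of problem.]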
Maria Puertos Calvo: And the other problem is that you're in an adversarial environment, which means, you know, you're not detecting, you know, a crosswalk in, in a road. Like, it's a typical problem of, of computer vision these days. A crosswalk's gonna be a crosswalk today or tomorrow, but if I detect an attacker in the data today and then we enforce... We do something to stop that attacker or to... Or to get them detected, then the next day they might do things differently because they're going to adapt to what you're doing. So, you need to build machine learning models or detections that are robust enough, that use, use what we call features, or, or that look at data that is not going to be easy... Easily gameable.
Maria Puertos Calvo: And, and it's really easy to just say, "Oh, you know, there's an attack coming from, I don't know, like, pick a country, like, China. Let's just, like, make China more important in our algorithm." But, like, maybe tomorrow that same attacker just fakes IP addresses in, in a bot that, that is not in China. It's in, I don't know, in Spain.
Maria Puertos Calvo: So, so, you just have to, you know, really get deep into, like, what it means to do data science in our own domain and, and, and gain that knowledge. So, that knowledge, for me, is, is important but it's also something that, that you can gain on the job. But then things like the ability to adapt and, and then also the ability to communicate with all their stakeholders what the data's actually telling us. Because it's, you know... You, you need to be able to tell a story with the data. You need to be able to present the data in a way that other people can understand it, or present the results of your research in, in a way that other people can understand it and really, uh, kind of buy your ideas or, or what you wanna express. And I think that that is really important as well.
Nic Fillingham: I sort of wanted to touch on what role... Is there a place in data science for people that, that don't have a sort of traditional or an orthodox or a linear path into the field? Can you come from a different discipline? Can you come from sort of an informal education or background? Can you be self-taught? Can you come from a completely different industry? What, what sort of flexibility exists or should there exist for adding in sort of different perspectives and, and sort of diversity in, in this particular space of machine learning?
Maria Puertos Calvo: Yes. There are... Actually, because it's such a new discipline, when I started at Microsoft, none of us started our degrees or our careers thinking that we wanted to go into data science. And my team had people who had, you know, degrees in economics, degrees in psychology, degrees in engineering, and then they had arrived at data science through, through different ways. I think data science is really like a fancy way of saying statistics. It's like big data statistics, right? It's like how do we, uh, model a lot of data to, like, do predictions, or, or tell us, like, how the data is distributed, or, or how different data, based on different data points, looks more like it's this category or this other category. So, it's all really, like, from the field of statistics.
Maria Puertos Calvo: And statistics is used in any type of research, right? Like, when you... When people in medicine are doing studies or any other kind of social sciences are doing studies, they're using a lot of that, and, and they're more and more using, like, concepts that are really related to what we use in, in data science. So, in that sense, it's, it's really possible to come from a lot of different fields. Generally, the, the people who do really well as data scientists are people who have like a PhD and have then this type of, you know, researching... But it doesn't really matter what field. I actually know that there, there are some companies out there that their job is to, like, get people that come out of PhD programs, but they don't have like a... Like a very, you know, like you said, like a linear path to data science, and then, they kind of, like, do like a one year training thing to, like, make them data scientists, because they do have, like, the... All the background in terms of, like, the statistics and the knowledge of the algorithms and everything, but they... Maybe they're, they've been really academic and they're not... They don't maybe know programming or, or things that are more related to the tech or, or they just don't know how to handle the data that is big.
Maria Puertos Calvo: So, they get them ready for... To work in the industry, but the dat- you know, I've met a lot of them in my career, uh, people who have gone through these kind of programs, and some of them are PhDs in physics or any other field. So, that's pretty common. The self-taught route, it's also very possible. I think people who, uh, maybe started as, like, software engineers, for example, and then there's so much content out there that is even free if you really wanna learn data science and machine learning. You can, you know, go from anything from Coursera to YouTube, uh, things that are free, things that are paid, but that you can actually gain great knowledge from people who are the best in the world at teaching this stuff. So, definitely possible to do it that way as well.
Nic Fillingham: Awesome. Before we let you go, we talked about the perfect guacamole recipe last time because you had that in your Twitter profile.
Maria Puertos Calvo: Mm-hmm (affirmative). (laughs)
Nic Fillingham: Do you recall that? I'm not making this up, right? (laughs)
Maria Puertos Calvo: I do. No. (laughs)
Nic Fillingham: All right. So, w- so we had the perfect guacamole recipe. I wondered what was your perfect... I- is it like... I wanted to ask about tacos, like, what your thoughts were on tacos, but I, I don't wanna be rote. I don't wanna be, uh, too cliché. So, maybe is there another sort of food that you love that you would like to leave us with, your sort of perfect recipe?
Maria Puertos Calvo: (laughs) That's really funny. I, I actually had tacos for lunch today. That is, uh... Yeah. (laughs)
Nic Fillingham: You did? What... Tell me about it. What did you have?
Maria Puertos Calvo: I didn't make them, though. I, I went out to eat them. Uh-
Nic Fillingham: Were they awesome? Did you love them?
Maria Puertos Calvo: They were really good, yeah. So, I think it's-
Nic Fillingham: All right. Tell us about those tacos.
Maria Puertos Calvo: Tacos is one of my favorite foods. But I actually have a taco recipe that I make that it's... I find it really good and really easy. So, it's shrimp tacos.
Nic Fillingham: Okay. All right.
Maria Puertos Calvo: So, it's, it's super easy. You just, like, marinate your shrimp in, like, a mix of lime, chipotle... You know those, like, chipotle chilis that come in a can and with, like, adobo sauce?
Nic Fillingham: Yeah, the l- it's got like a little... It's like a half can. And in-
Maria Puertos Calvo: Yeah, and it's, like, really dark, the sauce, and-
Nic Fillingham: Really dark I think. And in my house, you open the can and you end up only using about a third of it and you go, "I'm gonna use this later," and then you put it in the fridge.
Maria Puertos Calvo: Yes, and it's like-
Nic Fillingham: And then it... And then you find it, like, six months later and it's evolved and it's semi-sentient. But I know exactly what you're talking about.
Maria Puertos Calvo: Exactly. So that... You, you put, like, some of those... That, like, very smokey sauce that comes in that can or, or you can chop up some of the chili in there as well. And then lime and honey. And that's it. You marinate your shrimp in that and then you just, like, cook them in a pan. And then you put that in a tortilla, you know, like corn preferably. But you can use, you know, flour if that's your choice. Uh, and then you make your taco with the... That shrimp, and then you put, like... You, you pickle some sliced red onions very lightly with some lime juice and some salt, maybe for like 10 minutes. You put that on... You know, on your shrimp, and then you can put some shredded cabbage and some avocado, and ready to go. Delicious shrimp tacos for a week night.
Nic Fillingham: Fascinating. I'm gonna try this recipe.
Maria Puertos Calvo: Okay.
Nic Fillingham: Sounds awesome.
Maria Puertos Calvo: Let me know.
Nic Fillingham: Maria, thank you again so much for your time. This has been fantastic having you back. The last question, I think it's super quick, are you hiring at the moment, and if so, where can folks go to learn about how they may end up potentially being on your team or, or being in your group somewhere?
Maria Puertos Calvo: Yes, I am actually. Our team is doubling in size. I am hiring data scientists in Atlanta and in Dublin right now. So, we're gonna be, you know, a very, uh, worldly team, uh, 'cause I'm based in Seattle. So, if you go to Microsoft jobs and search in hashtag identity jobs, I think, uh, all my jobs should be listed there. Um, looking for, you know, data scientists, as I said, to work on fraud and, and cyber security and it's a... It's a great team. Hopefully, yeah, if you're... If that's something you're into, please, apply.
Nic Fillingham: Awesome. We will put the link in the show notes. Thank you so much for your time. It's been a great conversation.
Maria Puertos Calvo: Always a pleasure, Nic. Thank you so much.
Natalia Godyla: Well, we had a great time unlocking insights into security, from research to Artificial Intelligence. Keep an eye out for our next episode.
Nic Fillingham: And don't forget to tweet us @msftsecurity or email us at firstname.lastname@example.org with topics you'd like to hear on a future episode. Until then, stay safe.
Natalia Godyla: Stay secure.