Research Saturday 5.12.18
Ep 35 | 5.12.18

Three pillars of Artificial Intelligence — Research Saturday

Transcript

Dave Bittner: [00:00:03] Hello everyone, and welcome to the CyberWire's Research Saturday, presented by the Hewlett Foundation's Cyber Initiative. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.

Dave Bittner: [00:00:26] And now, a moment to tell you about our sponsor, the Hewlett Foundation's Cyber Initiative. While government and industry focus on the latest cyber threats, we still need more institutions and individuals who take a longer view. They're the people who are helping to create the norms and policies that will keep us all safe in cyberspace. The Cyber Initiative supports a cyber policy field that offers thoughtful solutions to complex challenges for the benefit of societies around the world. Learn more at hewlett.org/cyber.

Dave Bittner: [00:01:02] And thanks also to our sponsor, Enveil, whose revolutionary ZeroReveal solution closes the last gap in data security: protecting data in use. It's the industry's first and only scalable commercial solution enabling data to remain encrypted throughout the entire processing lifecycle. Imagine being able to analyze, search, and perform calculations on sensitive data, all without ever decrypting anything. All without the risks of theft or inadvertent exposure. What was once only theoretical is now possible with Enveil. Learn more at enveil.com.

Bobby Filar: [00:01:42] I think the bulk of the researchers were brought together after a conference that occurred December 2016.

Dave Bittner: [00:01:50] That's Bobby Filar. He's a Principal Data Scientist at Endgame. The research we're discussing today is titled "The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation."

Bobby Filar: [00:02:01] There was kind of a formal panel. Some of the conversation laid out the needs or requirements to establish a little bit more rigor within the AI research community, to establish norms, and to foster some aspects of safety and ethics. Over the next six months, I believe they kind of laid the groundwork, coming together with the three pillars, kind of the political, digital, and physical security aspects.

Bobby Filar: [00:02:29] Myself and Hyrum Anderson, another data scientist here at Endgame, were brought in to specifically contribute to the cybersecurity, or information security, component. And then, it was a lot of, hey, let's try to get together and meet virtually, twenty-two opinions, and try to figure out what the common themes are going to be, what we want this to read like.

Bobby Filar: [00:02:54] We didn't want it to be kind of a klaxon, or call to arms, about any sort of robot apocalypse. We wanted it to be very pragmatic, a very thoughtful approach that was more policy-driven than kind of what the pop culture, mainstream media was reporting on. The Oxford and Cambridge researchers, kind of the principal researchers, once everything was written up and we felt comfortable with it, took it and kind of brought us across the finish line with editing. There was a ton of help from OpenAI, as far as getting some of the news out there and PR.

Dave Bittner: [00:03:32] Yeah, I think it's a remarkably accessible report. You know, my hat's off to you. Let's start though, I think maybe the most fundamental question here is, at the outset, defining artificial intelligence and machine learning. I think particularly in the cybersecurity world, and when it comes to marketing in particular, I think those terms have gotten a little bit fuzzy. So, can you help, for the purposes of this paper, how did you approach those definitions?

Bobby Filar: [00:04:03] I mean, first, there is no way marketing has ever overstated what AI is. That seems implausible to me.

Dave Bittner: [00:04:09] (laughs)

Bobby Filar: [00:04:09] I think there's a pretty big misunderstanding, and it's a by-product of Hollywood, our favorite TV shows, and things like that, about what AI really is. For me, growing up, it was, you know, things like Robocop, which was super exciting, and then later I, Robot, and all this fun stuff.

Dave Bittner: [00:04:30] Lieutenant Commander Data.

Bobby Filar: [00:04:32] Exactly. So there were all these references in pop culture about what AI was supposed to be, when in reality, it can be much more mundane. I think we've seen recently there's been some more interesting aspects. But at its core, it's really just kind of statistics and machine learning. And for the folks listening that don't really understand machine learning, there are three main pillars. There's this concept of supervised learning, where you have examples of something with some sort of label, and then you train a model to recognize it. A good example is spam detection.
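
A minimal sketch of that supervised setup, assuming scikit-learn; the example messages and labels below are invented for illustration:

```python
# Supervised learning in miniature: label some messages, train, predict.
# The tiny example dataset is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "Win a free prize now",              # spam
    "Cheap meds, click here",            # spam
    "Meeting moved to 3pm",              # not spam
    "Here are the quarterly numbers",    # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(messages)  # bag-of-words features

model = MultinomialNB()
model.fit(features, labels)  # learn from the labeled examples

new_message = vectorizer.transform(["Claim your free prize"])
print(model.predict(new_message))  # -> [1], i.e., looks like spam
```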

Bobby Filar: [00:05:13] Unsupervised learning is where you have a bunch of disparate data, and there are no labels, and you attempt to cluster it together based on commonalities, in hopes of deriving some sort of information. And a lot of times that's used to derive labels. So you could look at things like economic factors, location, schools, and all of this and attempt to group together and categorize people or districts into, you know, red versus blue, rich versus poor, things like that.
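
And a minimal sketch of the unsupervised, clustering version of that, again assuming scikit-learn; the district features (median income, school rating) are invented:

```python
# Unsupervised learning in miniature: no labels, just grouping by similarity.
# The feature values (median income in $k, school rating) are made up.
import numpy as np
from sklearn.cluster import KMeans

districts = np.array([
    [35, 4.0],    # lower income, lower-rated schools
    [38, 4.5],
    [95, 8.5],    # higher income, higher-rated schools
    [102, 9.0],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(districts)
print(kmeans.labels_)  # e.g. [0 0 1 1] -- two groups emerge without any labels
```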

Bobby Filar: [00:05:46] And then the final one, and one that gets referenced I think quite a bit in the report, is this idea of reinforcement learning, where you're treating whatever problem you're trying to solve like a game, and you're allowing an algorithm to try to figure out the best way to solve a problem on its own. And then based off of some sort of reward function feedback loop, it takes that information, adjusts its parameters accordingly, and then tries again. And it does this, instead of, you know, five or six times like it would take us to learn how to shoot a basketball or, you know, swing a golf club, it does it millions of times in a very small setting as a way to perfect a particular approach.
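
A toy sketch of that reward-feedback loop, in plain Python; the "game" here, two slot machines with hidden payout rates, is purely illustrative:

```python
# Reinforcement learning in miniature: try an action, get a reward, adjust, repeat.
import random

payout_rates = [0.3, 0.7]        # hidden from the learner
estimates = [0.0, 0.0]           # the learner's current guess for each action
counts = [0, 0]

for trial in range(100_000):     # many cheap trials, as in the basketball analogy
    if random.random() < 0.1:    # occasionally explore
        action = random.randrange(2)
    else:                        # otherwise exploit the best-looking action
        action = max(range(2), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < payout_rates[action] else 0.0
    counts[action] += 1
    # nudge the estimate toward the observed reward (the feedback loop)
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # converges near [0.3, 0.7]; the learner favors the better action
```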

Dave Bittner: [00:06:31] So, let's go through, you all lay out a general framework for AI and security threats. Can you take us through what you discussed there?

Bobby Filar: [00:06:39] Yeah, so at its core, we try to focus on kind of three pillars: the political spectrum, the physical spectrum, and then the cyber spectrum. I think, for most people, the things that you would hear about, or are hearing about, are on the political side, particularly right now with things like Cambridge Analytica.

Bobby Filar: [00:07:03] There was a more, I think, humorous one that occurred with Jordan Peele from the comedy duo Key and Peele, where he made an Obama lip-syncing video. He used AI, and some of these algorithms that are readily available now to people who are non-practitioners, you know, non-PhDs, and created a video that made it seem like Obama was reading from a script he had written, saying the AI apocalypse was coming and all that fun stuff. So you're starting to see more and more of that.

Bobby Filar: [00:07:44] Another one that kind of came out, that I think will have interesting ramifications down the road is this idea of Deep Fakes. So, the ability to more or less morph any picture and overlay it with a picture of anybody you know, or anybody you're interested in seeing, and there's a lot of potential safety and security concerns there, where basically if you make somebody angry, all of a sudden your face could be on an inappropriate picture. There's blackmail concerns and things like that.

Bobby Filar: [00:08:18] And I think what you're starting to see is that sort of accessibility to AI, and the lower cost of entry to using it, is going to lead to both really good positive breakthroughs, and with that the exploitation of those for more nefarious means.

Dave Bittner: [00:08:36] Well, let's explore that specifically, I mean, the cost issue. People that I've interviewed over the past year or so, you know, they've said a lot of times the bad guys, when it comes to cyber attacks, they have shied away from AI because it's been expensive and there are cheaper things that work. You know, just mass spamming, or phishing campaigns, or things like that. So, one of the things that this paper points out is that the cost of these tools is decreasing and the availability of these tools is increasing.

Bobby Filar: [00:09:05] Yeah, yeah. I think for the most part the researchers you've spoken to are absolutely correct. And that's certainly one thing that we try to emphasize in this report: there have been no outright uses of AI in attacks within cybersecurity yet. Researchers within the InfoSec community have come up with a variety of use cases, and it's certainly plausible. I think we're still at that stage where, like you said, it's just a little bit easier right now to take the low-hanging-fruit approach because it is still effective.

Bobby Filar: [00:09:41] But I think what you'll see is that the advent or use of machine learning and AI in defensive technologies within cybersecurity will lead to a little bit more of a generalized approach to tackling the threat landscape, and, as those little pockets are shored up, it will require the attacker to become a little bit more sophisticated. And I think they'll look to the very tools that we're employing as an opportunity to attack. So, it's kind of a cat-and-mouse game, if you will, or, in its truest form, a red versus blue between AI algorithms and the people they are meant to stop, where they will use those exact algorithms against us.

Bobby Filar: [00:10:27] So I think, as far as the cost or resource concern goes, you have platforms right now, and there are dozens of them, that make programming AI models and algorithms as easy as, like, a dozen lines of code in something like Python. You don't necessarily need a math PhD. If you want to become more familiar with the underlying concepts, the openness of the AI community, and particularly the educational aspects, things like massive open online courses, have made the concepts that much easier to understand as well.
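
As a rough illustration of that "dozen lines of code" point, here is a small neural network defined and trained with one high-level library (Keras, as one example); the random data stands in for a real dataset:

```python
# Roughly a dozen lines to define and train a small neural network.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 20)             # placeholder features
y = (X.sum(axis=1) > 10).astype(int)     # placeholder labels

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))   # [loss, accuracy] on the toy data
```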

Bobby Filar: [00:11:08] So, between those two, and then just the overall availability of pre-trained models that you don't really need to know anything about, which is like the Jordan Peele case, where he just grabbed a lip-sync model from the Internet and then used it to his own ends. You're talking about coming up with an idea, getting a super advanced piece of technology that didn't exist five years ago, and then turning it into a piece of political propaganda, you know, and you're done before lunch. Like, that to me is utterly fascinating. That this sort of thing can transpire so quickly and so easily is, at its core, one of the things that we're trying to get across in the report.

Bobby Filar: [00:11:53] I think a good case study that was mentioned in the InfoSec section was about two researchers from a company called ZeroFOX. They do social media analysis and things like that, actually up in Baltimore where you're at. A good group of guys. They had this idea of, hey, what if we started reading people's tweets, and then we used a generative model, something like the AI you see that can generate Harry Potter stories. It's kind of the same concept. You train it on a subset of your tweets, and then you feed it a little seed, like a topic, and then it produces 140 characters that seem semi-realistic, like something you'd be interested in, or something that you may have even tweeted at one time.

Bobby Filar: [00:12:42] So they did that, and then they slapped on a URL with, I think, a Google shortener application, and then fed it to a bunch of their friends to see who would click on it. It was amazing, the effectiveness. I mean, you're talking about going from, like, five to ten percent effectiveness to sixty to seventy percent effectiveness, all with, you know, a little bit of data collection and, you know, a few hours of Python programming. And, I think, when it comes down to, you know, those numbers, that cost is low enough where attackers will start considering that as a potential means to an end.
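
To be clear, this is not the ZeroFOX researchers' tool, which used a more sophisticated generative model; it is only a toy sketch of the same mechanics: model a target's past posts, generate text in a similar style, and append a link. The sample tweets and the shortened URL are placeholders.

```python
# Toy illustration only: a word-level Markov chain over a target's past posts,
# then a generated "lure" with a placeholder shortened link tacked on the end.
import random
from collections import defaultdict

past_tweets = [
    "really enjoying the new season of my favorite show",
    "my favorite team pulled off a great win last night",
    "great coffee spot near the office this morning",
]

# Build a simple word-level Markov chain from the target's posts.
chain = defaultdict(list)
for tweet in past_tweets:
    words = tweet.split()
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)

def generate(seed_word, max_words=12):
    words = [seed_word]
    while len(words) < max_words and chain.get(words[-1]):
        words.append(random.choice(chain[words[-1]]))
    return " ".join(words)

lure = generate("my") + " http://short.url/xyz"  # placeholder shortened link
print(lure)
```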

Dave Bittner: [00:13:20] So, being able to put something in a familiar voice by training, I guess, the stylistic specifics of that voice?

Bobby Filar: [00:13:29] Yeah, it's, you know, spearphishing is obviously super successful, but that requires a lot of manual work. They're talking potentially several hours of OSINT work, fully understanding, you know, maybe the look or feel of a particular password reset email, or a Capital One credit card statement, or anything like that.

Dave Bittner: [00:13:53] Right.

Bobby Filar: [00:13:55] Suddenly, if you have access to data which so many of us put--Facebook, Twitter, Instagram, things like that--just out there, readily available. Not only our own, but we also show our preferences, based off of who we follow. We become very susceptible, at that point, to, you know, you see this with Twitter with, like, promoted tweets. As I set out to RSA last week, every single promoted tweet I got was from a vendor with "#RSAC."

Dave Bittner: [00:14:25] Right.

Bobby Filar: [00:14:26] And it's like, I am moderately interested in this because I am in the area, I will most likely walk by this vendor, and it is interesting to see whatever marketing language they're using. Now, imagine that with a non-information-security professional, and it's about their favorite basketball team, or the knitting club that they're associated with. Something super-specific, but it really didn't require the attacker to know anything like that, just to download your tweets and then pass them through an algorithm. I think that's super interesting from a research standpoint, and it's kind of scary from just an everyday layman standpoint.

Dave Bittner: [00:15:05] Yeah. And one of the things that the research points out is the ability, the increasing ability, of these systems to create synthetic images basically from scratch. And it tracks sort of the system's ability to create realistic looking human faces, and we're at the point now where a synthetic image looks like a photorealistic image of a face. And it strikes me that, from a political point of view, from a societal point of view, that, if we hit the point where photographic evidence is no longer photographic evidence, and, as you say, the generation of these can be done automatically in a matter of minutes, rather than someone having to spend a lot of time in Photoshop, or cutting and pasting, and so forth, well, that kind of changes the game, doesn't it?

Bobby Filar: [00:15:54] Yeah, yeah, it really does. And, you know, towards the end of our report, our call to action is, you know, it's one part vigilance, one part openness and taking responsibility. But a huge kind of third component is education. And that education comes in the form of education towards the overall population, but specifically policymakers, so that they're aware of kind of the various side effects of this sort of dual-use, dual-nature technology.

Bobby Filar: [00:16:27] And I think, if you or any of the listeners were paying attention during the Zuckerberg questioning on the Hill a week or two ago, you can start to see that, like, you know, these congressmen and congresswomen are starting to have to tackle these problems that are, by and large, very, very technical and very sophisticated.

Bobby Filar: [00:16:49] And it's one thing when it's data collection, which, hard problem though it is, is a relatively straightforward concept to understand: data collection unbeknownst to the user is a bad thing. It's a whole other thing when it's, like, political propaganda being created and distributed through sites like Twitter and Facebook and Instagram that is indistinguishable from, you know, everyday reality. I think that's a more terrifying sort of process, and I think that is something that policymakers are going to be made aware of in the near term.

Dave Bittner: [00:17:23] To that point about, you know, the Facebook grilling from the members of Congress, it strikes me that the argument could be made that the fact that policy-making is a slow, deliberative process has, for a long time, been a feature, not a bug. But, as the rate of change increases, as the velocity increases, when it comes to the developments in things like AI, I wonder, is policy always going to lag, and do we have to make sort of fundamental changes to be able to keep up?

Bobby Filar: [00:18:02] Yeah, that's an interesting question. And certainly one that was kicked around quite a bit in the chat rooms and Google Docs where the researchers of this report were talking. What we, as researchers, need to do, and need to be more open to--and this isn't to say that we're not doing an okay job right now, it's just, like anything, we can do better--is being more open with a lot of the research that we're doing, and red-teaming it. There's a huge component of that, I think, in the report, both anecdotes and stories, as well as the recommendation to do this.

Bobby Filar: [00:18:46] And a good example is kind of Next Generation Antivirus, which is something that I'm sure you're familiar with. It's a big marketing term, obviously, but, you know, these are platforms that are meant to eliminate the need for signature-based AV, with the expectation that you can get out ahead of threats, which is fantastic. It's proven to be very successful. It's an arduous process that requires massive amounts of samples correctly labeled.

Bobby Filar: [00:19:18] But, at the end of the day, it's still a byproduct of the data it sees. So, even if it generalizes very well, and can pick up on little nuances here and there, it's still very prone to attack. And there are things like VirusTotal, where you can kind of submit a sample and then see a broad spectrum of vendors and how they respond to the sample you submit.

Bobby Filar: [00:19:40] There's been research, and myself and Hyrum Anderson teamed up with UVA to do this very thing, which is, could you use AI kind of against AI, is this possible? And we set up kind of this game that can be thought of as like, you know, when you were in college and you tried to sneak into a bar. You showed up the first time and you wore a hat, and you're like, this hat makes me look older. You tried it, and the bouncer was like, no, no, no, that's not going to work. And then you're like, you know what, I bet if I grew out my beard a little bit that would help.

Dave Bittner: [00:20:18] Right.

Bobby Filar: [00:20:18] Go back again and it's like, eh, better, but no. And then, finally, you're like, well, maybe I just need to get a fake ID. And then, once you get the fake ID, you're in and you're like, oh, well that was the solution all along, that was the hole in the process.

Bobby Filar: [00:20:33] So, it's kind of the same idea with attacking Next-Gen AV with kind of this reinforcement learning game with malware, where you take a piece of malware and you throw it at Next-Gen AV, and it spits back good or bad. Some Next-Gen AVs are a little bit more helpful, and they spit back a number, like a confidence score or a probability of maliciousness.

Bobby Filar: [00:20:58] And this helps even more, because you can start to understand the ebb and flow of making a decision, altering one piece or part of the code, and the effect that has on the score. So if you can do that enough, you can start to learn, like, oh, well, if I pack the binary and then scrub strings and change it to a Russian language pack, then all of a sudden I can get past. All of a sudden, you've learned a recipe for bypass.
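
A simplified sketch of that query-mutate-score loop; the actual research used reinforcement learning against a real static model, whereas here the "sample" is just the list of mutations applied so far and the black-box score is simulated, purely to show the feedback mechanics:

```python
# Toy illustration of the bypass loop: query a (simulated) black-box score,
# keep any mutation that lowers it, stop once it falls under the threshold.
import random

ACTIONS = ["pack_binary", "scrub_strings", "append_benign_section",
           "change_language_pack"]

def black_box_score(mutations):
    """Stand-in for a Next-Gen AV verdict: returns a fake P(malicious)."""
    score = 0.9
    if "pack_binary" in mutations:
        score -= 0.2
    if "scrub_strings" in mutations:
        score -= 0.3
    return max(score, 0.05)

def search_for_bypass(threshold=0.5, max_rounds=50):
    mutations = []                          # the "recipe" built up so far
    best = black_box_score(mutations)
    for _ in range(max_rounds):
        action = random.choice(ACTIONS)
        candidate = mutations + [action]
        candidate_score = black_box_score(candidate)
        if candidate_score < best:          # the score dropped: keep the change
            mutations, best = candidate, candidate_score
        if best < threshold:                # slipped under the detection threshold
            break
    return mutations, best

print(search_for_bypass())  # e.g. (['scrub_strings', 'pack_binary'], 0.4)
```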

Bobby Filar: [00:21:26] And this is a complicated process, certainly. It's one that requires a little bit of overhead, a little bit of resources. But at the end of the day, if you're an attacker, and you have one shot to get it right, what better approach is there than to have access to all of this information offline, craft this perfect piece of malware using artificial intelligence, and then suddenly you're through?

Bobby Filar: [00:21:52] And those are the sorts of concerns, particularly from, like, a Next-Gen AV standpoint, that not only researchers need to be made aware of, but, you know, consumers, and politicians, and so on. And I think that is a very self-contained example within InfoSec that could be propagated across the physical and political security spectrum as well. There's certainly this aspect of red-teaming that needs to occur, and then reporting back to policymakers, so we perform some sort of due diligence on that end.

Dave Bittner: [00:22:29] Yeah, and I think, you know, in your example, the other thing that happens is, you know, word gets around that the fake ID is the way into that bar, so you have fewer people trying hats and growing mustaches.

Bobby Filar: [00:22:42] Right. Right. As somebody who works on this problem, again, a lot of this stuff is very, very fascinating, because it's all conceptual right now, but it's very easy to look down the road, you know, twelve to twenty-four months from now and start to understand, like, yeah, there's a process that could occur, and depending on whether or not attackers start to believe that the juice is worth the squeeze, we may start seeing that.

Bobby Filar: [00:23:11] So, the onus is on us to kind of eat our own dog food, and just like red-teaming any other security tool, take the results from that and empower the product itself. We do a lot of things where we propose adversarial training where you're generating, like, all of these instances of kind of morphed malware, and then saying, like, well, just because this doesn't look like the malware it once was, that doesn't mean that it's any less bad. So now let's train on that so it's at least seen this sort of change in behavior, so we catch it the next time. And it's all about, yeah, staying ahead, making sure model drift doesn't occur, that the models don't become stale, just trying to stay as current and up to date as possible.
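
A toy sketch of that adversarial-training idea: morphed malware keeps its malicious label and gets folded back into the training set. The feature vectors and the mutate() step are invented stand-ins for real static features and real binary rewriting.

```python
# Adversarial training in miniature: generate morphed variants of each malicious
# sample, keep the label "bad", and retrain so the model has seen the change.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(200, 10))       # placeholder feature vectors
malicious = rng.normal(3.0, 1.0, size=(200, 10))

def mutate(sample):
    """Stand-in for a functionality-preserving change that shifts the features."""
    return sample + rng.normal(0.0, 1.5, size=sample.shape)

# Morphed, but no less bad: label every variant malicious.
variants = np.array([mutate(s) for s in malicious for _ in range(3)])

X = np.vstack([benign, malicious, variants])
y = np.array([0] * len(benign) + [1] * (len(malicious) + len(variants)))

model = GradientBoostingClassifier().fit(X, y)
print(model.score(X, y))   # the retrained model has now "seen" the morphed versions
```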

Dave Bittner: [00:23:59] Now, looking ahead, you know, looking down the road, what were some of the conclusions from the group? Does it seem like appropriate attention is being paid to this? Is there hope? Is it gloom and doom? Where did you all land with that?

Bobby Filar: [00:24:15] It's a little bit of both. And to be honest, this was something where researchers were split in a lot of ways about, you know, is it all bad, is everything fine? I think, for the most part, AI is always going to get kind of a negative connotation. And there's plenty of blame to go around for that, and maybe Hollywood gets some of it.

Dave Bittner: [00:24:36] We've all seen the Terminator.

Bobby Filar: [00:24:38] Exactly. But at the end of the day, AI is being used every day for things that make our lives easier as well. Self-driving cars are obviously one, where there have been unfortunate accidents that have occurred, and clearly we're not there yet, but once we are, you're talking about a fantastic opportunity to help, you know, increase the efficiency of our highways and commutes and things like that. There are also things like AI being used to identify medical problems through x-rays, using computer vision. These are all very, very good things that are occurring, and very few people can say that that isn't the case.

Bobby Filar: [00:25:19] But that being said, any technology like this can be abused. It can be morphed and kind of twisted to some sort of, as I said earlier, nefarious end. And those are the things that we need to be more considerate about. And I think--and the report goes into this as well--out of all those spaces, physical, political, and information security, we should be looking at information security first as kind of a standard for how to handle this, because the information security community has been having to deal with technologies being morphed and twisted to nefarious ends for a long time.

Bobby Filar: [00:26:02] And, yeah, we don't have it down to a science on how to handle it best, but we have attempted to do things like disclosure and best practices, and things like red-teaming have become kind of mainstays. And, yeah, it's not perfect, but it at least provides a roadmap of what could and should be done, particularly within this kind of newfangled space of AI.

Dave Bittner: [00:26:28] It's an opportunity for us to lead the way.

Bobby Filar: [00:26:31] Right. And I think a lot of the researchers that we worked with were political and physical security researchers. And they were kind of the first ones to come to this and say, like, well, you guys have things like vulnerability disclosure, and I'm like, yeah, yeah, we do, and, you know, it works, and it's reasonably effective, and we have things like bug bounties, and we try to open up some of our tools to the community.

Bobby Filar: [00:26:58] We could obviously do better. And I think things like explainability and interpretability, of what these AI models are doing, matter in each of these fields. Information security is certainly one. Like, why did you call this binary bad? I think we need to be personally accountable for them from an ethics standpoint and a safety standpoint, so people start to understand why things are occurring and why they're not. And I think that's something that you'll see researchers, and security vendors in particular, take more seriously in the next twelve to eighteen months.

Bobby Filar: [00:27:34] I mean, I think one thing, and this would be more of a shameless plug, but some of your listeners will likely be attending Black Hat and Defcon this year. You know, if you're interested in kind of the AI aspects, of how it's being employed or deployed in information security, go to those conferences and ask around. Talk to vendors. If you're at a vendor booth, try to grab a technical person. I know we at Endgame try to supply researchers or data scientists.

Bobby Filar: [00:28:02] At Defcon this year, I'm part of a committee that's standing up a village specifically for artificial intelligence, just to educate, and provide examples and demonstrations of how AI can be used for both good and bad. So whether you're a practitioner, a decision maker, or just a casual observer, you can walk away with at least some understanding from the people who use it day in and day out, on kind of the effect that it could have within your life.

Dave Bittner: [00:28:31] Let me ask you this. Years ago, Carl Sagan, the famous scientist, he had what he called his Baloney Detection Kit, which was, you know, a way to detect if someone was trying to fool you, trying to pull one over on you. Do you have any recommendations along those lines for folks who are trying to cut through the marketing noise when it comes to this stuff? Any guidance for, if you really want to learn about this but you want to not be fooled by, you know, the marketing, what's a good approach to that?

Bobby Filar: [00:29:05] That's actually a great question, and one that probably doesn't get asked enough. But yeah, I would never recommend walking in blind faith, and I would imagine that most people listening to this would take that same approach.

Bobby Filar: [00:29:19] My advice would be, for any vendor that claims machine learning or AI techniques, it really starts with data. Try to get a better idea of where their data is coming from. If it's closed source and they can't talk about it, get them to talk about the number of samples that they have, or the diversity of that data, because bias can creep in very, very quickly in these situations. If they only have malware, and that malware is specifically Russian and Chinese, then the first time somebody at your company downloads a piece of software with a Russian or Chinese language pack that's completely benign, it will likely get flagged. That's just the nature of bias, and that bias exists, you know, within ourselves, and it exists within the realm of models and machine learning. So data is a big thing.

Bobby Filar: [00:30:11] Another big thing is trying to understand how they're training on that data, you know, kind of the models they're using, and then how often. Just because of how rapid and dynamic the information security space is, particularly from an attacker perspective, that shift in the speed and variety of attacks can lead to models that aren't retrained very often becoming very stale, leading to bypass and things like that.

Bobby Filar: [00:30:40] So, I would say try to determine whether or not the machine learning pipeline is mature, in the sense that it's trained consistently on fresh data, and that they're accounting for things like old data sloughing off and not being useful anymore. Those two things are very, very important and could at least provide some sort of background to you in feeling a little bit more confident in whether or not you believe kind of the spiel that you're being pitched.
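
A small sketch of that "fresh data, old data sloughing off" idea: keep only recent samples in the training window so the model doesn't go stale. The 90-day window and the Sample fields are illustrative, not recommendations from the report.

```python
# Rolling training window: anything older than the cutoff sloughs off.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Sample:
    features: list
    label: int                 # 1 = malicious, 0 = benign
    first_seen: datetime

def rolling_training_set(samples, window_days=90):
    cutoff = datetime.now() - timedelta(days=window_days)
    fresh = [s for s in samples if s.first_seen >= cutoff]   # old samples drop out
    X = [s.features for s in fresh]
    y = [s.label for s in fresh]
    return X, y

samples = [
    Sample([0.1, 0.9], 1, datetime.now() - timedelta(days=400)),  # too old, dropped
    Sample([0.8, 0.2], 0, datetime.now() - timedelta(days=10)),   # fresh, kept
]
print(rolling_training_set(samples))  # -> ([[0.8, 0.2]], [0])
```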

Dave Bittner: [00:31:12] Our thanks to Bobby Filar from Endgame for joining us. The research we discussed today is titled "The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation." We've got a link to the research paper in the show notes of this episode.

Dave Bittner: [00:31:27] Thanks to the Hewlett Foundation's Cyber Initiative for sponsoring our show. You can learn more about them at hewlett.org/cyber.

Dave Bittner: [00:31:35] And thanks to Enveil for their sponsorship. You can find out how they're closing the last gap in data security at enveil.com.

Dave Bittner: [00:31:43] The CyberWire Research Saturday is proudly produced in Maryland, out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technology. It's produced by Pratt Street Media. The coordinating producer is Jennifer Eiben, editor is John Petrik, technical editor is Chris Russell, executive editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.