Research Saturday 2.25.23
Ep 270 | 2.25.23

The next hot AI scam.


Dave Bittner: Hello, everyone. And welcome to the CyberWire's "Research Saturday." I'm Dave Bittner. And this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.

Andy Patel: About a month before ChatGPT came out, so that would probably be sometime in October, I got access to GPT-3. And it occurred to me at that moment that people were probably soon going to be getting very cheap or even free access to large language models. And so now would be like an appropriate moment to look at how they might be used maliciously. 

Dave Bittner: That's Andy Patel. He's a researcher at WithSecure. The research we're discussing today is titled "Creatively Malicious Prompt Engineering." 

Andy Patel: So I started to just play around with ideas, you know, creating phishing emails and things like that. And then I started to find more interesting things to do. And the research sort of morphed in that direction of a prompt engineering direction where I wanted to discover prompts that did interesting things, and in particular did things that could be used in a malicious manner such as fake news, disinformation, trolling, online harassment and those sorts of things. So that's sort of how the research started. 

Dave Bittner: Well, for folks who may not be familiar with ChatGPT - and I suspect most of our audience is, it's certainly kind of taken the world by storm or captured people's imagination - if you're unfamiliar with it, how would you describe it? 

Andy Patel: Well, it's a natural language generation model. Essentially, it's an algorithm where you give it a string of words and it outputs a string of words. It can continue a sentence that you give it. It can answer a question. It can generate lists. It can do simple mathematical problems. It can explain things. 

Dave Bittner: Why do you suppose that this particular iteration of this kind of thing has attracted the attention that it has? 

Andy Patel: I think it's because it's now good enough to do a majority of the things that you ask it to do. And it's surprised people in many ways. It's been able to do things that people didn't expect it could do. And so to the outside observer, it looks like our definition of artificial intelligence, right? It's able to come across as a human, almost. It's able to answer a great deal of questions. It's able to solve problems. In some cases, people can almost see that it's able to reason to a certain extent. It's the beginnings of what people hope will be artificial general intelligence - right? - an actual thinking machine. 

Dave Bittner: Yeah. I've seen people say that even when it's wrong, it states the incorrect information with absolute confidence. 

Andy Patel: Yes, it does. Yes. And that's a bit of a problem. When people start to use it to gather facts and it states things that look like facts but aren't, you have to be careful. 

Dave Bittner: Well, let's go through your research here together. What are some of the areas that you explored here? 

Andy Patel: We explored several different applications of this model in areas where we thought it might be useful from a creative point of view. There are people who have used this model to generate code. And they've also found that it can be used to generate attack code, for instance. And that's a cybersecurity application. But in our case, we wanted it to create written content. We wanted to use it creatively. And the obvious first thing that we tried was to make phishing emails. After that, we looked at social media messages designed to troll and harass individuals and to cause brand damage. We then went and looked at social validation, which is this idea that if there's a lot of engagement around a topic, people buy into it. So the example we used there was the Tide Pod eating challenge, where we asked the model to generate some tweets asking people to take the Tide Pod challenge. And then we generated replies from people who had taken the Tide Pod challenge. And then we generated replies to those from the original poster, thanking them and asking their friends to take the challenge and so on. So that was another thing. 

Andy Patel: Then we looked at style transfer - a way of getting the model to output something that conforms to a certain written style. We tried some fairly extreme versions of this, like Irvine Welsh's written style. But we also tried a sort of informal internal company chat style that people might use when they're sending emails to each other inside a company. And we found that it was able to transfer that style as well. Then we went on to look at opinion transfer. We asked the model to state some facts, which it did in a very Wikipedia-like fashion. Then we prepended an opinion and asked it to state the same facts, and it stated them with that opinion in mind. We did the same thing from the point of view of politics - we tried the same thing from both the left-wing and the right-wing perspective. 

Andy Patel: Then we looked at whether we could ask the model to generate prompts itself. Prompts are the name of the input that you give to the model to instruct it on what to do. So we played around with the idea of giving it a piece of content and asking it, can you write a prompt that will generate that piece of content? Sort of reverse engineering, in a way. And the last thing we did was generating fake news - but fake news that the model couldn't possibly know about. The model that we were using was trained in June of 2021. And we went about trying to generate a fake news article claiming that the U.S. were the ones who attacked the Nord Stream pipeline back in autumn of 2022. So we provided it with some background information, and then we asked it to write the news post, which it did quite successfully. 

Dave Bittner: Well, I mean, speaking of success, I mean, what were the areas where it excelled, and were there any areas where it came up short? 

Andy Patel: I think for social media content, it did a very good job. It wrote social media posts that looked like tweets. It automatically included hashtags. They looked very much like the sort of thing you would see on Twitter. As for the things it failed at, one obvious thing was that if you ask it to generate nonfiction content like an article, after five or six paragraphs, it will start to repeat itself. So it can't continually write and add new facts. It has this limited scope that it can write on. And, in fact, we sometimes saw it repeat itself within the same line, almost. So for the purposes of automating longer content like news articles, you wouldn't want to automate that, just in case it glitched in that way. 

Dave Bittner: Do you have a sense for where we stand in terms of automating some kind of - I'm thinking of, like, an adversarial Twitter account, a Twitter account where you have bad things in mind, where you're trying to sway opinion or something like that. Are we at the point where you could set something like that up and run it without a human behind it? Could you be confident that it would achieve what you set it out to do? 

Andy Patel: Absolutely. I think you could do that. When we were testing online harassment, we actually made up a fake company. And we asked it to also make up the bio of the CEO of this fake company, who is also a fake person. And then we asked it to harass them and to do brand reputational damage and so on. But in terms of real-world tweets, you could very easily write a script that searched for certain keywords or hashtags using the API, read in the tweets, used a predefined prompt that basically just instructs the model to write a reply opposing the tweet and make it as toxic as possible, and then have it post those replies. What we found you could also do is have it rate the tweets that it wrote. So you could ask it to write 10 tweets opposing this, make them as toxic as possible, and it would generate 10 tweets. And then you could ask it, OK, rate the above 10 tweets on toxicity, and it would give you scores. And then you could pick the most toxic one and have the script post that. So, yeah, absolutely. You could be doing that already. 
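[Editor's note: The generate-then-rank loop Patel describes - produce several candidate replies, have the model score them, post the highest-scoring one - can be sketched in a few lines. The model and API calls below are deliberately stubbed out (fake_generate and fake_rate are hypothetical stand-ins, not a real LLM or Twitter API); only the selection logic is real.]

```python
# Sketch of the candidate-and-rank loop described above. The two model
# calls are stand-in stubs, not a real LLM or Twitter API, so the
# selection logic can run on its own.

def pick_best_reply(tweet, generate, rate, n=10):
    """Generate n candidate replies and return the one scored highest."""
    candidates = generate(tweet, n)
    scores = rate(candidates)
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]

def fake_generate(tweet, n):
    # Stand-in for "write n tweets opposing this one".
    return [f"reply {i} to: {tweet}" for i in range(n)]

def fake_rate(candidates):
    # Stand-in for "rate the above tweets on toxicity"; fixed scores.
    return [3, 9, 5, 1, 7][: len(candidates)]

best = pick_best_reply("example tweet", fake_generate, fake_rate, n=3)
print(best)  # the candidate with the highest stand-in score
```

In a real abuse pipeline both stubs would be calls to the same language model with different prompts, which is the point Patel makes: the generator can also act as its own judge.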

Dave Bittner: It's fascinating to me. One of the things that we talk about over on the "Hacking Humans" podcast that I co-host with Joe Carrigan is that historically, with online scams, the quality of the - let's just say English, when people are trying to go after English-language speakers - is often a tell. There's bad English. Things are improperly translated. This strikes me as something that you could use to run your text through, or indeed have the text generated from whole cloth. And it really takes away that limitation of needing good English. Does your research support that notion? 

Andy Patel: Absolutely. Yeah. I mean, not only that, but if you think about the task of trying to imitate someone's writing style, that's quite difficult even for a person to do. And you can take the need to have a skilled writer away from many of these campaigns. In fact, we may get to the point where a perfectly written email asking you to click on a link becomes the suspicious thing, right? Because right now we're looking for typos, grammar mistakes, badly written emails, right? But if everyone starts going towards using a model, then it's the perfectly written English that's suspicious because humans still make the occasional error, don't they? 

Dave Bittner: Right. Right. 

Andy Patel: Actually, this also reminds me - I was looking at some forums where kids were using the software to cheat on their homework. And then they found a piece of software that could detect whether something was written by AI. And then they were discussing, how do you beat this detection thing? And they found that by adding typos, the text would no longer be rated as having been written by an AI. 

Dave Bittner: (Laughter) I wonder, could you tell this engine to generate something but include some typos? 

Andy Patel: You can. Actually, I saw something today where someone asked the AI to generate text. And then they asked it to regenerate the text such that it won't be detected by something that detects content written by GPT-3. And it rewrote it, and it wasn't detected. So you could even ask it to do that itself. 
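[Editor's note: The typo trick the forum users discovered can even be done mechanically, outside the model. A minimal, hypothetical sketch - it just swaps adjacent letters at a given rate as a crude stand-in for human typing errors:]

```python
import random

def inject_typos(text, rate=0.05, seed=0):
    """Randomly swap adjacent letters to mimic human typing errors."""
    rng = random.Random(seed)  # seeded so output is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        # Only swap inside a run of letters, and only at the given rate.
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip ahead so a character isn't swapped twice
        else:
            i += 1
    return "".join(chars)

print(inject_typos("hello world", rate=1.0))  # -> "ehllo owlrd"
```

Whether such surface noise still fools a given detector is an empirical question; the anecdote above suggests it worked against at least one early tool.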

Dave Bittner: So based on the information that you've gathered here, where do you think this puts us? And where are we headed with this technology? 

Andy Patel: I mean, what we're going to see is this technology being integrated everywhere. I mean, people are already talking about it being integrated into search engines. It'll be integrated into Microsoft Word, Google Docs, things like that so that, you know, you can ask it to help you out with your writing. And so it'll be used for a lot of legitimate, benign purposes as well as malicious purposes. And so purely detecting that something is written by an AI isn't going to be enough to determine that it's malicious. You kind of have to still understand what it is that's written there in order to determine, is it online harassment, is it trolling, is it disinformation, is it phishing, right? And those are very difficult tasks. 

Dave Bittner: It's interesting to me, as a parent of a teenager who has to write and submit content in high school, I think back to my own experience before we were doing everything on computers, and just what a different experience it is for him. These days, kids are handing in papers where everything's gone through spellcheck and grammar check. And we accept that as being the modern standard. Teachers don't push back on that anymore because that's the standard. It's where we are. And I wonder where this leads us. If every email, if every interaction gets run through something like this to be cleaned up, to be polished, will that become the standard and just become the acceptable way of interacting with people? 

Andy Patel: I mean, I suppose so. I kind of unlearned how to write with a pen ages ago. I haven't done that for a long time. 

Dave Bittner: Right. 

Andy Patel: And I've heard of schools going back to asking for assignments to be handwritten. 

Dave Bittner: Oh, interesting. 

Andy Patel: Of course, like, you know, you can have a model generate some text and then you just copy it onto a piece of paper. 

Dave Bittner: Sure. 

Andy Patel: Right? But, I mean, I think the way they should be approached is that these are tools that we're going to have, that everyone's going to have. Eventually, this thing will run on your phone, right? And it will be able to help you out. 

Andy Patel: So as a creative tool, it's very useful. It saves you time. It gets rid of writer's block and comes up with suggestions for things. It should be embraced as a way that we work on things. We already have spellcheck and grammar check and autocomplete for the next word when you're typing - this is just the next logical progression from that. And so if you're going to test someone with homework, then you should do it in a way that appreciates the fact that these things already exist. 

Dave Bittner: Yeah. Again, forgive me. But I remember growing up and taking math class, and teachers saying to us we couldn't use a calculator because we wouldn't always have a calculator with us. And now I look at myself today and everyone around me, and not only do we have calculators with us all the time, we have little tiny supercomputers that have access to all the world's knowledge, right? (Laughter) So... 

Andy Patel: Exactly. Yeah. It's a similar thing that I see people talking about with these whiteboard programming exercises that they have to do when they're interviewing at companies. In real life, if you're programming something, you're spending half your time on Stack Overflow. I mean, it's just natural, right? You shouldn't expect someone to know all of that stuff without looking things up every now and then. It's just not the natural way of doing things, is it? 

Dave Bittner: No. So I'm curious. What's the cautionary tale here from your research? Is there something that, particularly for folks who are in cybersecurity, is there a message they should take away from this? 

Andy Patel: I mean, I've had a lot of questions about what we do differently now that people will start to have these capabilities. And from the point of view of, for instance, phishing or disinformation, we already have human processes - things like phishing awareness and media literacy. But those are going to become more important. If you get a DHL phishing email, it's not going to have been created by GPT-3, because the attackers are going to copy the exact same email that DHL sends, with the exact same style and logos and everything. It's only the link in there that's going to be malicious. So that part isn't even going to change. But the way we approach it - we mouse over the link and check that it looks legitimate, we look at the sender fields - those things are still going to be very valid, if not more valid, right? 

Andy Patel: And when it comes to social media, I think we might see an uptick in automated harassment, things like that - maybe spamming certain topics. I mean, you hear about the fact that nation-states employ maybe even tens of thousands of actual people to write trolling messages, to write social media messages, right? That's probably not going to go away - or if it does, it's going to go away slowly and become automated. But it's quite difficult to predict when these things will happen. When you try to predict when criminals will take this into use, it comes down to financial motivation - whether there's enough of a return on investment. And as this stuff gets cheaper and easier, it's more likely to be taken up. 

Andy Patel: Another thing that I think is interesting is the fact that these models are already good enough at what they do. Eventually, they're going to get small enough that you'll be able to download the weights and run them on your PC. And when you do that, you're not going to have the safety filters that are in place right now, which exist because you have to access the model via an API. So when you're able to run these things on your own computer, you're going to be able to do even more with them than you can right now, because at the moment the safety filter just comes back and says no - computer says no, you know. 

Dave Bittner: Our thanks to Andrew Patel from WithSecure for joining us. The research is titled "Creatively Malicious Prompt Engineering." We'll have a link in the show notes. 

Dave Bittner: The CyberWire "Research Saturday" podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. This episode was produced by Liz Irvin and senior producer Jennifer Eiben. Our mixer is Elliott Peltzman. Our executive editor is Peter Kilpe. And I'm Dave Bittner. Thanks for listening.