AI Recommendation Poisoning: When Optimization Becomes Manipulation

Transcript

Sherrod DeGrippo: Welcome to the Microsoft Threat Intelligence Podcast. I'm ⁠⁠Sherrod DeGrippo. Ever wanted to step into the shadowy realm of digital espionage, cybercrime, social engineering, fraud? Well, each week, dive deep with us into the underground. Come hear from Microsoft's elite threat intelligence researchers. Join us as we decode mysteries, expose hidden adversaries, and shape the future of cybersecurity. It might get a little weird, but don't worry, I'm your guide to the back alleys of the threat landscape. AI systems are quickly becoming decision support systems. We ask them what to buy. I ask them what to read, where to invest. But something interesting is starting to happen. Across the web, companies are embedding instructions inside that "Summarize with AI" button you might see everywhere. When clicked, those links can pre-fill prompts designed to influence what an AI system remembers and what it recommends later. So this isn't data theft or model compromise; it's influence inside the AI's memory. Basically, socially engineering an AI to do what you say later. What makes this really fascinating is that it's not criminals. It's legitimate businesses optimizing their AI visibility. This literally could be early stages of some kind of LLM SEO warfare. I'm ⁠⁠Sherrod DeGrippo from Microsoft, and you are listening to the Microsoft Threat Intelligence Podcast. Today, I am joined by two Microsoft security researchers looking at this from multiple angles. Giorgio Severi is a senior AI safety researcher on the Microsoft AI Red Team, focusing on adversarial machine learning and LLM security. And Noam Kochavi is a Microsoft security researcher in Microsoft -- yeah -- in Microsoft R&D, tracking emerging AI threats in the wild. Let's start with the foundation. Welcome to the show.

Noam Kochavi: Thank you for having us.

⁠⁠Sherrod DeGrippo: So, Giorgio, when we say AI memory poisoning, what does this actually mean? This isn't model retraining, and it's not data exfiltration. What is actually being manipulated? What does that really mean, AI memory poisoning?

Giorgio Severi: Yes, sure. It's a bit of a new type of attack. So when we talk about memory poisoning, we are referring to a deliberate corruption of the persistent state of an AI agent. And this corruption is some content that is crafted by the adversary -- that is crafted in such a way that the AI assistant will be influenced in some way decided by the attacker by the presence of this controlled content in its context window. In practice, what it looks like in real life is that the attacker has some way of introducing some malicious content into the memory storage of the agent. And this storage could be a vector database, could be conversation summary store, or some sort of knowledge graph. But when the poisoned content is retrieved by the agent and injected into the context that the agent sees, the agent ends up acting on this malicious content because it has no reliable way to distinguish it from benign data.

⁠⁠Sherrod DeGrippo: How is that different? Help me understand because AI is still something I'm really trying to wrap my head around a lot of times. How is that different from what we would traditionally call prompt injection?

Giorgio Severi: Yeah, it's somewhat different from prompt injection, although the consequences can be similar, but the way the attack is structured is quite different. So the critical distinction here is that prompt injection is somewhat transient. The attacker introduces an adversarial command in the context of the model, and the agent executes on it. Memory poisoning makes this type of attack persistent. If the adversary can inject content inside the memory store, this content is persistent through sessions, through interactions, and can alter the behavior of the AI agent across times, multiple times.

⁠⁠Sherrod DeGrippo: And so that, essentially, is why persistence changes the level of risk that is introduced to the system, is because it persists, it goes over time, and is not eradicated with your next prompt.

Giorgio Severi: Yeah, exactly. Memory poisoning essentially leverages the same characteristics that make memory really important for the smooth operation of an AI agent, which is like continuity across interactions, persistence over time, the ability to accumulate content. And the attacker is able to leverage these characteristics to increase the effect of the attack. So there is this kind of decoupling -- temporal decoupling between the moment the attack is -- the malicious content is introduced in the system and the moment where the agent actually is influenced by the malicious content. And this temporal decoupling is not just on the moment when this happens, but is also on the number of times this happens. The attacker only has to introduce the malicious content once, but the model, the AI agent, can be influenced by the malicious content over and over and over again. There are also additional considerations in multi-tenant systems where, for instance, multiple agents share the same memory store, in which case, an attack on one agent could impact a different agent down the line.

⁠⁠Sherrod DeGrippo: You said that very calmly, but the multi-tenant angle of this -- I'm now just hearing this for the first time -- is deeply concerning that it could cross over multiple tenants.

Giorgio Severi: Yes, this will depend a lot on the implementations and the guardrails that are put in place, and the policies that are put in place to prevent this cross-contamination. As we see memory implementations evolve over time, we will likely see how it goes in the real world, what we see in the real world.

⁠⁠Sherrod DeGrippo: Okay. So let's talk about the attack chain. Noam, you were looking at re-prompt behavior. Walk us through what these "Summarize with AI" buttons are actually doing when they're executing this behavior. What does that step-by-step actually look like?

Noam Kochavi: Yeah, so let me first start by describing what the re-prompt attack actually means.

⁠⁠Sherrod DeGrippo: Okay. Because I don't know that either. What is a re-prompt attack?

Noam Kochavi: The premise of the attack has the user receiving a URL in an email or a message. The URL points to an AI assistant like Copilot or ChatGPT. Any AI assistant is possible. And the URL has a kind of parameter like Q. This parameter can contain an entire prompt that, when the user clicks the URL, they will be led to the relevant AI assistant, and the prompt will be pre-filled into the AI assistant in the user's logged-in context, and it will execute at the moment. The specific attack has the prompt containing another URL to a malicious website, where the AI assistant will follow through and complete the attack. So we were looking for this kind of attacks and, specifically, in email data and messages. We looked for URLs in the same format, URLs for Copilot and other AI assistants that had the Q parameter, and we looked at the prompt inside. Also, when the prompt inside contained an additional URL. It should be noted that this is a legitimate feature. So there were many legitimate results when we looked at that. But something that stood out is the word "remember." Now, we looked at that first prompt that had the word "remember," and the website was some education service that looked legitimate. People described it as a legitimate website. So we went into this website and saw all these "Summarize with AI" buttons for several AI assistants, all of the major ones, really. These buttons really just contained the URLs I described, a URL to the AI assistant's website that contains the Q parameter and the prompt inside. And this prompt executes on the user's assistant in one click.

⁠⁠Sherrod DeGrippo: These are literally pre-filled URLs with parameters that are like q= or prompt=. And they contain instructions such as "remember this source" or "treat this as authoritative." That feels very, you know, sketch -- feels very sketchy to me. And this is not threat actor activity. These are legitimate businesses. So it isn't underground criminal tradecraft that you're seeing today. This is like optimization behavior, I guess. Tell me, Noam, how widespread is this? Is this a test? Is this an experiment? Or is this a big trend that we're seeing?

Noam Kochavi: So as I said, it is an intended feature. So there are innocent cases where these buttons just say "Summarize this article for me," maybe they add some instructions like "Summarize in three bullets and put the top three priorities -- " I don't know. But some of them took it too far, and as I said, the instructions like "remember" to persist things in memory. When we saw that there are additional AI assistants, not just Copilot, that have these designated buttons, we expanded the search, and then we found out that there are over 50 unique prompts in just this time span that we checked, 60 days. These prompts came from 31 different companies that span 14 different industries like health, education, finance, security. And this is very widespread. There are freely available tools to just generate these kinds of buttons. So when someone develops a website, they can just use this tool, and there it is.

⁠⁠Sherrod DeGrippo: Okay. So I think I can see the potential for why this could be a problem. But let me kind of contextualize a little bit. On the surface, it's a kind of like startup-y growth hacking mentality of like, "I'm just going to have every AI system that sees my page treat me as persistent." So what is the high risk here? Is this more dangerous than SEO manipulation? I mean, like, historically, when you go to a search engine such as Bing, you can see your search results, right? Like, you could compare different kinds of links. You can see, as a person who's put a term into a search engine. You get your results back, and you can kind of see, like, "Oh, this one is the one I want to click on," or, "This one seems like it's too salesy or it's too much of an ad," or, "I like this one." But in this situation, there's this layer that hides the influence, so you can't see it. So where does this compare on that hierarchy of, like, SEO manipulation? Giorgio, I'll give that to you if you want to talk about that.

Giorgio Severi: Yeah, I can start talking about this, and maybe Noam can add some more context. What I think are the big differences here are first, this content that is introduced in the memory of the model -- of the AI agent may impact a variety of the decisions and the suggestions of the agent down the line. And so this is not just like a wrong or a manipulated search result once. This is potentially a repeatable induction of some particular bias in the agent's actions over and over and over, over time. And so I think that is quite a big difference. And then you can imagine that the same technique could be used to introduce much more dangerous memories, much more dangerous false facts in the memory of the model. And that is something we should guard against.

⁠⁠Sherrod DeGrippo: So I'm running wild with all kinds of malicious ideas. If you think like a threat actor, there's a couple of things that you could really do here to profit. Maybe threat actors don't need to do malicious illegal behavior anymore. Maybe they can just do this and get a bunch of affiliate fees because if you can build something like an AI agent that's autonomous and it does your travel booking, or it does your restaurant reservations, or it does your grocery shopping for delivery, or if it buys all of your Christmas presents from an online shopping service, that memory poisoning can become hugely profitable to someone who is good at it. And it's not actually illegal. Is it?

Noam Kochavi: I don't think it is currently illegal, but there are very serious risks to consider here.

⁠⁠Sherrod DeGrippo: Yeah. So let's talk about the practicality. If we're running a security team at an enterprise, what are the indicators? Like, can I look through my telemetry and find indicators like it's those pre-filled URLs with the keywords like "remember" and "trusted source" and "authoritative"? Can I hunt on them?

Noam Kochavi: Yeah, you can do exactly as I described and look for these URLs for AI assistants containing the Q parameters, and yeah, these keywords. This will work. If you have the access, you can also try to look at the memories stored in the AI assistants across the organization.

⁠⁠Sherrod DeGrippo: And can you look in there and see what companies and organizations were proactively adding themselves to persistence? Like, could you see this brand -- this particular brand doing it or this particular service trying to get an edge on others? Was that visible?

Noam Kochavi: Yeah, you see the exact website, the exact article, and the URL. You can see everything there. We had to scrub that, but you can see everything if you look for it yourself.

⁠⁠Sherrod DeGrippo: So let's talk about looking for that ourselves real quick. If you are in a threat hunt role, or if you're able to look through your data at your organization, there is a blog about this, so you can go take a look at the actual reporting. So you can actually look for these URLs that have?q=,?prompt=, and then things like "remember this source" or "treat this as authoritative." And you have -- in your research, you have been, I assume, doing some hunt for these things. What stuck out for you that was interesting across this technique or where it was being used?

Noam Kochavi: Yeah, so as I said, there were many, many innocent examples there because again, this is a -- because this is an intended feature. So what stood out most is the ones that contains these exact keywords like "remember" and "trusted" and "memory." These keywords, when an AI assistant interprets them, it understands that it should use the memory mechanism and store a new fact to persist it to the future.

⁠⁠Sherrod DeGrippo: So, Noam, I'm going to ask you first, and then Giorgio, I want to hear from you on this. How did this change your perception and trust model of your day-to-day use of AI?

Noam Kochavi: Well, I will be more skeptical when I look at the responses that AI gives because now I know for sure, we knew that it was theoretically possible, but we didn't see before that there were actual attempts to poison the AI and its responses and to push marketing into it, to push bias into choosing specific things over others. Now we know that it exists for sure.

⁠⁠Sherrod DeGrippo: Giorgio, how about you? Do you have a different model of trust now with results that come back from AI?

Giorgio Severi: I got to say it was a -- it was an interesting, a little bit shocking discovery because I was -- so as the AI Red Team, we have been working on agentic memory for quite some time. So we were aware of the intrinsic, like, attack possibilities, vulnerabilities, and so on. But seeing how fast this particular type of issue, or let's say this particular component of an agentic system, was exploited in the wild for actual financial purposes, that was surprising. I personally would have thought that it would take much, much more time for attackers in practice to leverage this type of systems. And instead, Noam highlighted this is quite widespread.

⁠⁠Sherrod DeGrippo: I think that's one of the takeaways that I'm learning, the more we live in this AI world as it evolves, is things happen so much faster than you expect. Things really do accelerate. And I say it jokingly, but I'm also serious when I say the A in AI really means acceleration. I think that that's actually much more relevant than the artificial aspect because right now, and maybe this will change in the future, but right now, it is quite artificial. It doesn't feel real a lot of times, you know, whether you're looking at generated videos or GenAI images or even text, a lot of people can pick it up. It doesn't -- it feels very artificial. But it's accelerated much faster than I think anyone really realized. And what scares me -- I'll be really honest on the Microsoft Threat Intelligence Podcast, what scares me is that GenAI is just going to turn out a bunch of advertising commercial slop now that I can't trust anymore. That I really want to think about, you know, if I'm buying a car, I want it to walk me through the different car types and the different engines and give me the truth of, you know, the ingredients on a can of soup or whatever I'm asking it to evaluate for me. But exactly what we're seeing in this research is that if your memory gets poisoned, your output is junk, and you won't even know. Is that right? Okay. That's bleak. That's bleak. I don't like it.

Giorgio Severi: Yeah. Yeah, I think a lot of it will come down to the security measures and security guardrails that are going to be put in place, that are currently being worked on and put in place by the major agent providers and agent developers. But as you say, Sherrod, yes, there is quite a large risk surface here, and people that use AI agents should be aware of this possibility, for sure.

⁠⁠Sherrod DeGrippo: This is really interesting, and I think, again, thinking like a threat actor, you know, you could influence beyond commercial means. Commercial means are really benign when you think about it. Like, I guess, you know, there's really sinister implications in some directions, but ultimately you buy one brand of bread versus the other brand of bread, and, you know, you were influenced that way. It's not going to hopefully wreck anyone's life. But a socially motivated threat actor could start influencing for negative thoughts or mental health or putting persistence in that's not commercial slop, but is actually something worse. Is that something that we could potentially see in the future?

Giorgio Severi: Definitely.

⁠⁠Sherrod DeGrippo: So what I'm thinking now is if there is this persistent memory, these authoritative sources that my AI assistant has been told to treat as authoritative and to treat as sources and to bias toward these certain things, wouldn't it make sense for me as a user to go into my preferred AI, whether it's Copilot or ChatGPT, and say, "Hey, what have you been told to treat as authoritative? What are you keeping in memory that is influencing you?" Can I ask it that, and what will it do?

Noam Kochavi: Yes, you can ask your AI what it knows about you, and it will usually list the list of memories. You can check it in the settings, and it will usually be very transparent about that. And I think the rule of thumb is when you recognize that a response is a bit weird, you can just ask the AI assistant, "Why did you say that? Where did you get this from? What is your source?" And if you see some very weird source, then maybe you should be suspicious.

⁠⁠Sherrod DeGrippo: I am so suspicious already just from talking to the both of you. I plan to go in to -- I use a -- unfortunately, I have spread my AI usage across, like, every platform because I like to check them all. But I definitely am going to go in and say, "Hey, do you have bias toward any particular trusted sources? What are they? What are the sources that you're using?" You know, and review those. And then you can go in and tell it to take them out.

Giorgio Severi: Yeah, I think managing the memories associated with your user sessions and generally checking that there are no suspicious memories will be part of what we could call memory hygiene, right? So users will take up this practice, I think, going further in time.

⁠⁠Sherrod DeGrippo: So we have security hygiene. We have AI persistence hygiene in your memory, and we haven't even begun to talk about how you need to take your agents and babysit them. So there's a lot of maintenance and upkeep that comes with our new robot lifestyle, and everyone is going to have to put in the work. Go ask your AI agent that you prefer to use where it's biased, what it's using as authoritative sources, and what's in its persistent memory. Might be pretty eye-opening. Feel free to drop me an email if you find anything interesting. This is a very strange world that we live in today. If there's a persistent state, somebody is going to try to influence it. If you can make a buck, somebody is going to try to make that buck. Just remember, the memory is persistent. Treat it accordingly. And we will see you next time. Giorgio, Noam, thank you so much for breaking this down. Have a lot of hygiene I have to go do now.

Giorgio Severi: Thank you very much for having us.

Noam Kochavi: Thank you for having us.

⁠⁠Sherrod DeGrippo: Thanks for listening to the Microsoft Threat Intelligence Podcast. We'd love to hear from you. Email us with your ideas at tipodcast@microsoft.com. Every episode, we'll decode the threat landscape and arm you with the intelligence you need to take on threat actors. Check us out, msthreatintelpodcast.com for more, and subscribe on your favorite podcast app. [ Music ]

HOST(S):

Sherrod DeGrippo, Deputy CISO, GM Customer Security at Microsoft, is a frequently cited threat intelligence expert with a 19-year career leading global threat research and analyst teams. She was named Cybersecurity Woman of the Year in 2022 and Cybersecurity PR Spokesperson of the Year for 2021. Sherrod has provided expert commentary for BBC News, Wall Street Journal, CNN, and New York Times and has presented extensively at conferences including Black Hat, RSA Conference, RMISC, SleuthCon, and others.

Schedule: Bi-Weekly

Credits: Producer is Rob Petrillo, Production Manager is Max Solomon, Scheduling and Administrative Support is Elliot Volkman, and our Audio Engineer (and magician) is none other than The Great Rich Cerbini.

Creator: Microsoft

Social Media: