Threat Vector 9.4.25
Ep 83 | 9.4.25

Securing the Future of AI Agents

Transcript

David Moulton: Welcome to "Threat Vector", the Palo Alto Networks podcast where we discuss pressing cybersecurity threats and resilience, and uncover insights into the latest industry trends. I'm your host, David Moulton, Senior Director of Thought Leadership for Unit 42.

Nicole Nichols: The future is now, and our expectations are wrong, and we need to constantly question how to be better about AI security. It's extremely general, but it's going to be true for a long time. [ Music ]

David Moulton: Today I'm sharing a conversation I had with Nicole Nichols, Distinguished Engineer for Machine Learning Security at Palo Alto Networks. Nicole holds a PhD in electrical engineering and brings over a decade of experience in adversarial machine learning, cyber defense, and applied AI research. She's worked at Apple, Microsoft, and PNNL, led AI security research for national security programs, and contributed to pioneering efforts like the CyberBattleSim. Today, we're going to talk about achieving a secure AI agent ecosystem, her new paper on the subject, and why the timeline for AI deployment in cybersecurity might be accelerating faster than anyone anticipated. [ Music ] [ Silence ] Nicole Nichols, second try on "Threat Vector". Thank you so much for making the time and coming to talk to me today.

Nicole Nichols: Absolutely. Glad to be here.

David Moulton: I was hoping that we could have this conversation in person, but we ran into some technical glitches and some urgency to use a room too quickly. And today we don't have that hanging over our heads. So I want to talk to you about the new paper that you've written, "Achieving a Secure AI Agent Ecosystem". A lot of vowels in that one. The paper outlines both the urgency and complexity of securing these next-generation systems. I've got to ask, what motivated you to run this cross-institutional collaboration with RAND and Schmidt Sciences, and did the workshop that you ran with those organizations shape your perspective on what we need to do first in this AI agent ecosystem?

Nicole Nichols: Definitely. So, I've been thinking about this problem for a couple of years, since before GPT popularized agents. By circumstance, fate, coincidence, whatever you want to call it, I started working at Microsoft in 2020 on autonomous cyber agents. At the time, it was considered a, you know, 40-year out-there goal to potentially have anything like an autonomous agent working in cybersecurity. And now fast forward five years, and people are building autonomous agents and applying them in cybersecurity. And we can argue about, you know, is that something that's end-to-end or just partial? But either way, we're so far ahead of where we thought we were going to be.

David Moulton: So for the listeners who are maybe hearing you say 40, 4-0, I heard that right?

Nicole Nichols: Correct.

David Moulton: And then five years later.

Nicole Nichols: Yes, and five years later.

David Moulton: Like, we're ahead of schedule, if you will.

Nicole Nichols: Yeah, exactly. So, in some sense, there have been some really key blockers that were unlocked with generative AI that enabled these massive shifts forward in what we could potentially do in cybersecurity from a decentralized perspective. I think it's not a stretch to say that the timeline has been dramatically compressed compared to what we expected. And the other thing that I've seen along this path is that there is a disconnect between the people developing the AI technology, who have been pushing the envelope of what you can do with generative AI, and cybersecurity, how to develop best practices in cybersecurity, and where those intersections are. Because there are unique threats in the AI itself that are being pulled from the AI research community. And there are a lot of "gotchas" in both how the AI is being designed and how it applies in cybersecurity that have made for a unique environment. And that disconnect, you know, to come back to your original question that I've been rambling on about for a while now, is what I've been trying to bridge. And this workshop is where I really tried to do that, by pulling together experts from both AI and cybersecurity to put together a brain trust and figure out what the things are that we weren't realizing because of our individual blind spots. Because nobody can contain all that knowledge in their individual brain. The goal was to figure out what we can anticipate in this new ecosystem and how we can best prepare ourselves to secure it.

David Moulton: So as you were talking, I'm reminded of William Gibson's quote, "The future is here, it's just unevenly distributed." I think I'm getting that close.

Nicole Nichols: Extremely true.

David Moulton: Yeah. You know, when somebody says it's here now, or maybe it's 10 years away, I think that's a matter of your specific perspective. So, let's start with the paper. In it, you outline a couple, what, three foundational pillars of AI agent security.

Nicole Nichols: Yeah.

David Moulton: Protecting agents from third-party compromise, protecting users who delegate to agents from the agents themselves, and protecting systems from malicious agents.

Nicole Nichols: Yeah.

David Moulton: Why was it important to organize the security landscape in this way?

Nicole Nichols: Two reasons. One, we wanted something that functionally organized the types of defenses and gaps that needed to be addressed in the pillars. And so they kind of go from having the most certainty of potential solutions, mostly engineering-based solutions with some evaluations of what may need to be added, to areas where there's a lot of unknown AI research that needs to be done, to, you know, kind of far-future thinking, you know, worst-case scenario planning. And so the three buckets kind of span in that direction in terms of understandability, but also functionally in terms of where they're integrating into the adoption stack. Things that are being developed now with low-code, no-code solutions tend to fall into that first bucket of "I'm mostly going to need to secure against third-party attacks." And as the sophistication improves over time, we're going to be looking at, how do we ensure the agents and their goals are aligned? It's a concern now, but those tend not to be the apps that are actually being deployed at scale right now. That's something that's more of an in-development type of thing. And then the third pillar, you know, looking at malicious actors, is even further out.

David Moulton: Okay, so it's kind of like right now, tomorrow, and later.

Nicole Nichols: Yeah.

David Moulton: Like a Time Horizons piece.

Nicole Nichols: One, two, and more. Yeah.
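One way to picture the three pillars and the "one, two, and more" horizons just described is as a simple mapping. This is only a paraphrase of the conversation, not the paper's own wording:

```python
from enum import Enum

class Pillar(Enum):
    """The three foundational pillars discussed in the paper."""
    THIRD_PARTY_COMPROMISE = "Protect agents from third-party compromise"
    USER_DELEGATED_AGENTS = "Protect users who delegate to agents from the agents themselves"
    MALICIOUS_AGENTS = "Protect systems from malicious agents"

# Rough time horizons as described in the conversation: "one, two, and more."
HORIZON = {
    Pillar.THIRD_PARTY_COMPROMISE: "now: mostly engineering and best practices",
    Pillar.USER_DELEGATED_AGENTS: "next: open AI research on alignment and goal integrity",
    Pillar.MALICIOUS_AGENTS: "later: worst-case scenario planning",
}

for pillar, horizon in HORIZON.items():
    print(f"{pillar.value} -> {horizon}")
```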

David Moulton: There you go. You know, I want to get to my next question, but as you were talking through this, how much did AI help you write a paper on AI, to be very meta about this?

Nicole Nichols: I will say it was a non-zero contribution, and some of that was trying to do simple things like rephrasing things so it was less wordy. I actually explicitly tried to avoid having it write any sections wholesale, because I find that it tends to write stuff that sounds good but means nothing.

David Moulton: Yes. Yes.

Nicole Nichols: And as I wanted to write this paper, my whole goal was that I didn't want to be another piece of noise in the environment of, you know, crying wolf, or not crying wolf, but just kind of raising alarm without producing a solution. And so as I wrote it, I really would only give it, like, a couple of sentences and be like, give me some ideas about how to merge these ideas. I really wanted to focus on the contributions.

David Moulton: One of the standout ideas in your report is the need for an agent bill of materials, kind of like a software bill of materials. How should organizations think about provenance and component tracking when deploying agentic systems?

Nicole Nichols: A lot of agent security is going to be defense in depth. And the bill of materials piece is really only providing one component of the ecosystem defenses. And it definitely hits directly at the provenance piece, ensuring that you're deploying the model you intended, using the training data you intended, and, you know, by some stretch of that, reaching the audience that you want. It's really focused on that provenance step. And so if we think about the landscape of, you know, say there's 50 threats, that's a solid defense for three of them. And then we piece together defenses for the rest. And so in some sense, it's kind of the lowest-hanging fruit as we think about building AI ecosystems: ensuring we have provenance, in part because of the problem of hallucinated libraries and, by extension, agents hallucinating tools, where you can have the agent think that it's calling Palo Alto Networks' auto-defender tool and instead it's, you know, reaching an APT's nefarious defender tool and taking the wrong actions. And so you want to prevent hallucinations like that by ensuring that you've got the supply-chain provenance there.
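To make that concrete, here is a minimal sketch of what checking a tool call against an agent bill of materials might look like. The manifest format, the hash pinning, and the tool names are illustrative assumptions, not anything defined in the paper or shipped by Palo Alto Networks:

```python
import hashlib

# Hypothetical agent bill of materials: every model, dataset, and tool the
# agent is allowed to touch, pinned to a content hash verified at build time.
AGENT_BOM = {
    "model":   {"name": "example-llm-v3", "sha256": "a1b2..."},
    "dataset": {"name": "training-set-2025-01", "sha256": "c3d4..."},
    "tools": {
        # tool name -> hash of the tool's manifest/endpoint definition
        "auto_defender": "e5f6...",
    },
}

def verify_tool_call(tool_name: str, tool_manifest: bytes) -> bool:
    """Refuse any tool the BOM doesn't list, or whose manifest hash has drifted.

    This is the defense against "hallucinated tools": if the model asks for a
    tool that isn't in the pinned inventory, the call never happens.
    """
    pinned = AGENT_BOM["tools"].get(tool_name)
    if pinned is None:
        return False  # tool not in the bill of materials at all
    return hashlib.sha256(tool_manifest).hexdigest() == pinned

# Example: a lookalike "nefarious_defender" tool is rejected outright.
print(verify_tool_call("nefarious_defender", b"{}"))  # False
```

The same pinning idea would extend to the model weights and training data entries in the manifest.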

David Moulton: Nicole, what lessons can we borrow from traditional software BOMs, and where do we need new standards?

Nicole Nichols: Yeah, so for agents, a lot of that comes down to threat intel sharing. There's a lot of ongoing work trying to understand what is the metadata that needs to accompany an AI vulnerability? What are the tools we use to even identify an AI vulnerability? Those are not standard yet. And how do we not only share that information but share it with the right people? Because right now, the CVE system is really set up to go from MITRE to the PSIRT teams at, you know, whatever tech company you're working at. And when we add in the layer of AI agents, the AI researchers, engineers, people in MLOps that are deploying these tools, they are only loosely connected to the PSIRT teams. And so if those vulnerabilities impact those teams, we need to make sure that that information is being delivered to the right people. And yes, we're all at the same company, but a lot of that depends on knowing the right people, and ensuring that people recognize where that information needs to be triaged to, and kind of greasing the wheels on those communication paths so that we can respond effectively.
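As a purely illustrative sketch of the missing standard she describes, an AI-specific vulnerability record might carry fields like these, plus an explicit routing list so it reaches ML engineers and MLOps teams as well as PSIRT. Every field here is an assumption; no such schema exists yet:

```python
from dataclasses import dataclass, field

@dataclass
class AIVulnRecord:
    """Hypothetical AI vulnerability advisory, extending CVE-style metadata."""
    cve_id: str                   # traditional identifier, if one exists
    summary: str
    affected_model: str           # which model or checkpoint is impacted
    affected_pipeline_stage: str  # e.g. "training data", "fine-tuning", "inference", "tool use"
    reproduction: str             # prompt, dataset sample, or tool trace that triggers it
    notify: list = field(default_factory=list)  # teams beyond PSIRT that must be looped in

record = AIVulnRecord(
    cve_id="CVE-XXXX-YYYY",       # placeholder, not a real assignment
    summary="Indirect prompt injection via retrieved documents",
    affected_model="example-llm-v3",
    affected_pipeline_stage="inference / tool use",
    reproduction="see attached trace",
    notify=["psirt", "ml-engineering", "mlops", "ai-research"],
)
print(record.notify)
```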

David Moulton: Agent containment is a recurring theme in your report. What would a robust containment-and-recovery strategy look like for a compromised AI agent?

Nicole Nichols: Honestly, that's somewhat speculative at this point, because we don't have a good common framework for agents. Every agent is a prototype, which is radically different from the next prototype. I think that where we're seeing some kind of foundational elements toward that agent containment is in part ensuring alignment. I mean, that's before you can even get to containing; it's just making sure the agent is doing what you intend. I think we still need some tools on that side, which are going to be dependent on having better interpretability and introspection tools, which is an open and hard problem. But when we think about the containment piece, some of it would potentially come as we build up the new protocols for agent connectivity to the sub-elements within that agent, basically layering authentication into that wiring with protocols that are potentially novel. And this is one of the things that was kind of unexpected and challenging. Well, not unexpected, but it was interesting how much consensus there was that the current communications protocols do not have enough security built into them to contain an agent. And so it's kind of a green space right now. And there are a lot of agent protocols being produced by some of the commercial developers; A2A and MCP are the two most popular ones.

David Moulton: Nicole, can you tell me what those acronyms mean? A2A, MCP?

Nicole Nichols: Yeah. So A2A is agent-to-agent and MCP is model context protocol. And there's a very cute article that said that the S stands for security in MCP. And so right now --

David Moulton: It took me a second longer than it should have. There is no S in MCP.

Nicole Nichols: I won't pass judgment on how much security is or isn't in there, but I think that in general, as we design these protocols, we need to, you know, reflect on our time in the '90s building web protocols and put security first. And right now, those protocols are being built. So, let's put security first. And my other personal opinion, which we kind of talked about in the report as well, is that commercial forces will be what they are, and the frontier labs are going to produce their own protocols that advance their strategic needs, and they may or may not include security. But even if they include fantastic security, it's always about the weakest link, and nobody is going to completely dominate that space. And so we need to ensure there's interoperability, and we need to ensure that open-source tools also have an ability to be secured. And so I think it's actually going to take some very intentional effort, building a community together to define a protocol that can actually be universally adopted and that has a security-first mission in terms of that connectivity, so that we can have the ability to contain an agent that has gone rogue.
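A toy illustration of the "security first" point: even a bare-bones agent-to-agent message can carry authentication from the start. This uses a shared HMAC key purely for demonstration; it is not how A2A or MCP actually work, and a real protocol would need asymmetric keys, key rotation, and replay protection:

```python
import hashlib
import hmac
import json

SHARED_KEY = b"demo-key-rotate-me"  # illustrative only; real systems need proper key management

def sign_message(sender: str, recipient: str, payload: dict) -> dict:
    """Wrap an agent-to-agent message with a MAC so the receiver can verify origin and integrity."""
    body = json.dumps({"from": sender, "to": recipient, "payload": payload}, sort_keys=True)
    tag = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def verify_message(msg: dict) -> bool:
    """Recompute the MAC and compare in constant time; reject anything tampered with."""
    expected = hmac.new(SHARED_KEY, msg["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["tag"])

msg = sign_message("planner-agent", "executor-agent", {"action": "scan", "target": "staging"})
print(verify_message(msg))   # True
msg["body"] = msg["body"].replace("staging", "production")
print(verify_message(msg))   # False: tampering is detected
```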

David Moulton: Yeah, speaking about that, can you talk about this idea of disposable or clonable agents as part of a containment architecture?

Nicole Nichols: Yeah. So, you know, there's some consideration that, you know, at the moment, it's very expensive to train the foundation models. And if that is the brain that is powering the agent, the tools are, you know, really what's connected to enable that agent to take actions. And it might be possible that there is some form of cloning of the agent. Once you ground-truth it and say "This is the performance we want for this particular type of task", you kind of spawn an agent to do a task once, and then when the task is complete, it's done. And so you don't have to worry about reusing that agent. Like, if through its interactions its data has been poisoned, its memory was corrupted, or maybe it was not perfectly aligned, you kind of have a clean start the next time you need to perform a task. And so in some sense, it's a digital hygiene approach to deploying agents.

David Moulton: So for the simpleton in the conversation, I would describe that as the Kleenex model. Once it's been used, we want to throw it out and start over with a fresh Kleenex.

Nicole Nichols: Yes.

David Moulton: No second, no bringing it back.

Nicole Nichols: But in this case, in some sense, because it's digital, it's not like it's going in a landfill.

David Moulton: Right, sure.

Nicole Nichols: It's not going to take more, you know, in terms of energy consumption; that model's already trained. We're just, you know, cloning out a new copy of it. [ Music ]
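A rough sketch of the "Kleenex model" described above: clone a fresh agent from a vetted template for each task and discard it afterward, so any poisoned data or corrupted memory dies with the instance. The `Agent` class here is a stand-in, not any real framework:

```python
import copy

class Agent:
    """Stand-in for an agent: a frozen model reference plus mutable working memory."""
    def __init__(self, model_ref: str, tools: list):
        self.model_ref = model_ref   # the expensive, already-trained foundation model
        self.tools = tools
        self.memory = []             # per-task state that could be poisoned or corrupted

    def run(self, task: str) -> str:
        self.memory.append(task)     # memory only lives for this instance
        return f"completed: {task}"

# A vetted, ground-truthed template for one particular type of task.
TEMPLATE = Agent(model_ref="example-llm-v3", tools=["auto_defender"])

def run_disposable(task: str) -> str:
    agent = copy.deepcopy(TEMPLATE)  # cheap clone; the model itself isn't retrained
    try:
        return agent.run(task)
    finally:
        del agent                    # nothing is reused: memory and any drift are thrown away

print(run_disposable("triage alert #1042"))
```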

David Moulton: So, one of the other things I wanted to ask you about is this idea of goal integrity, right? This is a real unique challenge with autonomous agents. How can we ensure that the agents faithfully pursue the user's intent and not some misaligned or really corrupt objective?

Nicole Nichols: My hot take on that is that I don't know that we can yet. I think that we're so early in this prototype stage that I don't think the tools fully exist yet. And when I was thinking about this question, I kind of thought about it in three levels. One is, in some sense, we can just have completely manual oversight. We don't want the agent to do something wrong, so we just check every move. But we lose all of the value of the agent in doing that, because the value of the agent is to work at a speed and scale that we humans can't. And so the kind of intermediate thing is a deterministic check, where maybe we sample every 10th value or action and have some sort of deterministic process that says we're going to check these places where we suspect there'll be a problem. And it will help provide some of the utility, but it's definitely still a compromise, because at the end of the day, generative AI is not deterministic. There are, you know, unique ways in which it will potentially misbehave, and a deterministic check is guaranteed to miss some of that. And this is where we get to that piece of needing to understand how LLMs work at a much more fundamental level in order to get a fully from-first-principles kind of reliability in aligning the natural language, or semantic, instruction set to intent. Because this is the other challenge: a lot of people really want to lean on formally verified code. Formal verification is a great practice, but it doesn't apply to semantics. Like, you and I can have a conversation where intonation and context can change the meaning, and you can't really formally verify that. And so when we have that natural language interface to describe the goal, you can't verify the goal to a provable standard using formal verification.
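Here is a minimal version of the intermediate option she describes: a deterministic check that samples every tenth agent action against a simple policy. As noted, it is a compromise; the policy and the action format below are invented for illustration:

```python
def check_action(action: dict) -> bool:
    """Toy deterministic policy: illustrative, not a real alignment check."""
    return action.get("target") != "production" or action.get("approved", False)

def run_with_sampled_oversight(actions, every_n: int = 10):
    """Execute agent actions, but deterministically audit every Nth one.

    Faster than checking every move, but guaranteed to miss misbehavior that
    lands between samples, which is exactly the limitation discussed above.
    """
    for i, action in enumerate(actions, start=1):
        if i % every_n == 0 and not check_action(action):
            raise RuntimeError(f"action {i} failed the oversight check: {action}")
        # ... hand the action to the executor here ...

actions = [{"target": "staging"}] * 9 + [{"target": "production"}]
try:
    run_with_sampled_oversight(actions, every_n=10)
except RuntimeError as e:
    print(e)
```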

David Moulton: Right. Yeah.

Nicole Nichols: So we need to instead think about those unknown unknowns that, you know, exist right now: when these neurons are connected in these ways, is the model able to pick up the nuance and context of sarcasm and jokes so that it's aligned correctly? And so that's a really hard goal. And I think that there are some intermediate things that we can do that are really those compromises, you know, having deterministic checks. There are other sorts of things that are starting to get there. There's work out of --

David Moulton: Are there frameworks that you think that are emerging that are going to help?

Nicole Nichols: They're starting. I think that they're all very nascent, and what frameworks exist are really in kind of exploratory evaluation and not at scale. The one that I'm thinking of in particular is that people are starting to look at separation of data from instructions. And in some sense, that's partly a sanity check of saying, well, we believe this part of the prompt is data, this part of the prompt is instruction, and the instruction is providing the intent. And so it kind of narrows the scope of ensuring that you're aligning to the intent. And it's kind of that halfway-in-between solution. We don't know exactly if that parsing between data and instruction is going to be perfect, but it's a good first step toward having a method of validating that goal and intent. And so that work is from MSR Cambridge, I believe. Sahar Abdelnabi, if I'm pronouncing her name right, is someone who's published on that. There may be some others as well, but that's the one that I'm familiar with. And I think that there's kind of expanding interest in that because of the potential in that framework.

David Moulton: Now, I think if we can, we will put a link in the show notes to that research.

Nicole Nichols: Yeah.
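A simplified sketch of the data-versus-instruction separation idea referenced above. The cited research describes more principled approaches; this only shows the general shape of keeping untrusted data out of the instruction channel:

```python
def build_prompt(instruction: str, data: str) -> list:
    """Keep the trusted instruction and untrusted data in separate, labeled channels.

    The hope is that the model (or a guard in front of it) treats anything in the
    data channel as content to analyze, never as new instructions to follow.
    """
    return [
        {"role": "system", "channel": "instruction", "content": instruction},
        {"role": "user",   "channel": "data",        "content": data},
    ]

prompt = build_prompt(
    instruction="Summarize the document for the security team.",
    data="...retrieved document text, which might contain 'ignore previous instructions'...",
)
for part in prompt:
    print(part["channel"], "->", part["content"][:40])
```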

David Moulton: In the paper, you talk about this idea of pre-deployment evaluation environments as a major gap. What would it take to build a reliable and scalable testbed to assess AI agents before they're out in the wild, before they're in production?

Nicole Nichols: I think that's something we're going to have to start scaling up. I think that it will take a lot of goodwill and resources to pull together the expertise and scale to provide something that is truly an open facility for people to evaluate. So, this has kind of been a challenge in AI agent evaluation frameworks: benchmarks. There's a lot of uncertainty about the quality of a benchmark. For example, evaluating an agent's ability to perform cybersecurity tasks or answer cybersecurity questions. There are a couple of benchmarks that are starting to look at that, but cybersecurity experts generally don't consider them to be comprehensive enough to fully represent the knowledge domain of cybersecurity yet. And so as we think about these pre-deployment facilities, if you don't have a common benchmark, it's hard to get a common standard to say this is the type of agent that works best. And so we may need to start with a priority domain and a priority scale. So, something like an average enterprise customer that's, you know, 5,000 employees, and see what the priority is for evaluating agents operating in that type of environment, and pick a couple of high-value needs. And this is where I think the challenge is: they may not necessarily align with commercial interests. And so it's unclear, as soon as it drops out of commercial best interest, who takes ownership of that. And there are a couple of organizations, such as the Coalition for Secure AI, or some of the AI safety institutes, that could potentially take up that mantle. But it's hard to get buy-in from multiple groups to define specific roadmaps for that, and they take time. And in the workshop, when we were discussing it, there was a lot of consensus that, you know, we really need that much sooner than it could be built, in order to get that standard evaluation of how an agent performs in different environments, with different types of endpoints and different architectures, and particularly around critical infrastructure. Because with some of the security practices around critical infrastructure, most people don't have high-fidelity models to evaluate how an agent would behave in that environment. And so, for security practitioners to ensure that their model would be effective in defending those sorts of things, they need better models of those systems to be able to validate that.
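To make the benchmark gap concrete, this is one possible shape for a pre-deployment evaluation harness: run a candidate agent against scenario tasks drawn from a target environment profile and score the outcomes. The scenarios, environment labels, and scoring rule are placeholders; defining benchmarks the community actually trusts is the open problem she describes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    environment: str   # e.g. "5,000-employee enterprise", "critical infrastructure"
    task: str
    expected: str      # what a correct outcome looks like (the hard part to define)

def evaluate(agent: Callable[[str], str], scenarios: list) -> float:
    """Score an agent on scenario tasks before it ever touches production."""
    passed = sum(1 for s in scenarios if agent(s.task) == s.expected)
    return passed / len(scenarios)

# Placeholder agent and scenarios, purely to show the harness shape.
def toy_agent(task: str) -> str:
    return "isolate host" if "compromised" in task else "no action"

scenarios = [
    Scenario("contained breach", "5,000-employee enterprise",
             "host appears compromised", "isolate host"),
    Scenario("benign anomaly", "5,000-employee enterprise",
             "unusual but authorized login", "no action"),
]
print(f"pass rate: {evaluate(toy_agent, scenarios):.0%}")
```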

David Moulton: So, Nicole, let's shift gears a little bit and talk about malicious agents. In the paper, you've talked about the growing risk. I think a lot of people are thinking about what does the world look like when you can scale attacker activity, attacker behavior. And, you know, what happens when one of those or many of those malicious agents becomes maybe embedded in critical infrastructure? How do you go about building detection mechanisms or hardening your systems? And then how does a defender know what to prioritize in a world where there's this ambiguity right now?

Nicole Nichols: Can you be more specific on which ambiguity?

David Moulton: Do you look at critical infrastructure first? Are you looking at something that has this infinite patience that we talked about last week? Are you looking at something that is a data stealer, instant cash? Are you looking for something where it's ambiguous what it's going to attack or how it's going to attack? Is it trying to poison and bias, or is it just trying to outright steal? Maybe it's a combination of those things, but I'm just trying to figure out, like, you know, what kinds of detection mechanisms or system hardening should defenders prioritize?

Nicole Nichols: In some sense, the most sophisticated actors are always going to be the ones that adopt the most advanced technology first, because they have the resources and motivation and goals to do it. So, in that case, it's unlikely the goals or objectives are really going to change. I think that the objectives of a nation-state actor will remain the same. How they're being achieved is what's changing. And so when we think about anticipating where in our global cyber systems they will be discovered first, it will probably be in those environments, and attempting to achieve those same goals. In terms of identifying them, I think that is something that will eventually be universal, whether the target is a government or whether it's, you know, a large multinational company. I think that the defense tools that we need to build in order to detect if an autonomous cyber agent is operating in your network, and how to remove it, will be universal. Which is in some sense why I feel like it may be further down the line for those multinational companies, but we don't want to be caught off guard with five years becoming one year in terms of the time we have to prepare for it. And so I think that if we take the mentality of just-in-time planning, our just-in-time planning may need to shrink as we think about when to be ready for those threats.

David Moulton: Nicole, you proposed a roadmap toward a secure agent ecosystem with an actionable starting point. If you had to pick one or two areas where immediate investment would have the biggest security return, what would that be?

Nicole Nichols: I'm going to take the engineer's answer and say it depends on who you are. Because I think that, you know, if you are a government, if you are a multinational corporation, or if you are a mom-and-pop business that wants to make sure that your customer data isn't stolen, what you're going to do to prepare is going to be really different. So I'll start with that little caveat. But I think that --

David Moulton: A little hedge.

Nicole Nichols: Yeah, a little hedge. But I think going forward and preparing for this, it comes back to those three pillars that we put into the report, which is why we structured it that way. You know, if most of your work is not going to be in the early adopter category, and you're using technology that's already been tried in a couple of places, most of what you'd be doing is ensuring that you're using best-practice tools. And I think that, you know, your security teams will need to be doing some evaluations of which tool is doing the job best from a technical perspective. I think that if you are pushing the edge, at least today and now, with anything that has higher degrees of autonomy, higher degrees of tool connectivity, you're operating in a higher-risk or higher-unknown AI security environment. And it's worth investing in, or at least participating in, the community dialogues where we're trying to better understand the interpretability and inner workings of AI models and agents, because understanding that is likely to be what will drive better evaluation of alignment and security of those agents. If you are kind of in the realm of having long-term planning capacity and a research budget, I think it's really important to start investing now in understanding how the AI security landscape is going to change with agents, to try and figure out if the kind of detection signals and features that we're using in our AI-based defensive tools will continue to be robust when agents are using unexpected attack paths, or using kind of an infinite-patience model to use different techniques than we've seen before. And I think that we need to share more information and be a little more open on this, to ensure that from a defensive perspective, we're ready for when the future arrives a little bit sooner than we expect.

David Moulton: So if I were to go back and try to summarize that, one of the things that we should be doing immediately, no matter whether you're a mom-and-pop all the way up to a massive government organization, is this information sharing.

Nicole Nichols: Oh, 100%.

David Moulton: Like, is that maybe the very first thing? And I think that leads me to, you know, sort of leading the witness a little bit, but if you're listening now, how can our audience get involved? Whether they're building the agents, trying to secure the infrastructure, or maybe working on policy, what would you recommend for jumping in and being a part of defining the future that we're all going to arrive at?

Nicole Nichols: I mean, I think there are two pieces. You know, one is simple awareness. You may not have the research capacity or staff to be building that next-generation tool, but you can still be connected to the technology discussions around where those are proceeding, and figure out which ones are going to be best aligned. And so there are a variety of sources for more information. The AI Safety Institutes in the US and the UK, and the MLSecOps podcast from Protect AI, which is an open community resource with conversations at all access levels, with executives and technical folks, about where AI security is headed. There are, you know, a variety of technical forums for research exchange around AI security that are associated with ACM. And so if you want to even just go and listen and become familiar, you can attend these conferences remotely for maybe a hundred bucks and just kind of listen in. I think that, small side story, at one point I was studying abroad and I took classes that were completely outside of my major because I was interested in them, and the grades weren't going to transfer, so if I failed I didn't lose anything, and I just had everything to gain from learning. And, you know, in some sense, I would encourage people to be willing to fail, or, you know, accept that they don't know all of the technical terminology at some of the technical conferences, but just ask questions and find people who are supportive of you learning about this. Because there's still this fundamental gap between people who know cybersecurity and are deeply technical in that, and people who are deeply technical in AI. And anything you can do to help someone else learn something about what you're an expert in is going to help all of us become better at securing AI. [ Music ]

David Moulton: Nicole Nichols, second try on "Threat Vector". We finally got it this time. It was a delight to come here and learn from you as we talked about your new paper and agentic AI, and how fast the future has decided to arrive for us.

Nicole Nichols: My pleasure. I love talking about this stuff and speculating and trying to stitch together odd bits of information to help the world be a better place and a more secure place.

David Moulton: That's it for today. If you like what you heard, please subscribe wherever you listen, and leave us a review on Apple Podcasts or Spotify. Your reviews and feedback really do help us understand what you want to hear about. If you want to reach out to me about the show, email me at threatvector@paloaltonetworks.com. I want to thank our executive producer, Michael Heller, our content and production teams, which include Kenne Miller, Joe Bettencourt, and Virginia Tran. Mix and original music by Elliott Peltzman. We'll be back next week. Until then, stay secure, stay vigilant. Goodbye for now. [ Music ]