Research Saturday 9.25.21
Ep 202 | 9.25.21

Vulnerabilities in the public cloud.


Dave Bittner: Hello everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.

Ariel Zelivansky: We've been focusing for a long time on researching the public cloud. We've also decided to look into the actual public cloud that is offered by vendors, and Azure ACI was one of the main platforms that we were interested in.

Dave Bittner: That's Ariel Zelivansky. He's Senior Manager of Security Research at Palo Alto Networks. The research we're discussing today is titled, "What You Need to Know About Azurescape."

Ariel Zelivansky: So, ACI is actually a CaaS platform – that is, a container-as-a-service platform. It allows users to run containers without having to set up Kubernetes and all the container configuration that they usually have to do to set up containers. That makes ACI really attractive to companies and organizations that want to make a transition to the cloud. And for us, as the research team, we realized that is a good point to start looking for vulnerabilities.

Dave Bittner: So, can you give us a little bit of the background here, just for folks who may not be completely up to speed on this stuff, of some of the benefits of having these container instances and the general security that they provide?

Ariel Zelivansky: Not only containers, but also the public cloud in general is more secure than running on-prem in most cases. The fact that there is a vendor that is taking care of security and making sure everything is up-to-date, patching any security issues, and in general, monitoring what is going on in their instances – that adds a lot to security. At the same time, we, as a research team, invest in finding flaws and other issues that may have been overlooked. So we try to find something that would allow us to do a cross-account attack.

Dave Bittner: Hmm.

Ariel Zelivansky: The way these containers are running on the public cloud is that they are multitenant. So, one container is actually running on the same Kubernetes instance as another user's or organization's cluster. For us, this means if we could actually break out of a container and somehow access another user's data and instances and potentially execute code, this would be a really impactful vulnerability. And what Azurescape is about is that we've been able to break out of a container and access containers of other organizations. For us, this was the holy grail of finding a vulnerability in the public cloud. And we're really happy to have found this vulnerability so we can talk about this as a threat, a potential threat. You know, containers by themselves can be broken out of, and we need to understand how to keep the instances secure, even if one is able to break out of a container.

Dave Bittner: Well, let's walk through it together, because it really is a fascinating story, the way that you all went about this. You spin up your own instances here within Azure, and where did you begin? Where did you start looking?

Ariel Zelivansky: Right. So, Yuval, one of the researchers in the team, was able to identify that there was an old vulnerability in runC, which is the container runtime that runs the containers in Azure. And this was actually a vulnerability that we've looked at in the past and we already had an exploit available for that vulnerability. So all we had to do was just execute that, and we were actually out of the container. It was a really surprising moment for us to find that we could just exploit that.
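
To make the "old runtime" check concrete, here is a minimal sketch of comparing a reported runC version against a minimum acceptable one. It is not the researchers' tooling. The function names are hypothetical, the version strings in the usage note are placeholders (the conversation does not name a specific version), and the parser only understands the `X.Y.Z` and `X.Y.Z-rcN` forms.

```python
import re

# Matches versions such as "1.0.0", "1.0.0-rc2", or "1.0.0-rc.2".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-rc\.?(\d+))?")


def parse_version(text):
    """Extract the first version found in text as a comparable tuple.

    A release candidate sorts before the final release with the same
    major.minor.patch, so a missing rc number is ranked as infinity.
    """
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    major, minor, patch, rc = match.groups()
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def is_older_than(installed, minimum):
    """Return True if the installed version sorts before the minimum."""
    return parse_version(installed) < parse_version(minimum)
```

For example, `is_older_than("runc version 1.0.0-rc2", "1.0.0-rc7")` is true under this ordering, while a final `1.0.0` release is not older than `1.0.0-rc7`. Both strings are illustrative inputs, not values taken from the interview.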

Dave Bittner: Yeah, this was because it was an older version running on Azure servers?

Ariel Zelivansky: Correct. So, Azure was running an old runC instance. In fact, this was not a vulnerability just by itself. The model of ACI, the architecture that they built ACI upon, is actually supposed to tolerate breakouts. Not only could this vulnerability have existed, but kernel vulnerabilities and kernel zero-days could have been around, and Azure is actually supposed to expect attackers to break out in some cases where they do have zero-day vulnerabilities in the kernel. By itself, although it was surprising, it was not enough to contact Microsoft and tell them, hey, there is a potential RCE vulnerability on your instances. But we did contact Microsoft at this point just to get the conversation started while we were actually trying to see what else we could do.

Ariel Zelivansky: And really, the research begins at this point. When we have code execution on the host – that is, the Kubernetes node that is running the container – we want to see what else is around, right? How can we actually navigate and do lateral movement to other containers? And the way the container is actually contained inside ACI is that it is one pod per node. That means the container itself is in its own VM, its own node. So there are no other containers by other users on that same node, and we actually had to find a way to get privileges on Kubernetes itself to access these other nodes. And really, I think that was the most interesting part of the research: Yuval was able to find a JWT token that actually included credentials that allowed him to escalate privileges on that Kubernetes cluster.

Ariel Zelivansky: From that point, we actually had cluster admin, which is the administrative equivalent for Kubernetes. And Yuval could see containers by other users, see other nodes, list all the organizations that are running on that cluster, and potentially execute code on them. So, not only could we steal data and leak the credentials of other organizations, but we could actually execute code on their containers. That was really scary to see. And we started the process of writing an advisory and disclosing this to Microsoft at this point.

Dave Bittner: Well, I mean, walk us through the discovery of that token. That's an interesting story in itself.

Ariel Zelivansky: Correct. So, there is another component that is called the bridge. It is something specific to ACI – Kubernetes doesn't have this component, so Yuval actually tried to find what is going on in the network level, so he could understand how commands are getting executed on the container. So, the JWT token is actually coming when Yuval or the researcher or the attacker is doing an external command to their container. Rather than doing that through the kubelet – the main Kubernetes frame – there is this bridge component that is sending this request to the container. And in that request, we just were able to find that token.
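
For readers unfamiliar with the format: a JWT is three base64url-encoded segments separated by dots, so anyone who captures one in a request can read its claims without knowing the signing key. The sketch below shows that decoding step using only the Python standard library. It is an illustration, not the researchers' method; the sample token, its claim values, and the helper names are all hypothetical, and no signature is verified.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data):
    """Encode bytes as unpadded base64url, the form used inside JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt_unverified(token):
    """Return (header, claims) from a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    header = json.loads(b64url_decode(parts[0]))
    claims = json.loads(b64url_decode(parts[1]))
    return header, claims


def make_sample_token():
    """Build a made-up token so the example is self-contained."""
    header = {"alg": "RS256", "typ": "JWT"}
    # Hypothetical claims; real tokens carry whatever the issuer sets.
    claims = {
        "iss": "example-issuer",
        "sub": "system:serviceaccount:example-namespace:example-account",
    }
    body = ".".join(
        b64url_encode(json.dumps(part).encode("utf-8"))
        for part in (header, claims)
    )
    return body + "." + b64url_encode(b"not-a-real-signature")
```

Calling `decode_jwt_unverified(make_sample_token())` returns the header and claims dictionaries, which is enough to see which identity a captured token represents. What that identity is allowed to do is decided by the cluster's authorization rules, not by the token itself.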

Dave Bittner: So, once you get that token, then what do you do with that? What's the process by which you exploit the fact that you have that to do the things that you want to do?

Ariel Zelivansky: Right. So, the token is actually already high-privileged, and it allows executing code inside containers. So, from there, what we've done is execute code on the API server, which is what is actually – this is the component that is responsible in Kubernetes for making decisions regarding authentication. So, from there, the way to cluster admin was really quick.

Dave Bittner: Yeah, that's fascinating. And as you say, I mean, a bit, I don't know, surprising, scary, perhaps, is the word, because it seems like, you know, keeping these nodes separate from each other is one of the basic value propositions of working in this sort of environment.

Ariel Zelivansky: Right. So, multitenant platforms in general, not only Azure, have this as their main threat, and I think it's been recognized already, although we're the first to find and disclose such a vulnerability. It is something that's been discussed and considered for a long time. This is something that is the most difficult to defend against, because really you have these zero-day vulnerabilities and even existing vulnerabilities that could be exploited that allow attackers to navigate within your platform, right? After we were able to break out of the container and run code on the node, we were not expected to be exploring that area.

Ariel Zelivansky: And this is where the main responsibility of the cloud provider itself, and not the user, is to secure that area. Because the user really has no visibility of that place. The user can do anything that they can to protect and detect activity on their containers, but once it's on the node, really, this is the domain of the cloud provider. So, there have to be a lot of mechanisms in place, and really most of the attacks that we've tried and the ways we tried to escalate privileges before we found Azurescape were actually mitigated by Microsoft.

Ariel Zelivansky: So, although we were able to find this specific escape and privilege escalation, we actually tried a lot of techniques and things before that that were not possible because Microsoft has invested in mitigating them. So, I think this is a community effort. We, as researchers, companies working in this cloud security space, and cloud providers themselves have to all invest in securing multitenant platforms. I think it is possible, and the fact that we've been able to find that one vulnerability doesn't make the cloud less safe. As in legacy platforms, vulnerabilities were really discovered and fixed and patched, so users can protect themselves.

Ariel Zelivansky: In this case, Azure ACI can actually apply the fix directly to all their users without having to ask users to take KBs or patches and apply them. So, we found this vulnerability, disclosed it to Microsoft, they took the fixes, applied them directly to the platform, and now everybody is safe. This is a process I believe will continue to be ongoing for all cloud providers, not only ACI. And I'm happy to be part of this community researching these platforms.

Dave Bittner: Well, there's a little more to the story as well. As you point out in the research that you published here, after you reported the token issue to Microsoft and they released a patch for it, you all continued to do a little more digging and you found another way in.

Ariel Zelivansky: Right, this was another way to escalate privileges. This was an SSRF vulnerability. As I mentioned, we were speaking to Microsoft about fixing these issues, and we continued to look around and try to find any way we could do things we're not supposed to, like stealing credentials of other users, executing code on their containers, and potentially just spinning up our own containers on other clusters. This was just another way to escalate privileges, in essence.

Dave Bittner: Can you give us some insights on the types of tools that you all are using to be able to analyze these container environments? What sorts of things are you using to be able to see what's going on behind the scenes there?

Ariel Zelivansky: Great question. So, I did want to mention we have developed a tool that is called "WhoC," or "Who Contains." And it allows us to extract the container runtime from the cloud provider to actually see how our containers are being run. The way containers are now run in these platforms, where you don't have the Kubernetes cluster and you don't control anything, is it's on the vendor side. So, before using that tool, we had no visibility on how containers are actually getting executed and what the container runtime is. And WhoC is able to copy the binary that is executing the container in its bootstrapping process and send it back to us. And that's how we initially found this old runC instance and understood we could exploit the potential vulnerability to escape out of the container.

Ariel Zelivansky: We encourage researchers – and we're also doing that – to run WhoC on all cloud platforms to understand and explore how platforms are being built. They're not open-source, so we need to do some research and exploring to understand what the architecture behind these platforms is. And they're also changing all the time, so there is really a lot of benefit for us to use tools like WhoC to understand what's going on behind the scenes.

Dave Bittner: I'm curious, you know, while you were all in the midst of this research, was there any detection on Microsoft's part that you all were bouncing around from place to place, you know, places that you shouldn't have been?

Ariel Zelivansky: Right, so we've been really careful not to run anything that won't be successful, and we didn't get actively detected as we were researching the platform. But as I mentioned, some techniques and things we've tried just didn't work. There were other cases where we researched cloud platforms where we've actually been detected actively and were approached by engineers from the vendors to understand what we're trying to achieve and how we've escalated privileges. But in this case of Azurescape, it just worked. As soon as we tried the container escape, we were on the node executing commands, and we were not detected at that point. So, an attacker could have potentially exploited this to just do some cryptomining and earn some quick cash from executing code on the platform directly.

Dave Bittner: Right. So as you mentioned, I mean, Microsoft has been very responsive here, and they've patched the issues that you presented to them. Given all of that, what are your recommendations here? Are there any take-homes for folks who are using these environments to better protect themselves?

Ariel Zelivansky: So, I've mentioned earlier that there is a lot on the vendor side to detect attacks and mitigate vulnerabilities in their architecture. But there is a lot that can be done by the users, by the organizations themselves on their containers. I personally work on Prisma Cloud, so I can talk about that as a potential security mechanism to detect attacks, but there are really a lot of solutions that allow you to detect attackers as they're running code – you know, post-exploitation, after they're actually able to get into your container and try to do anything malicious. For example, cryptomining, which is the most common attack on cloud and container environments, can be detected. And even if you don't stop the attacker on their way in, there are many ways you can catch them at the runtime level or network level after something malicious is getting executed.

Dave Bittner: Our thanks to Ariel Zelivansky from Palo Alto Networks for joining us. The research is titled "What You Need to Know About Azurescape." We'll have a link in the show notes.

Dave Bittner: The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. Our amazing CyberWire team is Tre Hester, Elliott Peltzman, Puru Prakash, Justin Sabie, Tim Nodar, Joe Carrigan, Carole Theriault, Ben Yelin, Nick Veliky, Gina Johnson, Bennett Moe, Chris Russell, John Petrik, Jennifer Eiben, Rick Howard, Peter Kilpe, and I'm Dave Bittner. Thanks for listening. We'll see you back here next week.