CSO Perspectives 2.16.21
Ep 38 | 2.16.21

Pt 2 - Amazon AWS: Can you deploy zero trust, intrusion kill chains, resilience, and risk forecasting?

Transcript

Rick Howard: Last episode, I did a deep dive on Amazon AWS to determine if we could apply our first principle thinking to that cloud environment. My first take was that you could probably build fairly robust zero trust and resilience programs but would struggle with your intrusion kill chain prevention and risk assessment programs. By the way, that goes for Microsoft Azure, too. We're doing Google next week, so we'll see where they fall, but I anticipate that it will be similar.

Rick Howard: For this episode, though, I invited some experts to the CyberWire's Hash Table to find out what I got wrong last episode. What I discovered is that there is some disagreement between Amazon's really smart security experts and old-guy security practitioners like me about the value of intrusion kill chain preventions in cloud environments. But that's OK. Disagreement and debate are how we find the edges for these kinds of complicated issues. 

Rick Howard: I also discovered that the environments the cloud providers are building for us anticipate a DevSecOps world that the security community has yet to embrace. From my view, I believe their vision is correct, but the network defender community has a long row to hoe before we realize it. 

Rick Howard: My name is Rick Howard. You are listening to "CSO Perspectives," my podcast about the ideas, strategies and technologies that senior security executives wrestle with on a daily basis. Joining me today at the CyberWire's Hash Table are two AWS security experts and one old-guy security practitioner. From the Amazon side, we have my good friend and colleague Merritt Baer, a principal security architect at Amazon Web Services, and her colleague, whom I just met, Mark Ryland from the Amazon Web Services Office of the CISO. These two also both graduated from law school before they became security experts, so they're about 20 steps ahead of me on any discussion. On the old-guy practitioner side, we have another old friend of mine, Jerry Archer, the Sallie Mae chief security officer. 

Rick Howard: Let's start with a new way to view the world. The first thing that old security practitioners like me need to overcome is to stop thinking about legacy networking concepts. The cloud is similar but different, and to take advantage of the benefits offered by the cloud, you should embrace the cloudiness of it all. Here's Mark. 

Mark Ryland: The problem with cloud networking and security appliances has always been that the person - the vendor or the customer who wanted to run some kind of a security stack could run it on a virtual machine, and you could get packets to flow through your virtual machine, but that's not very cloudy because it's both nonscalable - like, it can scale vertically; you can have a very powerful single instance, but that's still not that great - and nor is it very high availability because if that VM goes down, you could have a hot standby, and you could call an API and shift your route tables and so forth and maybe within, you know, seconds, if you're lucky, or minutes otherwise, you can get the packets to flow to the standby. But that's all pretty kludgy and very noncloudy and very nonnative. 

Rick Howard: In the last episode, I spent a lot of time discussing zero trust between AWS subnets within a VPC, or a virtual private cloud, like rules that prevent the DevOps subnet from talking to the finance subnet. Mark suggests that we all raise our sights a bit and focus purely on VPCs. 

Mark Ryland: And VPCs are cheap. They're free. They're easy to set up and tear down, and so - and they provide inherent isolation. So if you're thinking about microsegmentation, a lot of customers now just think of VPCs almost like subnets, right? They just kind of, you know, have a very limited set of things inside of a given VPC. 

Rick Howard: Since we are using VPCs now and not subnets as our hermetically sealed boundaries between workloads, you can connect them with these things called transit gateways, or TGWs. You use these TGWs to build hub-and-spoke virtual meshes. 

Mark Ryland: That's our mechanism for connecting up disparate VPCs across accounts, even, you know, across regions. So you don't really need something like Direct Connect or these more physical technologies that, you know, that map virtual to physical and allow customers to connect in. You don't need one of those to do a pure virtual interconnect and kind of create that hub-and-spoke model through the transit gateway. 

Mark Ryland: Transit gateway has been out for at least two years and maybe three years now, and that is our official way that we do hub-and-spoke and create very - you can create very large networks, you know, of hundreds of VPCs connected together. 

Mark Ryland: There's a special mode where the DX object shows up as an object in your TGW and, you know, you set up route tables so that that becomes your - truly the hub. The spokes can be VPCs, or the spokes can be connections to your on-premises networks. 
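A minimal sketch of the hub-and-spoke idea Mark describes, assuming a toy model in which each attachment is associated with one transit gateway route table and routes are keyed by attachment name rather than by destination CIDR, as real TGW route tables are. All attachment and route-table names are hypothetical. Spokes only have a route to the hub, so spoke-to-spoke traffic has to pass through the hub, where a shared security stack could inspect it.

```python
"""Toy model of hub-and-spoke segmentation through a transit gateway (TGW)."""

# Hypothetical attachments, each associated with exactly one TGW route table.
ASSOCIATIONS = {
    "vpc-finance": "spoke-rt",
    "vpc-devops": "spoke-rt",
    "dx-onprem": "spoke-rt",      # on-premises connection attached as a spoke
    "vpc-inspection": "hub-rt",   # hub VPC that hosts the shared security stack
}

# Routes per table. Real TGW route tables map destination CIDRs to
# attachments; attachment names stand in for CIDRs in this sketch.
ROUTE_TABLES = {
    "spoke-rt": {"vpc-inspection"},                        # spokes only see the hub
    "hub-rt": {"vpc-finance", "vpc-devops", "dx-onprem"},  # hub sees every spoke
}


def can_reach_directly(src: str, dst: str) -> bool:
    """True if the route table associated with src has a route to dst."""
    return dst in ROUTE_TABLES[ASSOCIATIONS[src]]


def path(src: str, dst: str) -> list:
    """Hops from src to dst: direct if routed, else through a hub, else []."""
    if can_reach_directly(src, dst):
        return [src, dst]
    for hub in sorted(ROUTE_TABLES[ASSOCIATIONS[src]]):
        if can_reach_directly(hub, dst):
            return [src, hub, dst]
    return []


if __name__ == "__main__":
    # DevOps cannot talk to finance directly; traffic hairpins through the hub.
    print(path("vpc-devops", "vpc-finance"))
    # ['vpc-devops', 'vpc-inspection', 'vpc-finance']
```

Treating each VPC as its own segment and letting route tables decide who can reach whom is what makes VPCs usable "almost like subnets" for microsegmentation.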

Rick Howard: With cloud deployments, the very idea of a perimeter that you protect is morphing into something else, kind of a perimeter, but not quite. 

Mark Ryland: Perimeters are still very useful, but it's not like you have, you know, months and months of running static infrastructure that you - you know, that's - you can do that, and we built a lot of features to enable that. But I'd say the most modern applications are these much more dynamic, you know, secure DevOps pipelines where new deployments happen on an hourly basis and where, you know, the thought of - like, you don't think about so much defending the network as you do making sure that the code you're deploying is very closely monitored. You look for anomalies. You deploy new code constantly. And if there is some problem, the way you solve that problem is by a new deployment of, you know, of the application that, you know, that you've kind of built the security into. So it's a very different kind of mindset. 

Rick Howard: It's true that cloud networks are more ephemeral than your legacy on-prem networks with your big iron servers and hard-wired power plants. With a cloud service like Amazon, all of that hardware is code now, as in infrastructure as code. And with concepts like serverless functions - Amazon calls these things Lambda functions - you can greatly reduce the attack surface of your workloads by moving processing jobs out of your VPCs and subnets.

Rick Howard: All of that is great stuff, but at some point, you still have to store data somewhere, permanently in many cases. Wherever that is in your cloud provider's networks is the new perimeter. I like to call those things data islands because cloud customers put these things everywhere. The word perimeter kind of implies one spot, but data islands gives you the sense that there is more than one, or as Merritt says... 

Merritt Baer: The perimeter is dead. Long live the perimeter. 

Rick Howard: As is typical in the cybersecurity domain, many of us have opinions centered around various network defender concepts that, at first glance, seem to clash with other practitioner opinions. But when examined, we find that the clashing opinions aren't really based on disagreements about how to tackle the problem space. They're more about what we mean by the concept. In other words, as a community, we don't all share the same definitions for common cybersecurity concepts. Case in point - intrusion kill chain prevention. Here's Mark explaining why we don't need intrusion kill chain prevention in cloud environments. 

Mark Ryland: And that goes also to the whole question of the kill chain and, you know, kill chain prevention and all that. A lot of those concepts are kind of legacy concepts 'cause they sort of assume a very static environment in which you have to protect against so-called advanced persistent threats. And I'm not saying any of that's going away overnight, but it's just - it's a really different way of thinking about how you build a modern compute environment or, you know, application environment. 

Rick Howard: So that is one view of the intrusion kill chain strategy, and Mark is not alone in thinking this way, either. So I'm probably wrong. Documentation from Microsoft Azure, Amazon AWS and Google Cloud Platform all tend to ignore this staple of network defense that's been around since 2010. And if I understand Mark's explanation, he implies that the kill chain strategies only apply to government-sponsored, continuous, low-level cyberconflict operations or advanced persistent threat operations - APT operations. 

Rick Howard: But if you just skim the MITRE ATT&CK wiki, you will immediately notice that adversary groups behind all offensive operations have to string together a sequence of steps - the intrusion kill chain - in order to be successful. General purpose crime, crime in the form of ransomware, hacktivism, espionage and cyberconflict between nation-states all have to successfully negotiate some version of the intrusion kill chain to accomplish their goal. So our definitions clash. They are not wrong or right, just different. Here's Merritt. 

Merritt Baer: We think about behavioral ways that could be exploited. We think about how to make the secure thing to do the easiest thing to do. But, like, one of the hallmarks that I sometimes come back to, for example, is our automated reasoning group, where you're actually using formal mathematics as a way to reason about what you know about your network or about your permissions. 

Merritt Baer: And this is just, like, an inherently cloudy thing because, you know, now that you have infrastructure as code, you can do security as code. And, like, if you believe in math, then this will resolve - you know, it's how we verify that our boot code is correct. It's how we - you know? 

Merritt Baer: So there are elements of that sort of kill chain mentality that are just hinging on a set of kind of points in time that may not be relevant now when you can know things in almost real time and with a degree of certainty that doesn't require you to go through a set of steps. 
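To make Merritt's point about reasoning over permissions concrete, here is a deliberately tiny sketch: it checks a property ("only finance-app can touch finance-bucket") against every possible request in a small, finite universe. This is exhaustive enumeration, not the solver-based automated reasoning AWS actually uses on real policy languages; the principals, actions, resources and policy format are all hypothetical.

```python
"""Check a permission property against every request in a tiny finite universe.

Hypothetical model: exhaustive enumeration stands in for the solver-based
automated reasoning that real tools apply to much larger policy spaces.
"""
from itertools import product

PRINCIPALS = ["finance-app", "devops-ci", "analyst"]
ACTIONS = ["read", "write", "delete"]
RESOURCES = ["finance-bucket", "logs-bucket"]

# Allow statements as (principal or "*", action or "*", resource); default deny.
POLICY = [
    ("finance-app", "*", "finance-bucket"),
    ("*", "read", "logs-bucket"),
    ("devops-ci", "write", "logs-bucket"),
]


def _matches(pattern: str, value: str) -> bool:
    return pattern == "*" or pattern == value


def is_allowed(policy, principal, action, resource) -> bool:
    """Allowed only if at least one statement matches the request."""
    return any(
        _matches(p, principal) and _matches(a, action) and r == resource
        for p, a, r in policy
    )


def finance_isolated(policy, principal, action, resource) -> bool:
    """Invariant: every allowed request on finance-bucket comes from finance-app."""
    if resource == "finance-bucket" and is_allowed(policy, principal, action, resource):
        return principal == "finance-app"
    return True


def violations(policy, invariant) -> list:
    """Every request in the universe for which the invariant does not hold."""
    return [
        request
        for request in product(PRINCIPALS, ACTIONS, RESOURCES)
        if not invariant(policy, *request)
    ]


if __name__ == "__main__":
    print(violations(POLICY, finance_isolated))  # []: the property holds everywhere
    # A well-meaning wildcard read grant silently breaks the property.
    loosened = POLICY + [("*", "read", "finance-bucket")]
    print(violations(loosened, finance_isolated))
    # [('devops-ci', 'read', 'finance-bucket'), ('analyst', 'read', 'finance-bucket')]
```

The answer is a statement about all requests at once rather than a snapshot of the ones that happened to occur, which is the kind of certainty Merritt contrasts with stepping through a sequence of events.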

Rick Howard: I'm completely on board to use math and science to predict when bad things are happening, but that strategy is passive and general purpose and not specific about how real cyber-adversary groups operate. I'm not saying it's bad. It's fantastic, actually. I would throw all of those tools into the zero trust strategy bucket and start implementing right away. But just because the networks are built by software developers writing code and not by tired and old Unix graybeards installing and maintaining big iron servers in their own data centers does not obviate the attack sequence idea. And speaking of old graybeards, here's Jerry Archer, the Sallie Mae CSO, talking about how he uses the intrusion kill chain prevention strategy in his cloud. 

Jerry Archer: So we look for bad behavior in the environment, and we also look for known indicators of compromise through our SIEM, which uses both our local SOC and a global SOC provided by the vendor, who looks for all indicators of compromise. 

Jerry Archer: So we use what used to be Verodin. It's now part of FireEye. But we run constant purple teaming against our environment, against all known indicators of - well, not all known indicators of compromise, but relevant indicators of compromise to look for someone trying to hack into our environment. 
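Jerry's purple teaming boils down to a comparison: which simulated attack techniques produced an alert, and which slipped through? A minimal sketch, assuming simulation runs and SIEM alerts have already been correlated by a shared run ID; the data classes and field names are hypothetical, not the schema of Verodin, FireEye or any SIEM.

```python
"""Purple-team coverage: which simulated techniques produced a SIEM alert?"""
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationRun:
    run_id: str
    technique: str  # ATT&CK technique ID the simulation exercised


@dataclass(frozen=True)
class Alert:
    run_id: str  # simulation run this SIEM alert was correlated to


def coverage_report(runs, alerts):
    """Return (detected, missed, detection ratio) per technique.

    A technique counts as detected if any of its runs raised an alert.
    """
    alerted_runs = {alert.run_id for alert in alerts}
    tested = {run.technique for run in runs}
    detected = {run.technique for run in runs if run.run_id in alerted_runs}
    ratio = len(detected) / len(tested) if tested else 0.0
    return sorted(detected), sorted(tested - detected), ratio


if __name__ == "__main__":
    runs = [
        SimulationRun("r1", "T1566"),  # Phishing
        SimulationRun("r2", "T1059"),  # Command and Scripting Interpreter
        SimulationRun("r3", "T1486"),  # Data Encrypted for Impact
    ]
    alerts = [Alert("r1"), Alert("r3")]
    detected, missed, ratio = coverage_report(runs, alerts)
    print(missed)  # ['T1059']: a gap to close with new detections or preventions
```

Running this continuously, as Jerry describes, turns "are we covered?" into a list of specific techniques to fix.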

Rick Howard: One last point - I mean, we know 95% of the attack sequences from all adversary campaigns. Again, just skim the MITRE ATT&CK wiki. Doesn't it make sense to deploy prevention controls across the intrusion kill chain for all of them? Sure, kill chain prevention in the cloud is different, no question. But the idea that network defenders - call them data island defenders if you want - can't install prevention controls in cloud environments for all known attack sequences seems incorrect. It feels like we are leaving one big prevention tool off the table that could significantly reduce the chances of a material impact to my organization due to a cloud cyberattack. 
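Rick's argument rests on the sequential nature of the kill chain: the adversary has to succeed at every phase, so a prevention control at any one phase breaks the whole sequence. A minimal sketch of that logic, using the Lockheed Martin phase names; the ransomware sequence, its mapping of ATT&CK technique IDs onto phases, and the control sets are hypothetical simplifications.

```python
"""Sketch: an intrusion succeeds only if every phase of its kill chain succeeds."""
from typing import Dict, Optional, Set

# Phases of the Lockheed Martin intrusion kill chain, in order.
KILL_CHAIN = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command_and_control", "actions_on_objectives",
]


def first_blocked_phase(attack: Dict[str, str],
                        controls: Dict[str, Set[str]]) -> Optional[str]:
    """Return the earliest phase where a deployed control blocks the attack,
    or None if no control stops it (the intrusion completes)."""
    for phase in KILL_CHAIN:
        technique = attack.get(phase)
        if technique is not None and technique in controls.get(phase, set()):
            return phase
    return None


# Hypothetical ransomware sequence, phases mapped to ATT&CK technique IDs.
RANSOMWARE = {
    "delivery": "T1566",               # Phishing
    "exploitation": "T1204",           # User Execution
    "installation": "T1053",           # Scheduled Task/Job
    "command_and_control": "T1071",    # Application Layer Protocol
    "actions_on_objectives": "T1486",  # Data Encrypted for Impact
}

if __name__ == "__main__":
    print(first_blocked_phase(RANSOMWARE, {}))  # None: nothing breaks the chain
    print(first_blocked_phase(RANSOMWARE, {"command_and_control": {"T1071"}}))
    # command_and_control: one control late in the chain still stops the attack
```

Deploying controls at several phases, for every known sequence, is what gives the strategy its depth: the first control that fires ends the attempt.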

Rick Howard: Sallie Mae is like a unicorn, an established, long-running company that is completely in the cloud - a cloud native, as they say. There aren't that many out there. Here's Jerry. 

Jerry Archer: We are 100% in the cloud, so all of our workloads run in AWS. We have segregated instances for each major application in the environment. We don't have a data center. We don't have any. The only physical devices that we have left are thin clients, about 200 laptops and some routers in the facilities. But basically, everything now runs in AWS. In fact, most of our workforce now run on VDIs that live on servers that exist in AWS as well. 

Jerry Archer: We've been in the cloud now two years - fully in the cloud two years, meaning we started transitioning about four years ago, and we were 100% in the cloud within a year and a half to two years. 

Rick Howard: Jerry is not completely in AWS. He does have some development projects going on in Azure, and they use the Microsoft SaaS application Office 365. So consider them a hybrid cloud user. 

Jerry Archer: We have some instances in Azure - so there are a couple of things, right? One is we use Office 365, so clearly, we use software as a service that essentially runs in Azure, right? We do have some development environments that run in Azure simply as a matter of, you know, project management, development and stuff like that. So it does - we have instances in Azure, and we have a full security stack in Azure. So we can run workloads in AWS - I mean - I'm sorry - in Azure if we want, but we've pretty much transitioned away from workloads in Azure and primarily run in AWS. But we maintain the ability to run in Azure simply so that we have a dual-cloud solution. 

Rick Howard: When I asked Jerry about the single-vendor problem, the idea that you would put all of your eggs in one basket for your IT workloads, he said that we needed to consider a different partnering mindset for cloud services compared with buying a security appliance for our data center. Jerry said that he's established a long-term partnership between Sallie Mae and Amazon that's designed on purpose to be multiyear. I mean, we're talking five years at least and probably longer. 

Jerry Archer: Well, I mean, I think at the end of the day, the services and the functions that Amazon offered were compelling. Like I said, we maintain a security stack in Azure in case we wanted to move back. As you point out, it would be very, very nontrivial. I mean, you, in essence, are taking advantage of a lot of the capabilities within AWS, so you're not going to easily move out of that. It would take us quite a while to transition back or - I'm sorry - transition over to Azure. I wouldn't say it's years, but it's certainly a significant number of months it would take us to move back into Azure. So we're very committed to AWS, and they've proven to be a good partner so far. The opportunities that AWS offers us in terms of technology and capabilities are superior. 

Rick Howard: But even with that long-term partnership in place, Jerry provides some distance between Sallie Mae and Amazon by using third-party security tools. 

Jerry Archer: From a security viewpoint, we maintain a level of independence from AWS as best we can. We use tools that consume all of the AWS security stuff. But at the same time, the tools that we use ride on top of that event data - and, therefore, are somewhat independent. I mean, if we - again, our security stack in AWS and our security stack in Azure have a lot of elements in common. So, you know, we've tried to maintain a sort of a separation so that we have a level of independence and, obviously, a third-party perspective on what's going on in our environment. 

Rick Howard: One of the great benefits of cloud environments is the ability to scale automatically based on the current situation. For example, if you were Domino's Pizza, your workload would be at a level 10 on a typical workday. But on a Super Bowl Sunday, your workload would have to surge to level 10 times a hundred, let's say. The old way to do that in the cloud was with something called ECMP, or equal-cost multipath. It's a routing mechanism that spreads traffic almost evenly across multiple links of equal cost. In other words, you can share the workload across multiple EC2 instances. But here's Mark explaining why ECMP is not optimal. 

Mark Ryland: For example, with ECMP, if you have a new node that's an equal-cost path and you add that, then you rehash all the existing flows and you break a bunch of them to put them onto the new node, where we don't do that. We keep all the flows, you know, stuck. 
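Mark's point about ECMP can be shown in a few lines. This sketch assumes stateless placement by hash mod N of the connection 5-tuple, a simplification of how routers do ECMP hashing; the sticky balancer is a hypothetical flow table, not AWS's actual implementation, and the appliance names and flows are invented. Going from three to four equal-cost paths reassigns most existing flows, while the flow table keeps every existing flow on its original appliance and only hashes new flows over the enlarged fleet.

```python
"""Hash-mod-N placement (ECMP-style) versus a sticky flow table."""
import hashlib


def flow_hash(flow: tuple) -> int:
    """Stable hash of a connection 5-tuple (built-in hash() is salted per run)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ecmp_pick(flow: tuple, nodes: list) -> str:
    """Stateless: the chosen node depends on how many nodes exist right now."""
    return nodes[flow_hash(flow) % len(nodes)]


class StickyBalancer:
    """Remembers each flow's node; only new flows are hashed over the fleet."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.flow_table = {}

    def add_node(self, node: str) -> None:
        self.nodes.append(node)  # existing flow_table entries are left alone

    def pick(self, flow: tuple) -> str:
        if flow not in self.flow_table:
            self.flow_table[flow] = ecmp_pick(flow, self.nodes)
        return self.flow_table[flow]


def make_flows(n: int) -> list:
    """Hypothetical client connections to one HTTPS service."""
    return [
        (f"10.0.{i // 250}.{i % 250 + 1}", 40000 + i, "203.0.113.10", 443, "tcp")
        for i in range(n)
    ]


if __name__ == "__main__":
    flows = make_flows(1000)
    three = ["fw-a", "fw-b", "fw-c"]
    four = three + ["fw-d"]

    moved = sum(ecmp_pick(f, three) != ecmp_pick(f, four) for f in flows)
    print(f"ECMP: {moved} of {len(flows)} flows changed appliance")  # about 3/4

    lb = StickyBalancer(three)
    original = {f: lb.pick(f) for f in flows}
    lb.add_node("fw-d")
    broken = sum(lb.pick(f) != original[f] for f in flows)
    print(f"Sticky: {broken} flows changed appliance")  # 0
```

For a stateful firewall, every flow that changes appliance mid-connection arrives at a box with no state for it and is typically dropped, which is why keeping flows "stuck" matters.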

Rick Howard: The new and improved way to build this capability in AWS is something called a Gateway Load Balancer, or GWLB. And this is important because if you want all of your EC2 traffic to pass through a scalable security stack somewhere, the GWLB is the way to get it done. 

Mark Ryland: A fleet of networking appliances can all be brought together in a cluster, and we will spray the packets across them in a flow-hashed fashion, keeping the flow sticky so that, you know, any given connection goes to the same appliance. But as the load scales, we can automatically add more and more of those appliances. And as the load decreases, we can take them out of the rotation. So basically, we have a way now to do a real cloudy version of networking appliances, whether they're security appliances or any other thing, using this technology. And it's quite a breakthrough. 

Mark Ryland: So Palo Alto, Check Point, all the major players have now adopted this. And we've adopted it. We have a built-in feature called Network Firewall, which is essentially layered on top of this. We make it available to our partners so they can do the same things. That's the way, like, routers do horizontal scaling, essentially. Or you can have, you know, like, virtual - it's a way of parallelizing network flows, basically. 

Rick Howard: So in a way, a GWLB acts as a load balancer for your security stack, and you can connect it to your transit gateways, or TGWs. 

Mark Ryland: It's exactly like a load balancer. But what's not like a load balancer is it's not - it looks like one - it's - the consumer who's sending packets sees one IP address. There are actually dozens of nodes, essentially, servicing that traffic. And then as the packets exit the far end of this cluster, they all - the next hop sees one IP address. So it looks like one thing to the network, but it's actually a cluster, and the packets are sprayed across the cluster as they traverse through to get the processing done on them. So that technology can be deployed, you know, between VPCs, between a VPC and a TGW. So it's basically kind of entry and exit from VPCs is where you can utilize this type of gateway. 

Rick Howard: So that takes care of north-south traffic, meaning traffic between the EC2 workloads and the internet. But what about placing a security stack between two different VPCs, the so-called east-west traffic? 

Mark Ryland: If you think of VPCs as kind of east-west - like I have, you know, a workload in X, in VPC one. I've got a workload, Y, in VPC two. They're connected through a transit gateway. Then that type of east-west is supported today because it's - you know, VPC entrance and exit is where you can insert these networking objects. 

Rick Howard: The GWLB provides automatic resilience. If I did it the old way with the ECMP method, I would have to write code myself to monitor for overload conditions and then automatically deploy new instances of my security stack when needed. With the GWLB, AWS is doing the monitoring and scaling for me. 
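For comparison, this is the kind of scaling logic Rick says he would have had to write himself under the ECMP approach: size the appliance fleet so that each appliance stays under a target utilization. A sketch with hypothetical load units, capacities and limits; in AWS, the load balancer spreads flows across a target group of appliances, and growing or shrinking that group is commonly handled by an Auto Scaling group rather than hand-written code.

```python
"""Size a fleet of security appliances from the current load."""
import math


def desired_appliances(load: float, capacity_per_appliance: float,
                       target_utilization: float = 0.5,
                       min_count: int = 2, max_count: int = 100) -> int:
    """Smallest fleet keeping each appliance at or below target utilization,
    clamped to [min_count, max_count] (min_count >= 2 keeps a spare for HA)."""
    if load < 0 or capacity_per_appliance <= 0:
        raise ValueError("load must be >= 0 and capacity must be > 0")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    needed = math.ceil(load / (capacity_per_appliance * target_utilization))
    return max(min_count, min(max_count, needed))


if __name__ == "__main__":
    # Rick's pizza example in hypothetical load units; each appliance handles 40.
    print(desired_appliances(10, 40))        # 2: a normal workday (the HA floor)
    print(desired_appliances(10 * 100, 40))  # 50: Super Bowl Sunday surge
```

The arithmetic is the easy part; the hard parts of doing this yourself are collecting the load metric reliably and moving traffic onto new instances without breaking existing flows.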

Rick Howard: And to bring this discussion back around to my data islands idea for a second, Sallie Mae uses a networking trick to define a perimeter around their AWS assets. It's called a software-defined perimeter, or SDP. Here's Bryan Embrey from Pulse Secure explaining what it is. 

Bryan Embrey: Software-defined perimeter, or SDP, evolved from a U.S. Department of Defense initiative that sought to validate devices before they connected to the network because connecting a device before validating its identity can lead to vulnerabilities. SDP enforces an authenticate first, connect second model, where users and their devices are not allowed on the network without first establishing trust. 

Jerry Archer: So we use software-defined perimeter in order to obscure Sallie Mae from the world. So without being pre-authenticated, the only thing that you can see from Sallie Mae is one port that would allow you to send your credentials into the SDP controller, and that's it. You can't see any part of Sallie Mae. It's completely obscured from the world. So you can buy SDP both as a service, meaning you can actually buy - essentially, they will set up your EC2 instances in an SDP environment where you can just run SDP, which is what we do. 
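Jerry's description (one visible port that accepts credentials, nothing else) matches a technique used in SDP designs called single packet authorization. A minimal sketch, assuming a hypothetical message format of device ID, timestamp and HMAC, with a hard-coded demo key; real SDP products use their own protocols and stronger device credentials. The controller grants access only if the knock verifies, is fresh, and has not been replayed.

```python
"""Authenticate first, connect second: a single-packet-authorization sketch."""
import hashlib
import hmac

# Hypothetical per-device keys; real deployments would not hard-code secrets.
DEVICE_KEYS = {"laptop-042": b"example-key-not-for-production"}
MAX_SKEW_SECONDS = 30
_seen_macs = set()  # crude replay protection for this sketch


def _mac(key: bytes, device_id: str, timestamp: str) -> str:
    message = f"{device_id}|{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_knock(device_id: str, key: bytes, now: int) -> str:
    """Client side: one message proving the device holds its key."""
    timestamp = str(now)
    return f"{device_id}|{timestamp}|{_mac(key, device_id, timestamp)}"


def controller_accepts(knock: str, now: int) -> bool:
    """Controller side: only a valid, fresh, unreplayed knock opens access.

    A real controller would send nothing back on failure, so scanners
    see no open service at all.
    """
    try:
        device_id, timestamp, mac = knock.split("|")
        sent_at = int(timestamp)
    except ValueError:
        return False
    key = DEVICE_KEYS.get(device_id)
    if key is None or abs(now - sent_at) > MAX_SKEW_SECONDS:
        return False
    expected = _mac(key, device_id, timestamp)
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return False
    if mac in _seen_macs:
        return False  # replayed knock
    _seen_macs.add(mac)
    return True


if __name__ == "__main__":
    now = 1_700_000_000
    knock = make_knock("laptop-042", DEVICE_KEYS["laptop-042"], now)
    print(controller_accepts(knock, now + 2))  # True: open the session
    print(controller_accepts(knock, now + 3))  # False: same knock replayed
```

The design choice is that verification happens before any connection to a protected service exists, which is the "authenticate first, connect second" order Bryan describes.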

Rick Howard: I have this habit of referring to security practitioners as - I'm using air quotes here - "network defenders." But in this new world of building cloudy networks in a DevSecOps context, Merritt points out that maybe it's time to retire the network defender label. 

Merritt Baer: I had the same thought when you said network defenders. I thought, that's not really who we're hiring either. Like, think about the people who create - who build the security around what you're doing in cloud. It is not, quote, "network defenders" necessarily. That's just not an archetype that persists with a lot of relevance. 

Merritt Baer: We're actually, you know, looking for folks who are DevOps people who, you know - like, we can democratize security into the way that we do everything. And you have to. You know, it will only constrain your organization to have siloed your sort of defense from your operations from your innovation center. 

Merritt Baer: The only way to do this at scale - and we're talking about really big scale. If we're speaking from, you know, the Office of the CISO of AWS, we're talking about, you know, scale that is astronomical for, you know, Amazon itself. And to really do that type of work at scale, it does have to be automated. And it also needs to be that inherently cloudy approach. 

Rick Howard: Since the DevOps movement started back in 2010, I've been thinking that security practitioners would have to change their basic DNA. Instead of having security skills with a little bit of programming experience sprinkled on top, the new security practitioners will be coders first, with security sprinkled on after. But according to Mark, that's probably the wrong model. 

Mark Ryland: It's not that security people turn into devs, but what we see is that you create teams of people with the right expertise. And so the developer sits alongside the security person, and they do mind meld and they build together the automation that, you know, is needed. 

Mark Ryland: And so the security person's happy because they're not doing so much repetitive work. They're becoming a requirements person for the developer. You know, seven times a week, I go do this stupid task, and I check this and I check that. Can you automate that? And the developer says, yeah, I can do that. I wouldn't have known what to do, but now that you've told me, I absolutely know what to do. It's - cross-functional teams are absolutely the future. 

Mark Ryland: I'm reminded of a panel discussion I was on one time where people were talking about data science and how, oh, gosh, you know, where are we ever going to find these people that are, you know, that are statisticians, they know how to code, they know how to think about numbers - blah, blah, blah? And I said, you don't have to find anybody like that. You find four people, each of which has one of those critical skills, and you put them in a group, and they do amazing stuff together. 
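Mark's example of a security person asking a developer to automate a weekly chore might look like this: flag any security group rule that exposes SSH or RDP to the whole internet. The input dictionaries loosely mirror the shape of EC2 DescribeSecurityGroups output (GroupId, IpPermissions, IpRanges, Ipv6Ranges); in a real script they would come from the AWS API, and the sample groups below are hypothetical.

```python
"""Flag security group rules that expose SSH or RDP to the whole internet."""

ADMIN_PORTS = (22, 3389)  # SSH and RDP
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def open_to_world(permission: dict) -> bool:
    """True if an ingress permission allows any IPv4 or IPv6 source."""
    v4 = {r.get("CidrIp") for r in permission.get("IpRanges", [])}
    v6 = {r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])}
    return bool((v4 | v6) & WORLD_CIDRS)


def exposed_admin_ports(security_groups: list) -> list:
    """Return (group ID, port) pairs for admin ports reachable from anywhere."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            if not open_to_world(perm):
                continue
            protocol = perm.get("IpProtocol")
            if protocol == "-1":  # all protocols, all ports
                ports = list(ADMIN_PORTS)
            elif protocol in ("tcp", "6"):
                low, high = perm.get("FromPort"), perm.get("ToPort")
                if low is None or high is None:
                    continue
                ports = [p for p in ADMIN_PORTS if low <= p <= high]
            else:
                continue  # UDP, ICMP, etc. are not checked in this sketch
            findings.extend((group["GroupId"], port) for port in ports)
    return findings


# Hypothetical sample input shaped like DescribeSecurityGroups output.
SAMPLE_GROUPS = [
    {"GroupId": "sg-ssh", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-internal", "IpPermissions": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "10.0.0.0/8"}]}]},
    {"GroupId": "sg-any-ipv6", "IpPermissions": [
        {"IpProtocol": "-1", "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]},
]

if __name__ == "__main__":
    for group_id, port in exposed_admin_ports(SAMPLE_GROUPS):
        print(f"{group_id}: port {port} open to the internet")
```

The security person supplies the rule (which ports, which sources count as "the world"); the developer supplies the loop, the API calls and the scheduling. That division of labor is the cross-functional team Mark describes.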

Rick Howard: I think Mark is onto something here. In hindsight, my idea of morphing an old Unix graybeard network defender into a DevSecOps wizard was, at best, naive and, at worst, really dumb. That said, the security community is a long way away from this DevSecOps team model. From my observations, I don't see a lot of organizations embracing it with any speed. We do have our work cut out for us. 

Rick Howard: After two weeks of looking into Amazon's AWS, I have come to a conclusion similar to the one I reached after looking at Microsoft Azure. There are many good reasons to move your workloads to the cloud just in terms of keeping your business competitive. 

Rick Howard: For security, though, two big reasons are that these cloudy environments can support resilience and zero trust better than what you are probably doing right now back on prem. The downside is that if you are going to pursue the intrusion kill chain prevention strategy, you'll have some work to do since neither Microsoft nor Amazon has embraced the idea. You can do it, but you will have to install third-party tools like Jerry is doing at Sallie Mae. 

Rick Howard: For risk, cloud provider SaaS applications will give you plenty of telemetry to throw into your risk forecasting models, though forecasting risk is still a bit of a black art, and the cloud providers aren't helping you with that. 
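As one illustration of turning telemetry into a forecast (not a method described in the show), a minimal sketch: estimate an annual rate of material incidents from observed counts and exposure, then, assuming incidents follow a Poisson process, compute the probability of at least one incident next year. The incident counts and the 50% control effect are hypothetical.

```python
"""Turn incident counts into a one-year probability of a material event."""
import math


def estimate_rate(events: int, exposure_years: float) -> float:
    """Events per year, from a count observed over some exposure period."""
    if exposure_years <= 0:
        raise ValueError("exposure_years must be > 0")
    return events / exposure_years


def prob_at_least_one(rate_per_year: float, years: float = 1.0) -> float:
    """P(N >= 1) when N ~ Poisson(rate_per_year * years)."""
    return 1.0 - math.exp(-rate_per_year * years)


if __name__ == "__main__":
    # Hypothetical: 3 material incidents across 10 organization-years of data.
    baseline = estimate_rate(3, 10)
    print(round(prob_at_least_one(baseline), 3))        # 0.259
    # Hypothetical control that halves the incident rate.
    print(round(prob_at_least_one(baseline * 0.5), 3))  # 0.139
```

The formula is the easy part; the black art Rick mentions lives in choosing relevant counts, a defensible exposure period and a believable estimate of what each control actually does to the rate.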

Rick Howard: And that's a wrap. This caps off a two-part miniseries on first principle thinking in Amazon AWS. Prior to that, we did two shows on Microsoft Azure. As part of those shows, I wrote deep-dive companion articles that include significant reading lists. If you are looking for more information, you can find everything at thecyberwire.com/pro/cso-perspectives, all one word. That's thecyberwire.com/pro/ - all one word now - cso-perspectives. 

Rick Howard: The CyberWire's "CSO Perspectives" is edited by John Petrik and executive produced by Peter Kilpe. Our theme song is by Blue Dot Sessions. And the mix of the episode and the remix of the theme song was done by the insanely talented Elliott Peltzman. And I am Rick Howard. Thanks for listening.