Cybersecurity first principles: zero trust.

By Rick Howard

May 18, 2020

CSO Perspectives is a weekly column and podcast where Rick Howard discusses the ideas, strategies and technologies that senior cybersecurity executives wrestle with on a daily basis.

Note: This is the second essay in a planned series that discusses the development of a general purpose cybersecurity strategy for all network defenders, be they from the commercial sector, government enterprise, or academic institutions, using the concept of first principles. The first essay explained what first principles are in general and what the very first principle should be for any infosec program. This second essay will discuss the next building block that we will install on the infosec first principle wall: zero trust.

To set this series up, I went through an analysis of what should be the absolutely atomic cybersecurity first-principle building block that we can lay into the ground as the foundation for our infosec program. If you would like to see that reasoning, by all means, go back and read that first essay. No, I don’t mind. I will wait.

After walking through that analysis, it is clear to me that our foundational first principle building block, our cybersecurity cornerstone, is this (spoiler alert!):

Reduce the probability of material impact to my organization due to a cyber event.

That’s it. Nothing else matters. This simple statement is the pillar we can build an entire infosec program on. Which raises the question, what’s next? If reducing the probability of material impact to my organization due to a cyber event is the thing we are trying to do, what is the next atomic building block that we put on the pillar that will help us do that?

I’m glad that you asked.

Closing the digital doors and the virtual windows.

The first thing that comes to mind for me to do, the very next building block in my first principle thinking, is to make it harder for some hacker group to cause material impact. Why should it be easy to get into my network? Think of it like trying to protect your house from common thieves. You could spend a lot of money and time installing, maintaining, and monitoring expensive surveillance equipment and physical security systems, but if you forgot to close and lock the doors and windows when you went out for the evening, the thieves would have a significantly easier time breaking in than if you had. It’s the same idea for protecting your digital assets.

But just what is the equivalent of locking your doors and windows in a digital environment? One illustrative example is the common problem of misconfigured S3 storage buckets in the Amazon ecosystem. Since Amazon made the service available back in 2006, we’ve witnessed a steady stream of S3 bucket exposures. It’s hard to pin down the number, but one pundit I follow put it in the thousands. The thing is, the hackers didn’t break into the S3 buckets using some clever hacker technique. They mostly climbed through open digital windows and doors to the system because the responsible administrators didn’t configure the S3 bucket correctly.
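
To make the "open window" idea concrete, here is a minimal sketch that scans a bucket policy for statements that allow access to anyone. The policy below is a made-up example written in the JSON policy format that S3 uses, and `find_public_statements` and `is_wildcard_principal` are hypothetical helpers, not part of any AWS SDK. A real audit would also need to look at ACLs, account-level public access settings, and policy conditions.

```python
import json

# Hypothetical bucket policy (illustrative only), in the S3 JSON policy format.
EXAMPLE_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Sid": "TeamOnly", "Effect": "Allow",
     "Principal": {"AWS": "arn:aws:iam::111122223333:role/marketing"},
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"}
  ]
}
""")


def is_wildcard_principal(principal):
    """Return True when the principal means 'anyone'."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        values = principal.get("AWS", [])
        if isinstance(values, str):
            values = [values]
        return "*" in values
    return False


def find_public_statements(policy):
    """Return the Sids of Allow statements open to everyone with no Condition."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    return [
        s.get("Sid", "<no Sid>")
        for s in statements
        if s.get("Effect") == "Allow"
        and is_wildcard_principal(s.get("Principal"))
        and "Condition" not in s
    ]


print(find_public_statements(EXAMPLE_POLICY))
```

Running a check like this on every policy change is one way to notice the window before a stranger does.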

Zero trust, not cyber hygiene.

S3 bucket misconfiguration is just one example of failing to configure a system properly. Depending on the size of your organization, you could have hundreds, if not thousands, of potential and unintentional electronic doors and windows left open during the day-to-day skirmishing of your digital operation. The network defender community has tended to lump the activities of closing these digital doors and windows under the heading of “cyber hygiene.” The original internet founding father, Vint Cerf, the guy who helped build the original TCP/IP stack, coined the phrase when he testified to the United States Congress Joint Economic Committee back in 2000. But the word “hygiene” does not convey the entire scope of what needs to be done here.

“Hygiene” suggests that the onus for this activity is on the employee to protect the enterprise just as it is the individual’s responsibility to prevent tooth decay by brushing their teeth. In the age of continuous low level cyber conflict between nation states, it doesn’t seem fair to blame Luigi, the fry cook in the cafeteria, for the material breach because he clicked the link just like everybody else does. Cyber hygiene is definitely a building block on our first principles wall, but it is not a foundational one. This building block, closing the digital doors and windows on our first principle infosec wall, needs to be much more comprehensive: something solid and weighty, something that will be hard to knock over. This is where the zero trust strategy comes into play.

The origins of zero trust.

The ideas around zero trust have been bouncing around the industry since the early 2000s, but John Kindervag published the essential paper that solidified the concept in 2010. He based his thesis on how the military and intelligence communities think about protecting secrets: essentially, treat all information as “need-to-know.” In other words, if you don’t require the information to do your job, you shouldn’t have access to it. To achieve a zero trust posture then, network architects make the assumption that their digital environments are already compromised and design them to reduce the probability of material impact to the company if it turns out to be true. That’s a powerful concept and a radical departure from the prevailing idea at the time: perimeter defense. With perimeter defense, we built a strong outer protection barrier but once the attackers got in, they had access to everything. We called this the hard-and-crunchy-on-the-outside-soft-and-gooey-on-the-inside network design. My own name for this is the M&M network design: hard candy shell on the outside, soft chocolate on the inside, so soft that the inner network melts inside the hackers’ mouths as they consume your digital assets.

The use case for zero trust: Edward Snowden.

The poster child for the badness of M&M network design comes from the classic insider threat case: Edward Snowden. Regardless of what you think about the guy, he was successful because once he logged in to the high side of the NSA network, he had access to almost every data repository stored there.

The U.S. government maintains a handful of not-directly-connected-to-the-internet networks. The names most of us have heard of are the NIPRNET (Non-classified Internet Protocol Router Network), essentially the U.S. government’s internet; the SIPRNET (Secret Internet Protocol Router Network), the place where the government can store, share, and communicate SECRET information; and JWICS (Joint Worldwide Intelligence Communications System), commonly referred to as the high-side network, where the U.S. intelligence community stores super secret information.

Snowden purchased a web crawler from the dark web for about $100 and turned it loose on JWICS. He collected over a million highly classified documents, walked out the door with them, and, well, let’s just say, created quite an international incident with what he subsequently did with those documents. Once he legitimately logged on, he had authorized access to almost everything stored there. He didn’t run a Mark-Zuckerberg-level hack that we saw in the movie “The Social Network” to get into JWICS. He basically web-surfed to see what he could find. I guess it didn’t hurt that he had system administrator credentials for many of those systems either.

At the time, the JWICS network engineers had no concept of a zero-trust network. The irony doesn't escape me that John Kindervag based his zero-trust thesis on how the intelligence community typically compartmentalizes its secrets. But to be fair, back in 2013, nobody anticipated that a highly vetted contractor would do such a thing on a super secret network. In hindsight it seems obvious, but back then, the controls that the NSA had in place to vet these workers seemed adequate.

The Snowden incident caused NSA and many network defenders elsewhere to rethink their network designs. For the infosec community, it moved Kindervag’s theoretical paper from an interesting idea to a key design principle that we were all trying to adhere to. This was how we were going to build networks moving forward. And then, nothing significant happened. Most did not build them. It turns out that even though Kindervag’s thesis is brilliant, the practical “how-to” section is sparse.

Where are my zero trust networks?

I have talked to many network security practitioners over the years about the design and installation of a zero-trust architecture. My take-away from those conversations is that most miss the point. They don’t seem to understand that zero trust is not a destination where you will eventually arrive. It is not a set of technologies that you buy or build, install, and then tell your boss, “Well, that’s it. We have zero trust.” It doesn't work like that.

Zero trust is a philosophy, a strategy, a way of thinking. There are a million things you can do technically and process-wise that will improve your zero-trust posture; to lock the digital doors and windows that need to be secured; to make sure that somebody wandering around the digital hallways of our networks will not find a door ajar and wander in to find something they should not have access to. Or, even if they do, what they discover through that open door will not significantly impact the company. The pursuit of zero trust is a journey, not a destination. You will never reach the end, but you can get far enough down the path relatively quickly to have confidence in saying that your zero-trust program has reduced the probability of material impact to your organization due to a cyber event.

The reason many of us have not even begun this journey, this conversion of our M&M networks into zero-trust networks, is because we have set ourselves a daunting task. We think that in order to achieve zero trust, we have to boil the ocean, throw everything out and start over. Take a look at the NIST draft zero trust architecture document published in February 2020 to understand what I’m talking about.

Although the document is absolutely correct in how it organizes the zero-trust ideas and the technical things you have to have in place in order for it to work, NIST puts forward a proposed system of systems, an architecture of black boxes, that at first glance seems to be something none of us has, isn’t available from the commercial sector, and is too big to build ourselves. But this just isn’t true. You most likely already have the technical tools deployed in your networks that will allow you to get a long way down the path of the zero-trust journey.

Start the zero trust journey with the tech you already have deployed.

They are called next generation firewalls, and they became commercially available in 2007. All the major firewall vendor products do next-generation things and, if you’re a medium to large scale business, you probably already have a boat load of them deployed in your networks.

The firewall has been a staple of the generic security stack since the first commercial offerings back in the early 1990s. But when I say firewall, most of us are thinking about the old stateful inspection firewalls invented around that same time. These were basically fancy routers that allowed us to block incoming and outgoing traffic based on ports, protocols, and IP addresses, and we deployed them at the boundary between our digital organizations and the internet.

With a next-generation firewall, you block network traffic based on applications tied to the authenticated user. Let that sink in for a second. Instead of a layer-three firewall that operates on ports, protocols, and IP addresses, it’s a layer-seven firewall that operates on applications. If you’re concerned about your employees visiting Facebook during the workday, you could try to block their access at layer three by not allowing them access to a raft of IP addresses that Facebook manages and continuously changes. That’s a never-ending task, by the way. Or you could write a next generation firewall rule, a layer-seven firewall rule, that says the marketing department can go to Facebook, but nobody else can. Done. And you never have to touch it again.
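
The contrast between the two rule styles can be sketched in a few lines. This is a toy model, not any vendor's rule syntax: `Layer3Rule`, `Layer7Rule`, and both decision functions are hypothetical names, the IP ranges are reserved documentation addresses standing in for a real service, and the choice to let unmatched layer-three traffic through is an assumption made to mimic a blocklist.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Layer3Rule:
    """Old-style rule: match on destination network and port."""
    dest_network: str
    port: int
    action: str  # "allow" or "deny"


@dataclass(frozen=True)
class Layer7Rule:
    """Next-generation style rule: match on user group and application."""
    group: str
    application: str
    action: str  # "allow" or "deny"


def layer3_decision(rules, dest_ip, port):
    """First match wins; unmatched traffic passes (blocklist assumption)."""
    for rule in rules:
        if ip_address(dest_ip) in ip_network(rule.dest_network) and port == rule.port:
            return rule.action
    return "allow"


def layer7_decision(rules, user_groups, application):
    """First match wins; anything not explicitly allowed is denied."""
    for rule in rules:
        if rule.group in user_groups and rule.application == application:
            return rule.action
    return "deny"


# Layer three: the blocklist only covers the addresses we knew about.
l3_rules = [Layer3Rule("203.0.113.0/24", 443, "deny")]
print(layer3_decision(l3_rules, "203.0.113.7", 443))    # listed address
print(layer3_decision(l3_rules, "198.51.100.10", 443))  # new address, not listed

# Layer seven: one rule, keyed on who the user is and what the application is.
l7_rules = [Layer7Rule("marketing", "facebook", "allow")]
print(layer7_decision(l7_rules, {"marketing"}, "facebook"))
print(layer7_decision(l7_rules, {"legal"}, "facebook"))
```

The layer-three list goes stale the moment the service adds an address, while the layer-seven rule keeps working because it never mentions an address at all.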

In a next-generation firewall world, everything is an application. Using Salesforce? That’s an application. Have an internally deployed exchange server? Use of that is an application. Accessing the dev code library? That’s an application. Pinging a host in your network? That’s an application. Reading the Washington Post? That’s an application. Being able to block applications based on the employee groups that use them provides the infosec team a means to start down the zero trust journey without having to completely redesign their network. They may have to supplement it a bit, but they don’t have to start from scratch.

Logical segmentation and micro segmentation.

There are two approaches we can take: logical segmentation and micro segmentation. Logical segmentation is the relatively easier one. As an aside, I love it when people tell me that things will be easy. I had an old army boss who loved a Latin phrase that he put on all plaques for departing soldiers: “Nihil Facile Est.” His translation: “Nothing is easy.” Words to live by.

Anyway, logical segmentation is creating layer-seven firewall rules for the big muscle movement functions in your company like marketing, legal, software development, etc. And this is where a lot of network defenders get tripped up. Since we create next-generation firewall rules by tying applications to authenticated users, it’s very tempting to create rules for individuals in the company. Kevin can go to Facebook but Luigi can’t. In any sizable organization, that quickly becomes a management nightmare. Trying to administer the inevitable change with individual employees moving around the organization over time will quickly cause your system to crumble under its own weight. Instead, focus on the ten to fifteen big functional areas. Create rules for what applications they can use and which ones they can’t, and you’ve moved a long way along the zero trust journey. You still have to manage employee movement, but their access permissions are not specific to each employee. They’re based on a handful of important company functions.
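
A minimal sketch of that idea, assuming a simple in-memory directory: rules are written once per functional group, and an employee's move changes only a directory entry. The group names, application names, and the `can_use` and `move_employee` helpers are all hypothetical.

```python
# Hypothetical directory: each employee belongs to one functional group.
directory = {"kevin": "marketing", "luigi": "cafeteria"}

# Rules are written per function, never per person.
group_rules = {
    "marketing": {"facebook", "salesforce", "email"},
    "cafeteria": {"email"},
    "legal": {"email", "contracts-db"},
}


def can_use(user, application):
    """Allow only if the user's functional group lists the application."""
    group = directory.get(user)
    return application in group_rules.get(group, set())


def move_employee(user, new_group):
    """A job change touches the directory entry, not the rule set."""
    if new_group not in group_rules:
        raise ValueError(f"unknown functional group: {new_group}")
    directory[user] = new_group


print(can_use("kevin", "facebook"))      # Kevin is in marketing
print(can_use("luigi", "facebook"))      # Luigi is not
move_employee("kevin", "legal")          # Kevin changes jobs
print(can_use("kevin", "facebook"))      # access follows the new group
print(can_use("kevin", "contracts-db"))
```

Note that `group_rules` never changes when Kevin moves. With per-person rules, every transfer would mean hunting down and rewriting each rule that mentions him.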

The other more difficult approach is micro segmentation. This uses the same idea of building functional groups and writing rules for them, but it focuses on the devices used by those functional groups. The marketing team can access the internal cafeteria website from their iPhone to order lunch, but the group does not have access to the financial department’s M&A database server. The reason this is harder is that the infosec team has to do the additional work of installing some sort of public key infrastructure on every device in the organization that the next generation firewall can interrogate. For small- to medium-sized companies, this is probably a bridge too far. But for larger organizations, they most likely already have this deployed. They just need to decide to utilize it.
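
As a rough illustration, the sketch below adds the device to the decision: access is denied unless the device presents a certificate the organization has enrolled, and the rule for that group and device type lists the resource. Everything here is hypothetical, including the placeholder certificate bytes and the resource names. Comparing a SHA-256 fingerprint against an inventory is only a stand-in for real PKI validation, which would also verify the signature chain and revocation status.

```python
import hashlib


def fingerprint(cert_bytes):
    """SHA-256 fingerprint of a certificate's raw bytes."""
    return hashlib.sha256(cert_bytes).hexdigest()


# Hypothetical inventory of enrolled devices, keyed by certificate fingerprint.
MARKETING_IPHONE_CERT = b"placeholder certificate for a marketing iPhone"
enrolled_devices = {
    fingerprint(MARKETING_IPHONE_CERT): {"group": "marketing", "type": "iphone"},
}

# Rules tie a functional group and a device type to the resources they may reach.
device_rules = {
    ("marketing", "iphone"): {"cafeteria-web"},
    ("finance", "laptop"): {"cafeteria-web", "mna-database"},
}


def device_can_reach(cert_bytes, resource):
    """Deny unless the device is enrolled and its rule lists the resource."""
    device = enrolled_devices.get(fingerprint(cert_bytes))
    if device is None:
        return False  # unknown device: default deny
    allowed = device_rules.get((device["group"], device["type"]), set())
    return resource in allowed


print(device_can_reach(MARKETING_IPHONE_CERT, "cafeteria-web"))  # lunch order
print(device_can_reach(MARKETING_IPHONE_CERT, "mna-database"))   # finance only
print(device_can_reach(b"unenrolled device", "cafeteria-web"))   # not enrolled
```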

BeyondCorp as an alternative to next-gen firewalls.

BeyondCorp is a SaaS (Software as a Service) zero-trust product or, if you like, a cloud-delivered SASE (Secure Access Service Edge) product from Google. It’s not a next-generation firewall. It represents a new approach to the zero trust strategy and micro segmentation. It’s not yet a complete solution, as it only secures access for your remote employees’ devices to approved web applications, but you can see the direction the service is going. Out of all the data islands where we store our company’s data (behind the perimeter, data centers, mobile devices, SaaS applications, and cloud deployments), BeyondCorp covers some of them. The roadmap to cover them all is clear, though.

BeyondCorp came from Google’s internal response to a Chinese government cyber attack against their networks and other prominent U.S. companies back in 2010. McAfee named the attack Operation Aurora and it was significant in two important ways:

First, the announcement made the general public aware of the advanced persistent threat, or APT: a cyber adversary that didn’t pull a hit-and-run to steal credit cards and other valuable personally identifiable information (PII). These guys took their time. They burrowed in slowly and with stealth. This was an espionage operation, not a criminal operation. The intelligence community and serious network defenders were aware that these kinds of operations had been going on since the late 1990s, but this was the first time that the general public heard about it.
Second, it marked the first time a commercial entity, Google, went public with breach information. Before that milestone, no commercial company would ever publicly admit that it had been breached for fear that the information would send its stock price spiraling down to the cellar. When Google admitted this, it kind of gave everybody permission to do so too. Since then, we’ve learned that a public admission of a breach, if communicated properly, will not tank the company. Today announcing breaches is commonplace whether we communicate them properly or not.

One interesting side note: once the dust from the Operation Aurora investigation had settled, we learned that there wasn’t just one Chinese government entity operating inside the Google network. There were three: the Chinese equivalents of the FBI, the Department of Defense, and the CIA all had a toehold inside Google’s network. And in a nod to government bureaucracies everywhere, none of them knew the other two were in there until Google went public with the information. If that’s not the classic case of the right hand not knowing what the left hand is doing, I’m your mother’s uncle. And even though the Aurora campaign demonstrated the Chinese government’s advanced capabilities in cyber espionage operations, it also demonstrated that the Chinese government was hampered by the same kind of debilitating information silos you can find in any government bureaucracy. To me, this somehow makes these adversaries less scary. They are not a group of Jason Bourne spies who never make mistakes. They’re just humans who are particularly good at their craft but also put their pants on, one leg at a time, like we all do.

In response to the Aurora attacks, Google engineers redesigned their internal security architecture by adopting the zero-trust strategy using micro segmentation. Ten years later, Google product managers took what they learned from that experience and built the BeyondCorp product.

Which brings us back to next-generation firewalls. If you want to cover all of your data islands right now with technology that you most likely already have, next-generation firewalls are your best bet. The simple approach is to use logical segmentation: rules based on applications tied to authenticated users. Next generation firewalls are really good at that. If you want to get fancy, add micro segmentation: rules based on authenticated users tied to the devices they are using. And keep an eye on BeyondCorp and the copycat SASE services that are likely to pop up in the next few years. That’s a promising idea.

Why do zero-trust initiatives fail?

Here’s the thing though. Zero trust initiatives do not fail because the technology to implement them doesn’t exist. Next generation firewalls have been around since 2007 and were designed to do this very thing. Zero trust initiatives fail because network defenders don’t install the proper people and processes to manage them. At worst, some of us think that we can flip a switch and the system will manage itself. Let me count how many times that strategy has worked in my lifetime. That would be zero.

At best, we use the two-guys-and-a-dog management approach. This team of crack IT management experts operates our routers, our security stack, and our printers, and they get coffee for the CEO in the morning. Now we want them to manage the zero-trust strategy inside our next generation firewalls. They barely have time to check their email in the morning and now we add this task to their plate. That is a train wreck in the making. That just adds to the technical debt pile that we are already not addressing. And besides, deciding which employees get access to which company resources is not a decision we want sitting with the vaunted two-guys-and-a-dog team. That is a decision that should be addressed in policy at the senior levels of our organization.

If zero trust is the next first principle building block that we are going to install on our reduce-the-probability-of-a-material-cyber-event foundation, surely it is important enough to build a team to manage it. We need a team to create the processes for bringing new employees in and deciding which zero trust functional buckets they will belong to initially. The team will also decide how to change employee access when they move laterally within the organization to new jobs and new responsibilities. The team will further design the processes for when employees leave the organization by removing their access from the system. And finally, we’ll need an entirely different team focused on automating these procedures so that the team managing it doesn’t fat-finger the configuration changes and cause an Amazon S3 Bucket type error by leaving the digital windows and doors open for some bad guy to find.
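
One small piece of that automation might be a check that runs before any rule change is pushed, so that a typo is caught by a script instead of by an attacker. The sketch below is an illustration under stated assumptions: the group list, the rule format, and `validate_rule_change` are hypothetical, and a real deployment would validate against whatever format its firewall management tooling actually uses.

```python
# Hypothetical list of approved functional groups, owned by policy, not by IT.
KNOWN_GROUPS = {"marketing", "legal", "finance", "software-development"}

# Values that would turn a narrow rule into an open door.
WILDCARDS = {"*", "any", "all"}


def validate_rule_change(rules):
    """Return a list of problems; an empty list means the change may proceed."""
    problems = []
    for index, rule in enumerate(rules):
        group = rule.get("group", "")
        application = rule.get("application", "")
        action = rule.get("action", "")
        if group not in KNOWN_GROUPS:
            problems.append(f"rule {index}: unknown group {group!r}")
        if not application or application.lower() in WILDCARDS:
            problems.append(f"rule {index}: application must be named explicitly")
        if action not in {"allow", "deny"}:
            problems.append(f"rule {index}: action must be 'allow' or 'deny'")
    return problems


proposed = [
    {"group": "marketing", "application": "facebook", "action": "allow"},
    # A fat-fingered entry: misspelled group and a wildcard application.
    {"group": "marketting", "application": "any", "action": "allow"},
]

for problem in validate_rule_change(proposed):
    print(problem)
```

A check like this does not replace the policy decisions made at senior levels. It only makes sure that what gets deployed matches what was decided.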

Zero trust: key take-aways.

Zero trust is one of several first principle strategies that are essential in order to reduce the probability of material impact due to a cyber event.
Deploying and improving a zero trust design is a journey, not a destination.
A zero trust strategy assumes that your network is already compromised and tries to limit material damage if it turns out to be true.
It grants employees access to organizational resources on a need to know basis.
The two tactical approaches to zero trust deployment are logical segmentation and micro segmentation based on a handful of major organizational muscle groups like finance, marketing, etc.
Logical segmentation is restricting resource access by major organizational muscle groups based on applications tied to authenticated users.
Micro segmentation is restricting resource access by major organizational muscle groups based on applications tied to authenticated devices.
Initiatives fail not because the industry lacks the technology to implement but because network defender leadership doesn’t allocate sufficient people, automation, and policy resources to make them viable.
Zero trust deployments mitigate insider threats by reducing the attack surface available to disgruntled employees and to cyber adversaries who, after stealing credentials, have the same resource permissions as a legitimate employee.

The journey toward zero-trust starts here.

Seven years after the fact, it’s easy for armchair network defenders to criticize the NSA for failing to install a zero trust network designed to reduce the impact of an Edward Snowden type insider threat attack. The startling truth is that most of us didn’t have that kind of network installed either. The sadder reality is that most of us still don’t have that kind of thing installed when we all know that we should.

At first glance, the prospect of converting our old M&M networks into zero trust networks appears daunting and expensive. But, instead of thinking of zero trust as a thing we have to finish and put in our done pile, consider it a journey on the never ending path of improvement. As I said, there are probably a million things we can do on that zero trust journey. But there are things we can do right now with technology that we likely already own that will allow us to start closing those digital doors and windows. And even if we do leave one ajar by mistake, the data that the thief finds there will not significantly impact the organization. And that is why zero trust is the next building block we will install on our reduce-the-probability-of-material-impact-due-to-a-cyber-event strategy wall.