Orchestration as a cybersecurity first principle.

Aug 16, 2021

CSO Perspectives is a weekly column and podcast where Rick Howard discusses the ideas, strategies and technologies that senior cybersecurity executives wrestle with on a daily basis.

In March of 2003, I was the Commander of the Army Computer Emergency Response Team or ACERT. The internet was just really taking off then. Wikipedia had just launched a couple of years before. Apple launched iTunes that year but we were still four years away from seeing our first iPhone. In the military, we were still trying to figure out what cyber operations meant and every organization that could spell cyber correctly, three times out of five, thought that they should own it.

One of my ACERT responsibilities was to coordinate offensive and defensive cyber operations for all of the Army cyber stakeholders (intelligence, networking, law enforcement, legal, information operations, and many others) with our sisters and brothers in the joint world (Air Force, Navy, and the Marines). These were the Title 10 forces, as they say, and my job was to make sure that whatever they were doing didn’t step all over what the Title 50 cyber forces at the National Security Agency (NSA) and the Central Intelligence Agency (CIA) were doing.

Title 10 and Title 50 are titles of the United States Code that provide, among other things, the laws governing the Armed Forces and their use (Title 10) and things like spying, covert operations, and espionage (Title 50). Many people probably don’t know that spying and espionage (Title 50) are primarily reserved for the American spy organizations. Title 10 forces mainly fight the nation's wars. There are some exceptions to the above, but on the whole this is the general division of labor. In theory, what this means is that the Army doesn’t do espionage missions unless it’s working directly for the NSA, and the intelligence community doesn’t fight wars unless it is directly supporting the military.

I mention all of this because, during this time (early 2003), the United States and some of its allies were about ready to launch the invasion of Iraq. In preparation for that event, the Army’s cyber stakeholders realized that we were caught flat-footed. Previously, we had divided operational control of the Army’s cyber assets into various regional CERTs (RCERTs): North America, South America, Europe, Pacific, and South Korea. But we had no presence in Southwest Asia (SWA). Doh! And we needed one. So we built one lickety-split, recalled a bunch of reservists to man it, and shipped them all out to the sandbox in time to support the invasion.

Immediately, the RCERT team noticed several continuous, low-and-slow probes of the RCERT SWA electronic perimeter coming from multiple locations and countries in the Middle East. That couldn’t be good. We began to worry that whoever those bad guys were might be gearing up to degrade or dismantle this fledgling network designed to support the tanks and the infantry when they crossed the line of departure on H-Hour. We needed a plan to counter that contingency.

We basically went into stealth mode.

We orchestrated a plan across all Title 10 interested parties where, at the push of a button, we switched the entire RCERT SWA infrastructure to new domains and IP addresses. Essentially, when H-Hour arrived, the RCERT SWA infrastructure went dark from the perspective of any outside entity trying to keep tabs on us. Internally, we were fully functional, but to the outside world, RCERT SWA disappeared off the board just like a Klingon Bird of Prey using its cloaking device. It didn’t last long, maybe a day, and we knew that going in. Our goal was to cause confusion and disorientation for whoever might want to cause the Army harm at the beginning of the war.

I love that story because it highlights a capability that all network defender organizations need and most don’t have: orchestrating the security stack. In other words, deploying the policy and strategy to the operational equipment on the ground in real time.

Why do we need orchestration?

I’ve mentioned this past history in previous essays, but in the early internet days (the late 1990s), orchestration wasn’t a problem. We only had three tools in the security stack: firewalls, intrusion detection systems, and anti-virus systems. When we wanted to make a change to the policy, we manually logged into each tool and made the change. Fast forward to 2021, and our environments have morphed into enormously complex systems of systems deployed across multiple data islands (hybrid-cloud, SaaS, internal data centers, and mobile devices). Orchestrating the security stack for our first principle strategies (zero trust, intrusion kill chain prevention, resilience, and risk forecasting) across all those data islands in some consistent manner with velocity is exponentially hard to do compared to the early days. Truth be told, most of us don’t do it very well.

How do we do orchestration?

DevOps and DevSecOps

There are a number of approaches security practitioners can take to ease this burden. One is DevOps or DevSecOps. Back in 2003, when Google was still nothing but a search engine, they decided to give the task of network management to the developers. The industry didn’t get the DevOps name for what they were doing until 2010, but Google pioneered this concept of infrastructure-as-code. Instead of technicians manually logging into network devices to update configurations, Google’s Site Reliability Engineers automated the low-level tasks, or “toil,” as they call it. Nearly twenty years later, they are one of a handful of internet giants that dominate electronic commerce. By the way, the others (Netflix, Microsoft, Amazon, etc.) also adopted this DevOps model early. Perhaps there is a repeatable pattern there. What do I know? I'm just a lowly CISO.
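As a rough illustration of the infrastructure-as-code idea, the sketch below keeps the desired configuration as data and lets a small function work out what has to change on each device, instead of a technician logging in and eyeballing it. Everything in it is hypothetical: the device names, the rule strings, and the `plan_changes` function are invented for illustration and are not a description of how Google or anyone else actually does this.

```python
# Hypothetical sketch: the desired state lives in version-controlled data,
# and code computes the changes needed on each device.

def plan_changes(desired, actual):
    """Return, per device, the rules to add and remove to reach the desired state."""
    plan = {}
    for device, wanted in desired.items():
        current = actual.get(device, set())
        to_add = sorted(wanted - current)
        to_remove = sorted(current - wanted)
        if to_add or to_remove:
            plan[device] = {"add": to_add, "remove": to_remove}
    return plan


# Invented example data: what we want versus what is deployed right now.
desired_state = {
    "fw-hq": {"allow tcp/443", "deny tcp/23"},
    "fw-cloud": {"allow tcp/443"},
}
actual_state = {
    "fw-hq": {"allow tcp/443", "allow tcp/23"},
    "fw-cloud": {"allow tcp/443"},
}

for device, change in plan_changes(desired_state, actual_state).items():
    print(device, change)
```

Only `fw-hq` shows up in the plan because `fw-cloud` already matches its desired state. The point of the pattern is that the same function can be run against five devices or five thousand.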

Innovative startup companies, the ones that came up with the DevOps name in 2010, realized that the way they could distinguish themselves in the marketplace was to deliver their services from a SaaS model using infrastructure-as-code. Two Cybersecurity Canon Hall of Fame books talk about this history and how to think about this philosophy: Site Reliability Engineering from the team at Google and The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford. With this approach, as part of the app development process, practitioners build into the system the way to manage the security stack at scale and velocity. Check out Netflix’s Chaos Monkey research if you want to get a lesson on how to think about a hardcore resiliency strategy that is powered by DevSecOps.

Orchestration platforms

A second approach is to deploy a commercial tool that does the bulk of the work for you. Security pundits, like Jon Oltsik (the principal analyst at Enterprise Strategy Group), started talking about this concept as early as 2015. They were describing the need for the security industry to develop services that automated the collection of security tool telemetry, made policy decisions based on that telemetry, and deployed new and updated policies back to the security stack. All-in-one orchestration platforms started appearing in the market a couple of years later from the big firewall vendors like Check Point, Cisco, Fortinet, Juniper, and Palo Alto Networks. These platforms still did traditional firewall-type things, but they also started adding subscription service add-ons to help with zero trust, intrusion kill chain prevention, and resiliency. Instead of the practitioner managing the integration of multiple stand-alone security tools (anything from five to three hundred, depending on an organization’s size), they deployed one orchestration platform in various form factors to each data island. The platform performed many of the same tasks as the individual tools, but it was all controlled under one coherent platform policy. Where it was possible, each subscription service integrated with the others automatically. The downside was that this collection of services probably didn’t represent the best of breed for any particular security tool category. The upside was that they were likely good enough, had the added benefit of being fully integrated with other subscription services where possible, and were automatically updated with the latest prevention controls discovered by the vendor. Since these firewall vendors had multiple customers scattered around the world, they saw a lot of bad guy telemetry in real time. If they developed new prevention controls because of something they saw in customer A’s network, all of their customers benefited from that process.
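The collect, decide, deploy cycle described above can be sketched in a few lines. This is a toy model under stated assumptions: the telemetry format, the `THRESHOLD` value, and the `FakeDevice` class are all invented for illustration and do not correspond to any vendor's product.

```python
from collections import Counter

THRESHOLD = 3  # assumed: block any source that appears in this many alerts


class FakeDevice:
    """Hypothetical stand-in for an enforcement point on one data island."""

    def __init__(self, name):
        self.name = name
        self.blocklist = set()

    def apply_policy(self, blocked_sources):
        # Replace the local blocklist with the centrally decided one.
        self.blocklist = set(blocked_sources)


def decide(telemetry):
    """Turn raw alerts into a policy: the set of sources to block."""
    counts = Counter(alert["source"] for alert in telemetry)
    return {src for src, n in counts.items() if n >= THRESHOLD}


def orchestrate(telemetry, devices):
    """Make one policy decision and deploy it to every device."""
    policy = decide(telemetry)
    for device in devices:
        device.apply_policy(policy)
    return policy


# Invented telemetry collected from two devices (documentation IP ranges).
telemetry = [
    {"source": "203.0.113.7", "device": "fw-hq"},
    {"source": "203.0.113.7", "device": "fw-cloud"},
    {"source": "203.0.113.7", "device": "fw-hq"},
    {"source": "198.51.100.2", "device": "fw-hq"},
]
devices = [FakeDevice("fw-hq"), FakeDevice("fw-cloud"), FakeDevice("fw-branch")]
policy = orchestrate(telemetry, devices)
```

Note that `fw-branch` never saw the noisy source but receives the block anyway. That is the small-scale version of every customer benefiting from what the vendor saw in customer A's network.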

SOAR

But the idea that you could trust one single vendor to do the bulk of the security work was a tough sell. Most security practitioners wanted to hedge their bets with multiple vendors. The platforms were expensive too. Small and medium-sized organizations couldn’t afford them. These same small and medium-sized companies were likely not doing DevOps either, despite the disruptive success of startup companies in the early twenty-teens. Which brings us to a third hybrid approach: SOAR, or Security Orchestration, Automation, and Response.

Gartner coined the term in 2017 to describe a new kind of Security Operations Center (SOC) tool that, in general terms, knew how to communicate with every device in the security stack and provided basic automation capability to handle repetitive data patterns. For example, if a newbie SOC analyst swipes left on the same intrusion detection system alert a thousand times during her shift, the SOAR tool facilitates the automation of that swipe. The automation piece made SOAR tools unique compared to SIEM (Security Information and Event Management) tools, which for the most part just collected the telemetry. But I expect at some point that these two capabilities will start to merge. SOAR companies will add SIEM functionality and SIEM tools will add SOAR functions.
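To make the swipe-left example concrete, here is a minimal sketch of the kind of rule a SOAR playbook might encode: if analysts have already dismissed the same alert signature enough times, close it automatically and pass everything else to a human. The field names, the `AUTO_CLOSE_AFTER` cutoff, and the `triage` function are assumptions made for illustration, not the interface of any real SOAR product.

```python
AUTO_CLOSE_AFTER = 1000  # assumed cutoff: prior manual dismissals of a signature


def triage(alerts, dismissal_history):
    """Split alerts into those closed automatically and those needing a human."""
    auto_closed, needs_analyst = [], []
    for alert in alerts:
        dismissed = dismissal_history.get(alert["signature"], 0)
        if dismissed >= AUTO_CLOSE_AFTER:
            auto_closed.append(alert)
        else:
            needs_analyst.append(alert)
    return auto_closed, needs_analyst


# Invented data: how often each signature has been dismissed by hand so far.
history = {"IDS-1234 noisy scanner": 1000, "IDS-9999 rare beacon": 2}
incoming = [
    {"id": 1, "signature": "IDS-1234 noisy scanner"},
    {"id": 2, "signature": "IDS-9999 rare beacon"},
    {"id": 3, "signature": "IDS-1234 noisy scanner"},
]
closed, queue = triage(incoming, history)
print(len(closed), "auto-closed;", len(queue), "left for the analyst")
```

A real playbook would want more than a raw count before trusting itself, such as an expiry on old dismissals or a periodic human spot check, but the shape of the automation is the same.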

SOAR tools are pretty great at reducing the noise inside the SOC. At my last CSO gig, we went from 1 billion alerts coming into the SOC every quarter that a human had to look at down to just 500. That’s amazing. If SOC analysts just did that, their life would be so much easier. But there is this untapped capability with SOAR/SIEM platforms. We don’t have to be in the one-way receive mode. They already know how to talk to all of the devices in the security stack. What if we used these tools as our DevOps bridge? We could build zero trust, intrusion kill chain prevention, resiliency, and risk forecasting frameworks within the SOAR tools that might be able to give us push button capability to update our security stack. But I haven’t seen anybody doing that in the real world.
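In the abstract, that push-button idea amounts to writing one policy statement and fanning it out through per-tool adapters that translate it into each tool's own rule language. The sketch below is purely hypothetical: the adapter classes, the policy dictionary, and the rendered rule strings are invented, and a real SOAR integration would call each product's actual API instead of returning strings.

```python
class FirewallAdapter:
    """Hypothetical adapter: renders the policy as firewall deny rules."""

    def render(self, policy):
        return [f"deny from {src}" for src in sorted(policy["block_sources"])]


class EndpointAdapter:
    """Hypothetical adapter: renders the policy as endpoint quarantine rules."""

    def render(self, policy):
        return [f"quarantine hash {h}" for h in sorted(policy["block_hashes"])]


def push_button(policy, adapters):
    """Translate one policy into the rule set for every tool in the stack."""
    return {name: adapter.render(policy) for name, adapter in adapters.items()}


# One invented policy statement, expressed once.
policy = {
    "block_sources": {"203.0.113.7", "198.51.100.2"},
    "block_hashes": {"abc123"},
}
adapters = {"firewall": FirewallAdapter(), "endpoint": EndpointAdapter()}
rendered = push_button(policy, adapters)

for tool, rules in rendered.items():
    print(tool, rules)
```

Adding a new tool to the stack then means writing one more adapter, not retraining people to log into one more console.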

SASE

One last relatively new option is to use a SASE vendor. SASE stands for Secure Access Service Edge and flips the infosec architecture model on its head by using a cloud provider as the first hop destination for any network traffic leaving the local site. Local sites could be headquarters buildings, sales offices, data centers, cloud workloads, and remote employees working from home or at the local Starbucks. Gartner coined the term SASE back in 2019 and defined three elements that would distinguish a SASE vendor from, say, a standard MSSP (Managed Security Service Provider).

Security Stack: In a shared responsibility model, the SASE vendor keeps the blinky lights working on whatever security stack tools they provide. The customer sets the policy. The range of options for the security stack is wide. Buyer beware. If you’re doing this, make sure that the SASE vendor’s security stack can handle all of the first principle strategies that we have been talking about.
SDWAN: The SASE vendor plugs into your SDWAN meta layer to ensure that all traffic goes through the security stack and routing is as efficient as it can be. That’s the good news. The bad news is that you have to have an SDWAN meta layer. I’m not saying that SDWAN is bad. I’m just saying that it’s another element in your security stack that adds complexity.
Peering: The only way this SASE model works is if it doesn’t slow down normal internet traffic. If your SASE vendor only has a handful of cloud locations around the world, that could impose a serious bandwidth limitation if all of your traffic has to go through those nodes. The fix for that is for your SASE vendor to establish peering connections in their data centers with some of the big content provider networks like Google, Amazon, and Netflix. For example, your employees in Singapore could ride the vast fiber network of Google to get to the SASE vendor’s security stack. When you are talking to SASE vendors, make them describe their peering connection roadmap.

SASE is essentially a modified version of using a single vendor’s orchestration platform. The good news is that this model is even less complex than deploying and maintaining the orchestration platform yourself on all of your data islands. The bad news is that it's not clear how expensive these SASE services will be in the future. As they say, we are in the first innings of this ballgame. But, I'm assuming they will reach some economies of scale as their customer base grows and that may lead to prices falling.

Of the four options, using a SASE vendor is probably the easiest in terms of complexity, followed closely by deploying a single orchestration platform. But today, both tend to be more expensive. If the SASE vendors can keep the costs down, I think that the SASE architecture is the future, especially for small and medium-sized organizations. Adopting a DevSecOps mentality is probably the right way to go if your organization is trying to be the next internet giant in the wake of the Googles, the Netflixes, and the Amazons. But if you are just starting that now, you are years away from having something useful. I expect that most organizations are in the middle somewhere with the SOAR/SIEM model, but they most likely are only using it as a SOC noise reducer, and not as an orchestration platform.

Where does orchestration sit on our infosec BBQ pit?

Here’s the thing. If you’re on the same wavelength as me in terms of infosec first principles, then you have bought into the idea that the most important thing that we are trying to do is reduce the probability of material impact to our organization in the next few years due to some cyber event. That’s the foundation.

In order to do that though, we have to pursue four strategies in parallel: zero trust, intrusion kill chain prevention, resilience, and risk forecasting. Those BBQ bricks sit on top of the foundation and give it strength. But what comes next?

Intelligence Operations?
SOC Operations?
Incident Response?
Data Loss Protection?
Identity Management?
Purple Team Operations?

I want to make an argument here that security orchestration is so important that it should be a big slab of stone that straddles the four parallel strategies before we add any of these other bricks on top. My reasoning is that for every other brick you add to the pit, you’re adding complexity. The more complexity you have and the more manual your operation is, the more chances you have of leaving something open, or misconfiguring something that will completely unravel all of the work you’ve done to establish each individual brick. Orchestration has to be a key part of your infosec program, not something that you do eventually or do only in certain areas.
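A back-of-the-envelope calculation shows why manual complexity compounds. Assume, purely for illustration, that each manually managed tool has an independent 2 percent chance of being misconfigured in a given change cycle. Neither that figure nor the independence assumption comes from measured data. The chance that at least one tool is wrong is then 1 - (1 - p)^n, which the hypothetical `p_any_misconfig` function below computes for stacks of different sizes.

```python
def p_any_misconfig(p_per_tool, n_tools):
    """Probability that at least one of n independently managed tools is misconfigured."""
    return 1 - (1 - p_per_tool) ** n_tools


# Stack sizes chosen to span the "five to three hundred" tools mentioned earlier.
for n in (5, 50, 300):
    print(f"{n:>3} tools: {p_any_misconfig(0.02, n):.1%}")
```

Under these made-up numbers, a five-tool stack has roughly a one in ten chance of containing a mistake, while a three-hundred-tool stack is all but certain to contain one. The exact values matter less than the shape of the curve.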

And that’s why we frame problems in terms of first principles. You identify the atomic thing that you are trying to do and then create the most important principles that you will need to support it. When I started this series, the concept of orchestration was mentioned in all kinds of places (see the bullet list above). It’s obvious to me now that orchestration has to be the thing that we are all good at and that binds the entire program together.

Reading list.

S1E1: 6 APR 2020: Your Security Stack is Moving: SASE is Coming.

S1E6: 11 MAY: Cybersecurity First Principles

S1E8: 26 MAY: Cybersecurity first principles: intrusion kill chains.

S1E9: 01 JUN: Cybersecurity first principles - resilience

S1E10: 08 JUN: Cybersecurity first principles - DevSecOps

S2E1: 20 JUL: Security operations centers: a first principle idea.

S3E5: 16 NOV: SOAR: a first principle idea.

S4E9: 08 MAR: Cloud Third Party Platform security via first principles

References.

“2003: Second Gulf War (Iraq War).” by Yves Messer, Making History Relevant, 23 June 2012.

“Chaos Monkey – Netflix TechBlog.” Netflix TechBlog, 26 July 2017.

“Cybersecurity Canon.” Osu.edu. 2014.

“Demystifying the Title 10-Title 50 Debate: Distinguishing Military Operations, Intelligence Activities & Covert Action.” Harvard National Security Journal, 2 December 2011.

“DESERT SHIELD AND DESERT STORM: A CHRONOLOGY AND TROOP LIST FOR THE 1990-1991 PERSIAN GULF CRISIS,” by Lieutenant Colonel Joseph P. Englehardt, Director, Middle East Studies, Department of National Security and Strategy, U.S. Army War College, Mar 1991.

“DevOps Case Study: Netflix and the Chaos Monkey.” SEI Blog, 20 April 2015.

“Fact Sheet U.S.C. Title 10, Title 22, and Title 50,” by American Security Project on Aug 09, 2012

“Iran-Iraq War.” by History.com Editors. HISTORY, 13 July 2021.

“Operation DESERT STORM | U.S. Army Center of Military History.” 2021. Army.mil. 2021.

“Site Reliability Engineering: How Google Runs Production Systems,” by Betsy Beyer (Editor), Chris Jones (Editor), Jennifer Petoff (Editor), Niall Richard Murphy (Editor), Published by O'Reilly Media, 2016.

“The 50 Most Significant Moments of Internet History.” by Nate Lanxon, CNET, 22 January 2010.

“The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win,” by Gene Kim, Kevin Behr, George Spafford, Published by IT Revolution Press, 2013.

“War in Iraq Begins.” by History.com Editors, HISTORY, 24 November 2009.

“What Do ‘Swipe Left’ and ‘Swipe Right’ Mean?” by Vann Vicente, How-to Geek, 2021.