Data loss prevention: a first principle idea.

By Rick Howard

Aug 17, 2020

CSO Perspectives is a weekly column and podcast where Rick Howard discusses the ideas, strategies and technologies that senior cybersecurity executives wrestle with on a daily basis.

Note: This is the tenth essay in a planned series that discusses the development of a general purpose cybersecurity strategy using the concept of first principles to build a strong and robust infosec program. The other essays are listed here:

Mind the “data” gap.

If you have been following along with the series, you know that I believe the fundamental first principle of network defense is to reduce the probability of material impact to our organization from a cyber event. I have made the case in the series that, in order to do that, you need strategies that address who has access to your data (zero trust), who is trying to steal your data (intrusion kill chains), and how to survive if your data are stolen or destroyed (resilience). I also made the case that essential supporting strategies are automation (DevSecOps), assessing risk, using cyber threat intelligence, centralizing activity in security operations centers, and managing the organization's efforts in case of a crisis (incident response).

While having those discussions, I have hinted around the idea that your organization’s material data—the information whose theft or destruction would have “a major impact on the financial, economic, reputational, and legal aspects of a company, as well as on the system of internal and external stakeholders of that company”—is the key to success. It’s time to talk specifically about the information itself.

The first concept to consider is that not all data within your organization are material. Let’s not waste time and resources worrying about anything that won’t affect our first-principle, atomic vision of what we’re trying to do. In fact, I would suggest that most of the data flowing into, out of, and through your networks aren’t material. What counts as material depends on your industry’s compliance regulations, the crown jewels in your company’s intellectual property, information on mergers and acquisitions, and a thousand other things, depending on what your organization does and what your leadership thinks is important. My big SWAG (Swinging Wild A*! Guess) is that, for most organizations, the material information you need to worry about is about 20% of the total. Your results may vary, but the point is that once you understand how small the material data set is relative to the complete data set, practical solutions to protect it become much easier to find.

The second concept to consider is that, even if it’s relatively small in size, it’s also likely to be scattered across multiple data islands:

Behind the traditional perimeter (i.e., servers, computers, phones, and pads located behind the traditional firewall at the office).
On employee personal and professional mobile devices (laptops, phones, pads, etc.) in the field and at home.
In private data centers.
In an IaaS or PaaS cloud provider’s network, probably both IaaS and PaaS, and probably across multiple providers.
In SaaS applications that you officially sanction, unofficially tolerate, and absolutely forbid.
On physical media like paper, USB drives, and portable hard drives.
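The two concepts above (a small material share, scattered across islands) can be sketched as a simple inventory. This is a minimal illustration, not a discovery tool: the data set names, island labels, and sizes are made up for the example and were chosen so the share lands on the 20% guess.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    island: str     # where the data lives (hypothetical labels)
    size_gb: float  # illustrative size, not a measurement
    material: bool  # would its loss materially affect the organization?


def summarize(inventory):
    """Return (material share of total size, material GB per island)."""
    total = sum(d.size_gb for d in inventory)
    per_island = defaultdict(float)
    for d in inventory:
        if d.material:
            per_island[d.island] += d.size_gb
    material_total = sum(per_island.values())
    share = material_total / total if total else 0.0
    return share, dict(per_island)


# Hypothetical inventory; every entry here is invented for illustration.
INVENTORY = [
    DataSet("customer-records", "saas", 40.0, True),
    DataSet("m-and-a-drafts", "mobile", 10.0, True),
    DataSet("source-code", "iaas", 50.0, True),
    DataSet("web-logs", "iaas", 300.0, False),
    DataSet("marketing-assets", "perimeter", 100.0, False),
]

share, per_island = summarize(INVENTORY)
print(f"material share: {share:.0%}")
for island, gb in sorted(per_island.items()):
    print(f"{island}: {gb:.0f} GB of material data")
```

The useful output is the per-island breakdown, since it shows where protection has to reach even when the material total is small.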

Applying first-principle thinking across all the data islands is really hard to do. But if you were to concentrate on just the material data itself, is there something else you could do that would help prevent material impact to the organization, something that is not just nice to have but fundamental to the overall plan? I’m glad you asked. It’s called data loss protection or data loss prevention, and the network defender community is really bad at it.

Frameworks for data loss prevention.

As I normally do when I begin to learn about a subject that I’m already supposed to be an expert in, I turn to NIST (the United States National Institute of Standards and Technology). As near as I can figure, NIST and the United States government have a different name for data loss protection. They call it “controlled unclassified information” or CUI. Their official pub for it is Special Publication 800-171, Revision 2, and they published it in February 2020.

Why is it called 800-171? Governments love incomprehensible acronyms for any kind of program or common phrase (BOHICA, FIGMO, FUBAR, HMFIC, MOAB, NUB, and PFM are some of my favorites, and I leave their colorful definitions as an exercise for the reader to discover). They also love labeling their documents with impenetrable numbering schemes that are only valuable to maybe the authors of the documents and one file clerk in the basement of the Office of Management and Budget (OMB). But I digress.

Their title for the over one hundred pages of 800-171, Revision 2 is “Protecting Controlled Unclassified Information in Nonfederal Systems and Organizations.” Amazingly, they decided they needed a 72-page supplement to it and published a draft in July 2020. It’s called “Enhanced Security Requirements for Protecting Controlled Unclassified Information.” The authors make the text more complicated than it needs to be, but they essentially recommend the same things I have been recommending in this first principle essay series (see above). They don’t specifically call out DevSecOps, but they hint at it when they recommend that organizations periodically refresh or upgrade “organizational systems and system components to a known state or [when] developing new systems or components.” And they bury risk assessment in a laundry list of provisions they call “security requirement families.” But the basic themes are the same. The point is that, for those NIST recommendations that align with your first-principle security program, you don’t need a specific data loss program. You’re already covered.

The NIST documents do talk about things that our first principle model hasn’t spelled out yet and, thus, would probably constitute a separate data loss prevention program:

Change management: A plan to organize technical changes on systems that host CUI or material data.
Off-island control: Protection for paper, removable digital media, and electronic versions as they move out of controlled areas.
Destruction: The eradication of paper, removable digital media, and electronic versions when they are no longer needed.
Labeling: The marking of all CUI or material data written on paper, stored on removable digital media, and electronic versions.
Encryption: Encoding CUI or material data stored at rest on removable digital media and electronic versions or as it transits.

All these strategies have been around the network defender world for a long time. Most organizations, though, don’t come close to implementing all of them. There’s not enough bang for the buck, at least in the commercial and academic worlds. In the government space, destruction, off-island control, and change management take on a different significance when the information is classified at the highest level. If I were to prioritize for the rest of us, however, encryption would be the thing I tackled first. It has the greatest chance of significantly reducing the probability that you’ll be materially impacted in the future.
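One way to make that prioritization concrete is to track the five controls per system and list the gaps with encryption first. This is a rough sketch under stated assumptions: only "encryption first" comes from the discussion above, the ordering of the other four controls is arbitrary, and the system names are hypothetical.

```python
from enum import IntEnum


class Control(IntEnum):
    # Lower value = tackled first. Only "encryption first" reflects the
    # text; the order of the remaining four is an arbitrary assumption.
    ENCRYPTION = 1
    LABELING = 2
    OFF_ISLAND_CONTROL = 3
    DESTRUCTION = 4
    CHANGE_MANAGEMENT = 5


def gaps(applied):
    """Return the controls not yet applied, highest priority first."""
    return sorted(c for c in Control if c not in applied)


# Hypothetical systems hosting material data and the controls each has.
SYSTEMS = {
    "hr-database": {Control.LABELING, Control.CHANGE_MANAGEMENT},
    "design-share": {Control.ENCRYPTION, Control.LABELING},
}

for name, applied in SYSTEMS.items():
    todo = ", ".join(c.name.lower() for c in gaps(applied))
    print(f"{name}: {todo}")
```

Any system whose gap list starts with encryption is the place to spend effort first.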

On the commercial framework side, Forrester developed something they call their data control framework. It matches the NIST framework in terms of data classification (labeling), access control (zero trust), and data deletion. They have additional components that NIST doesn’t include like:

data discovery
data intelligence
security data analytics
data inspection

I think if you build a robust zero trust program, you already get most of that. What both the NIST documents and the Forrester framework add that I haven’t seen in any framework document before is the idea of deception or data obfuscation. That’s an interesting development.

Honeypots: deception networks.

Deception networks aren't a new idea. Some people call them “honeypots.” Caleb Townsend of Cybersecurity Magazine claims that the first honeypot was built by Clifford Stoll, and made famous in The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage. I was running a version of the concept in the Army as early as 2002. The idea is that you build a completely fictitious network that, from an invader’s point of view, looks exactly like a portion of your real network. This includes routers, firewalls, email servers, domain name servers, web servers, and host names. You get the idea. Once built, you fill that network with all kinds of phony documents, enticing documents, documents that might tempt an intruder. When I was doing it in the Army, we took the extra step of inserting a beacon into the fake documents designed to phone home if some intruder ever opened them.
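The beacon idea can be sketched with a signed token embedded in each decoy document. This is not a description of how the Army version worked; it is one plausible approach, assuming the document carries a phone-home link containing the token and the defenders run a listener that validates it. The key handling and file name are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical signing key, known only to the defenders' listener.
SECRET_KEY = secrets.token_bytes(32)


def _sign(payload, key):
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def mint_beacon(doc_name, key=SECRET_KEY):
    """Return a unique token to embed in a decoy document's phone-home link."""
    nonce = secrets.token_hex(8)  # makes every copy of a decoy distinguishable
    payload = f"{doc_name}:{nonce}"
    return f"{payload}:{_sign(payload, key)}"


def check_callback(token, key=SECRET_KEY):
    """Return the decoy's name if the token is genuine, else None."""
    try:
        doc_name, nonce, tag = token.rsplit(":", 2)
    except ValueError:
        return None  # not even the right shape
    expected = _sign(f"{doc_name}:{nonce}", key)
    # Constant-time comparison on bytes so a forged tag cannot be guessed
    # incrementally and non-ASCII input cannot raise.
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return doc_name
    return None


token = mint_beacon("q3-merger-plan.docx")
print(check_callback(token))      # name of the decoy that was opened
print(check_callback("garbage"))  # None: not one of ours
```

Signing the token means the listener can ignore random internet noise and only alert on callbacks that could have come from a real decoy.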

The bad news is that operating deception networks is resource intensive. You need people to design them and deploy them, and they aren’t free. You need another set of people to build the fake documents, and this is no small matter. The documents have to be credible and alluring and not filled with gibberish. The worst part is that you have to be ready to burn it all down at any moment once the hackers discover the ploy, and rebuild it somewhere else with new network names, server names, and new fake documents. You also have to understand that building successful deception networks is about 50% technical savvy and 50% art. The people who do this kind of work are a rare mix of electronic hacker, network engineer, social engineer, and English major. If you look at the members of the infamous hacktivist group, the Cult of the Dead Cow, you will have some idea of the personality and skills required to pull this off.

The good news is that, if you ever detect activity in your deception network, you know it’s a bad guy because no legitimate user has any reason to be in there. You can use the telemetry gained in the deception network to check whether the same adversary is in your real network. Also, if a bad-guy team is trying to penetrate your deception network, they are wasting resources on something that has no value to you or to them instead of directing those resources at your real network. Finally, if you’re good at creating fake documents, you can insert a degree of uncertainty into the bad guy’s operation. If they know that you’re running deception networks and your Cult of the Dead Cow team is good at presenting half-truths and downright lies, the adversary will question the validity of anything that they’ve stolen from you.
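The detection logic in that paragraph is simple enough to sketch: any flow that touches the decoy address space is an alert, and the sources seen there become a watch list for the real network. The subnet, the addresses, and the (source, destination) flow format are assumptions made for the example.

```python
import ipaddress

# Hypothetical address block reserved for the deception network.
DECOY_NET = ipaddress.ip_network("10.99.0.0/24")


def in_decoy(addr):
    return ipaddress.ip_address(addr) in DECOY_NET


def decoy_hits(flows):
    """Flows whose destination is a decoy host; each one is an alert."""
    return [(src, dst) for src, dst in flows if in_decoy(dst)]


def same_sources_in_real_network(flows):
    """Flows to real hosts from any source already seen in the decoy network."""
    watch_list = {src for src, _ in decoy_hits(flows)}
    return [(src, dst) for src, dst in flows
            if src in watch_list and not in_decoy(dst)]


# Invented (source, destination) pairs standing in for flow telemetry.
FLOWS = [
    ("10.1.0.5", "10.1.0.20"),    # ordinary traffic
    ("10.1.0.77", "10.99.0.10"),  # touches a decoy host
    ("10.1.0.77", "10.1.0.20"),   # same source, real server
]

print(decoy_hits(FLOWS))
print(same_sources_in_real_network(FLOWS))
```

There is no threshold or scoring here on purpose; the absence of legitimate traffic is what makes a single hit meaningful.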

Since I was doing it in the Army back in the early 2000s, the technology piece has gotten way easier to manage. There are security vendors who do this for you now, vendors like Acalvio, Attivo Networks, Cymmetria, Illusive Networks, Smokescreen, and TrapX Security. I haven’t used any of them, so I can’t vouch for them, but the fact that NIST and Forrester have included deception as a key component of data loss prevention makes me think all of us might start having these kinds of services running in our environments in the near future.

Your data loss prevention program.

If I were already down the path of building my infosec program along the lines of first-principle thinking, adding a data loss prevention strategy would probably be the next thing I tackled. I would absolutely prioritize identifying your CUI or material data and architecting a way to encrypt it at rest on all of your data islands and also as it transits from island to island and away from your control. The next thing to consider is whether you have the resources to manage a deception network. It might be too early at this stage for small- to medium-sized businesses, but with NIST and Forrester backing the concept, deception networks will probably be on all of our radars in the next few years. For large enterprises and governments, you should probably be looking at pilot projects this year.
