What is data centric security and why should anyone care?
[ Music ]
Rick Howard: Hey, everyone. Welcome to CyberWire-X, a series of specials where we highlight important security topics affecting security professionals worldwide. I'm Rick Howard, N2K's Chief Security Officer, and the CyberWire's Chief Analyst and Senior Fellow. And today, Dave Bittner, the Senior Producer and host of many of the CyberWire's podcasts, will be joining me at the CyberWire's hash table to discuss data-centric security. After the break, you'll first hear my conversation with Bill Newhouse, an engineer at the National Cybersecurity Center of Excellence, and then Dave will talk with Dana Morris, Senior Vice President of Product and Engineering at Virtru. Come right back.
[ Music ]
The idea of zero trust has been around since the early 2000s. John Kindervag formalized the idea when he published the original founding whitepaper in 2010 called "No More Chewy Centers - Introducing The Zero Trust Model of Information Security." Since then, security professionals and security vendors have been trying to get their hands around the idea. But as the infosec community has evolved the philosophy, the actual practical how-to tactics have been a moving target. The original idea was to limit access to resources on a need-to-know basis. In the early days, we concentrated on limiting access to people based on their role in the organization. Then we realized that we needed to think about devices, too, like phones, tablets, servers, and by extension, cloud workloads. Then we realized that we needed to limit access to our software applications that we buy and install commercially and the apps that we build ourselves. Not to mention the APIs that come with all of that. All of those potential zero trust controls are tactics that we might deploy to our internal digital infrastructure. Those data islands where we store our essential information and workloads. But the next component that has emerged in recent years is how do you apply the same zero trust philosophy to data that exists outside of your digital infrastructure? Like email you send outside to partners and contractors, or files that you store and share in public repositories like Dropbox or Amazon S3 buckets? The US National Cybersecurity Center of Excellence, the NCCoE, has started calling this data-centric defense. And they have a new research project to figure out what that means. So, I reached out to the NCCoE to help us understand this new idea.
Bill Newhouse: Yeah, I'm Bill Newhouse. My title is Cybersecurity Engineer. And I work at the National Institute of Standards and Technology, in particular I work at our Applied Cybersecurity Center, which has the fancy name National Cybersecurity Center of Excellence.
Rick Howard: I started out by asking Bill to describe what data-centric security is.
Bill Newhouse: Anything centric means you're focused on it as an important thing to worry about. And when I was invited to help co-lead this project, it made sense to me to talk about data. DHS, when they describe zero trust, and I haven't totally read the DoD paper, I carried it around a bit. Data, if you think, you have a chewy center and you're moving data around for your business processes, you're relying on data, you are a data company, data supports everything you do, it starts to sound like the thing to really worry about and protect. So this project's data classification that I've walked into references that zero trust has a data element, it's either called a pillar or you have tenets of good things you should know in knowing about your data. Because those decisions about who to authenticate to have access to it, and where they go, and how the data moves around within a realizable zero trust architecture, that's- you do all this because you have data you need to process. And it's kind of Captain Obvious stuff. And I think if we circle around it, I'll probably say it in better ways. But it is the thing. And zero trust I frame as everything we've always wanted to do trying to be sold to you in one happy package. And recognizing that that's kind of really tricky. Because we don't get to throw away all the old stuff we're using. And it doesn't necessarily immediately walk in and play nice with zero trust. So if you're lucky enough to be a new start-up and you're creating stuff, you can probably achieve zero trust faster. In either model, knowing where your data is and what data's important to you and how you need to protect it, and there's been different pushes on why you need to protect data offered to us in the last decade, those are all very important. So data centric to me is really just trying to get your hands on what's important for your business and figuring out what to protect. And then classification is an early step in that process.
Rick Howard: Well, the way I look at it is we first started thinking about zero trust over a decade ago. The first thing we thought about was identifying and authorizing individuals. People. About what they have access to. And then as it became more and more acceptable for users to use their own personal devices to do work, like their iPhones and laptops and things, and then also we knew we had a collection of servers all around the place before we went to the cloud. So we had people on devices. And as we moved into the 2020s, now we're looking at software, being able to say what can the software actually touch that we're running? That's stuff we write ourselves and stuff that we buy. And obviously, the example of that is the SolarWinds attacks. Those kinds of things become more and more important. And now we're going to throw APIs onto the pile because we're all moving to APIs to control everything that's going on in our networks. And then that all sounds hard enough. But then there's this last use case that I think this data centric model really addresses. Is that when we want to share data, just like a file or a set of files or a bunch of data records, outside the organization, it's not protected by all those other zero trust rules. Let's say it's sitting in Dropbox somewhere. We want to be able to put some sort of zero trust rule set on this data glob, and still feel like we've got a robust zero trust deployment.
Bill Newhouse: And you're describing the use case that our data classification project aims to hit. In short- it's not shorter terms. What am I- islands of zero trust are wonderful. And if your own organization develops zero trust and you've started to realize the benefits in all these different ways that we've touched on and some that we haven't, great. Your own house is nice. So if people come to visit, your closets are organized, they know where to find the silverware, they know what's in the fridge. And everything's like really nicely organized. And it just looks like oh, Martha Stewart lives here. And that's good. Sorry, that analogy, metaphor thing.
Rick Howard: Let's throw one more metaphor on.
Bill Newhouse: Yeah, but you have real businesses as frenemies. And people you need to work with. And transactions that all need to occur. And so you could try to work out a system by which, yes, my data is I hand it to you. It's absorbed and I've labeled it and done all the mechanisms to do something that allows you to promise me at some level that you will care for this baby the way that I-- oh no, another one. You will care for this data the way I cared for it. It matters to me that I prove that I'm protecting the privacy of my customers and you should, too. That's still going to be difficult, but we aim to show, at least - I'll say at least because you know, we're going to do this with real stuff, we're going to run some stuff and put data past it - to show that my markings and my schema and what I did with, you know, to do classification as I give it to you offers you the advantage to take it and absorb it and use it quickly. And put it into the same protections if your regulators or your just personal preferences or whatever, your values, you need to meet them. We have to negotiate that. But this is also an opportunity to put some technology on this so that it is potentially more realizable. I'm trying to find some notes from a conversation I had and I'm not going to find it. It was- I'm close in my book, but my notes are never as good as I want. But it's sort of information release versus information safeguard.
Rick Howard: If you guys are successful, I can extend my zero trust policy outside my organization to partners and collaborators and people that just need to see the data. I still have to manage the profiles of who gets access and who can do what with it. Before, that wasn't even possible unless they were inside the network. Now I can extend it out.
Bill Newhouse: That's what we aim to show. And some of it is for people who've never even thought I need to organize this stuff to get ready for that. And then once you're closer to being ready for that, then sort of, then you dive a little farther and you can start to- I'm spit-balling here because I think I told you, I've been with the government ever since I was 19 years old as a co-op student. Solving these kinds of problems matters to parts of government, but for industry, for contracts and other negotiations and entitlements and things, magic words that I'm learning as we- we invite folks to join us and the technology providers we invite, they have customers and clients. So, you know, they won't tell us individually about those relationships, but they bring that experience to the build and we try to come up with use cases that would let us illuminate all the things you and I just talked about. In what we hope to be- well, functional. So if somebody says they did it and they documented how they did it, they being us at the NCCoE, you know, we've accomplished it and we told you what we want to do and we proved to you that we measured that it happened. We don't necessarily try to pen test the systems that we build. The functional, I call them reference designs, I don't want to use the words reference architecture because that's a loaded term of this is the only way. But reference design is to say we rationalized with our collaborators that this is something we could create. And look, it does the good things we want it to do. And we leave a little bit of scoping to say we're not going to solve all those other problems and zero trust leaves a lot of room for more work. We do have colleagues here at the center focused on zero trust. And as you described, they are focusing on authentication. They are focusing on network segmentation early on in that project.
Eventually, we're going to talk about the data and how the relationships between what they've already accomplished and what we want to accomplish with data, you know, the policies that would be involved in data handling and storage will grow into the conversation with the zero trust team. So, a lot of possibilities. And if we can do it and we plan to, we believe that, you know, it'll help with adoption. And our goal is to see people adopt better and useful and hopefully even somewhat measurable? That's trickier. But you know, proof that cybersecurity can be advanced through some of these new things we're talking about. That have standards, NIST has a special pub on zero trust. DHS has its architecture structure. Strategy, excuse me, and so does DoD. Those are all models of moving towards better practices that will hopefully keep us from having- the bigger impacts that SolarWinds had on organizations. Any ransomware attack, you know, you start to have a better understanding of your data. You can prepare yourself in good ways for things that are happening now and hopefully things that people will aim to do in the future against us.
Rick Howard: So this year, you guys published a draft executive summary of what you're trying to do. The paper's called "Implementing Data Classification Practices." What's the goal there? What was- what are you trying to do with that paper?
Bill Newhouse: Yeah, each one of the practice guides that come out of the center, they're NIST special pub 1800 series. So this one's 1800-39A, which implies that we have 38 others of these in our library. There's an executive summary, volume A. There's eventually we'll publish a volume B, which does a little bit more of in the area of analogy. It's the recipe and the ingredients. And what we want to accomplish described for the values of what security and privacy risk you can reduce. It often has mappings to things like the cybersecurity framework or security controls. So it gives lots of different angles for I care about this recipe they're about to cook, and then we do a volume C which would be all the 1s and 0s and yeses and no's in the setup of the hardwares and softwares. And any APIs that we're using. And any glue code we need to write to make those work. That's the details. So so far, what you talked about is volume A. It's preliminary draft. And it's our new-ish tactic of getting in front of people with one more hey, look what we're doing. Please pay attention to us. Stay tuned for more. Look who we're working with. The good and the bad of that is more people say I want to be a collaborator. And-
Rick Howard: I was going to say. You're not looking for volunteers, right? You're just- this is an announcement that you're doing the work.
Bill Newhouse: We asked for the volunteers starting about 18 months ago. And that process, I never want to say it's complete because we can find ways to add collaborators if we feel that we need more technology or some type of expertise that we don't already have. And we can be convinced of that. So people are coming to us with I want to play, I want to be a collaborator. And that process we described 18 months ago with the Federal Register notice, stuff on our own website, if you go to nccoe.gov, nist.gov. That is, nccoe.nist.gov. There's a button you can click to see which projects are in which phase. And that bringing collaborators into the project is an earlier phase that we already accomplished. So yeah, and the purpose of the document is really just to ask you to pay attention. Check to see if you like the language selections we've made. One thing we do believe in this data classification space is that there's a lot of language that people use. And we're trying to figure out which are the ones that stick as being good, solid, I know what that means. And I'm already guilty of it during this podcast, labeling, tagging. I haven't said the word metadata. You know, but what are the- and there's a lot of other words that people will throw into this space.
Rick Howard: Zero trust is just one. So we'll just throw that one in there.
Bill Newhouse: Yeah, at least that one has- you know, you did a great job describing it. And I offered my belief that it's everything we always wanted to do. And eventually we'll get to it. If the world continues in a positive way. It's realizable. But that's proof is in the pudding. And so here, we're asking people, you know, look what we're doing. We don't expressly say hey, do you like it? Except you could offer comments to say you don't. We do tell you that you can join us if you find this pub and you're not already part of our community of interest, which is a bigger layer of people who already know that we're doing this project and want to keep track of what we're doing. So they get notices if you're in the community of interest that, you know, we're having a meeting with the public. And we haven't really established one lately. But we've done our own internal educational webinar on this. We've had conversations to encourage people who wanted to be collaborators to ask us questions back in that earlier phase. So, the guide offers places for people to join the community of interest. And just stay tuned, as I said a couple times. It's- put a little tag on us to let us know that we're doing this. And let us know that you're interested. And we already have heard a lot. We published this quietly during the week of RSA and then made an announcement to our community of interest just recently. And this podcast will certainly go out to a large community. You asked earlier about the center and its foundation. I think our gov delivery email Listserv has over 40,000 people in it now? Which is astounding to me. But that's good. It means you and I chatted a little bit about workforce and there's often statistics to say there aren't enough people doing cybersecurity. Well, we've at least found 40,000 hopefully real people who care somewhat about some aspect of what we're doing. And that feels good. That's at least one measurement that just comes out as a pretty solid one. 
So, yeah.
Rick Howard: So Bill, we're at the end of this. What's the headline here that you'd like to tell everybody about this project? That we need everybody from the CyberWire to know.
Bill Newhouse: You told me to think about that question. It is that one should be- one should organize one's data so that you can have it work for you, you can protect it, you can share it as you wish to share it. And aim to have control of that process so that you're able to meet whatever regulations your industry requires. And that you can make promises to your customers or yourself that the stewardship of data is important to you and data classification is a vital step.
[ Music ]
Rick Howard: Next up is Dave Bittner's conversation with Dana Morris, Senior Vice President of Product and Engineering at Virtru, our show's sponsor.
[ Music ]
Dana Morris: If you think about just the history of Virtru, it sort of started in 2000- like 2008, probably. Where our co-founder, Will Ackerly, invented something called the Trusted Data Format. Which was an attempt to standardize on a specification for how to classify and tag data. And use those tags to enforce assertions and obligations against data wherever it goes. That is a specification that is maintained by the Office of the Director of National Intelligence, and it's the format upon which Virtru is built. And what I've only seen recently, especially with the growth of zero trust, is this idea of really focusing on what is that core asset you really want to trust when you think about- or protect when you think about security. And that's the data. It's not the perimeter, it's not the network. All of the solutions you're employing in the context of security are really about protecting data. And so the concept here is about how to start- for doing more to protect the data object itself in addition to all the other ways that you're protecting data already. And I think that's been an interesting trend in the last couple years. And we've seen, you know, kind of industry and government momentum towards starting to think about that problem space and starting to standardize and agree on a way to approach the problem.
Dave Bittner: Before we dig into some of the specifics, is it fair to say that, certainly as we've been through COVID, that the notion of protecting the perimeter, having the kind of virtual moat around your castle, that's fallen out of favor. By necessity.
Dana Morris: Yeah, I think if you think about cloud, I mean, what is the perimeter? There really isn't a perimeter like there used to be 10, 15 years ago. And everybody had firewalls and VPNs and they were basically- you were connecting into a data center. I used to work at IBM. We had centralized servers and we'd be connecting into those remotely. And so there was a pretty well defined perimeter that could be used to control, you know, what people could and couldn't access. But as we moved to the cloud and data has been increasingly moved up into SaaS applications and across different cloud solutions providers, the perimeter has definitely changed dramatically. I think one thing I would say is, I don't know that there is a move away from the perimeter, as much as saying we need to do more than just think about a perimeter. Because that perimeter's changed. It's not that you would throw out any concept of trying to enforce things at the app or the network boundary, but it's about adding onto those locks and essentially figuring out ways that you can put additional locks on the actual data itself.
Dave Bittner: Well, can we dig into some of the specifics here about data centric security? I mean from a user point of view, how does it work?
Dana Morris: Yeah. So, it really comes down to starting with data classification. And in some ways, it's actually a really nice user experience because we're not really asking the user to think about what obligations to put on data. What does different classification actually mean in terms of protections? We're just asking them to make a decision about what is the classification of data? In some cases, we can even automate that with machine learning. So those classifications become tags or attributes. And then we can use those attributes to enforce sort of organization wide policies based on that classification. So for example, if you have PHI, PHI has certain obligations associated with it. Or PCI. Or PII. These are pretty well established classifications of data. And if you have systems or people or both that are classifying that data and adding these tags to the data objects themselves, we can use those downstream to then enforce those policies and have visibility into how that data's being used.
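The idea Dana describes here, where classification tags on a data object drive organization-wide policy, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the policy table, the obligation names, and the DataObject type are all hypothetical and do not come from Virtru or any real product.

```python
from dataclasses import dataclass, field

# Hypothetical organization-wide policy table: classification tag -> obligations.
# The obligation names are made up for illustration.
POLICY_BY_TAG = {
    "PHI": {"encrypt", "audit_access"},
    "PCI": {"encrypt", "block_external_share"},
    "PII": {"audit_access"},
}


@dataclass
class DataObject:
    """A piece of data plus the classification tags a user or system applied."""
    name: str
    tags: set = field(default_factory=set)


def obligations_for(obj):
    """Return the union of the obligations attached to every tag on the object."""
    result = set()
    for tag in obj.tags:
        result |= POLICY_BY_TAG.get(tag, set())
    return result


record = DataObject("lab_results.csv", tags={"PHI", "PII"})
print(sorted(obligations_for(record)))
```

The person tagging the file only chooses a classification; what that classification obliges downstream lives in one central table, which matches the division of labor described above.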
Dave Bittner: Is the tagging, is this like a metadata situation where we're tagging the data and then- is the metadata available?
Dana Morris: Yeah, think about it as an envelope. So if you think about an envelope, the letter inside is the data itself. And the envelope is really providing metadata, deciding where it gets routed and how to handle it. And in this case, it's very similar. So, like Trusted Data Format, for example, is a specification for how to put a structured wrapper around the data object. And optionally encrypt that data object. And encrypt the policy, if you wanted to. And sort of attach all of that as a wrapper around the data object. So that when that data object is transmitted, when it crosses boundaries, when it's sent to different people, that wrapper still holds. And then we can use the attributes in that wrapper to decide who can access it, what they can actually do with that data, who they can share it with. We can have a lot of additional control and visibility into how they're working with that data, as long as it's been wrapped with that tag.
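The envelope analogy can be made concrete with a small Python sketch. It is a toy wrapper only, not the actual Trusted Data Format layout: the JSON field names and the wrap and unwrap functions are assumptions, and the payload is merely base64-encoded where a real wrapper could encrypt it.

```python
import base64
import json


def wrap(payload: bytes, attributes: dict) -> str:
    """Return a JSON envelope: policy metadata on the outside, payload inside."""
    envelope = {
        "policy": {"attributes": attributes},
        # base64 is an encoding, not encryption; a real wrapper could encrypt here.
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(envelope)


def unwrap(envelope_json: str, user_entitlements: set) -> bytes:
    """Release the payload only if the user holds every required classification."""
    envelope = json.loads(envelope_json)
    required = set(envelope["policy"]["attributes"].get("classification", []))
    missing = required - user_entitlements
    if missing:
        raise PermissionError(f"missing entitlements: {sorted(missing)}")
    return base64.b64decode(envelope["payload"])


wrapped = wrap(b"patient chart", {"classification": ["PHI"]})
print(unwrap(wrapped, {"PHI"}))
```

Because the policy travels inside the same string as the payload, the access check can run wherever the object ends up, which is the property the wrapper is meant to provide.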
Dave Bittner: And how do we ensure that this sort of thing doesn't introduce undue friction for our users?
Dana Morris: That is definitely the key. And I think it's something we've spent a lot of time and energy on at Virtru, is really the UX. I think one thing is that sometimes the decisions you want to make around data don't require actual encryption. So you really just want to use those tags to decide whether maybe something could or could not be sent. And then as you're making that decision, you might want to additionally encrypt. On the other end, you have to be thinking about the applications that are consuming that format. We're spending a lot of time working with the community and with application providers. And how do you understand that format and then make sure that it can be seamlessly accessed even though it's wrapping that data?
Dave Bittner: Can the tags have attributes like expiration dates, things like that?
Dana Morris: They can. In fact, I would think of those more as sort of additional obligations. So if you wanted to go with more of a, let's take a simple government scenario like a classification of, you know, secret or confidential or top secret. There might be a releasability, as in what countries or which organizations is this data releasable to? Those would be sort of your attributes, you know, releasability is an attribute name and value is the countries or the organizations it's releasable to. And then additionally, you can have obligations that you place on the data. Maybe you want it watermarked to the user. Or the organization. Maybe you want to prevent it from being forwarded. Perhaps you want to set an expiration date so it's sort of the Mission Impossible scenario. This will self-destruct in five minutes. Or an hour. Or two hours. All of those things are possible. And they're just done in the standard wrapper.
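The split Dana draws between attributes (who the data is releasable to) and obligations (watermarking, no forwarding, a time limit) might be modeled as below. This is a hedged Python sketch: the Policy type, its field names, the country codes, and the can_open function are all hypothetical, chosen only to illustrate the distinction.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Policy:
    """Hypothetical policy carried in the wrapper around a data object."""
    classification: str
    releasable_to: set                               # attribute: countries or orgs
    obligations: set = field(default_factory=set)    # e.g. watermark, no_forward
    expires_at: Optional[datetime] = None            # time limit on access


def can_open(policy, country, now):
    """Allow access only before expiry and only for a releasable country."""
    if policy.expires_at is not None and now >= policy.expires_at:
        return False
    return country in policy.releasable_to


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
policy = Policy(
    classification="SECRET",
    releasable_to={"USA", "GBR"},
    obligations={"watermark", "no_forward"},
    expires_at=created + timedelta(minutes=5),
)
print(can_open(policy, "GBR", created + timedelta(minutes=1)))
print(can_open(policy, "GBR", created + timedelta(minutes=10)))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check easy to test and makes the time-limit behavior explicit.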
Dave Bittner: You know, we're certainly hearing a lot of talk these days about zero trust, both within the government and the private sector. How does this data centric security model fit into that idea?
Dana Morris: Yeah, when I think of zero trust, I think John Kindervag went into great detail about zero trust as a concept and a framework. But personally when I think about it, I like to focus less on the trust part and more about constant verification. And I like to tie it to like a physical example. So if you've ever worked in the government in a secure location, you're constantly verifying, whether it's scanning your badge or people asking if you have your badge on, or do you have the right clearance to enter this office, these are all verifications that are happening continuously. And they're part of a strong story around security, right? Ask questions, make sure you verify who it is, what they should be doing. Should they be there? Et cetera. And when I think about zero trust in the digital world, I think it's very similar. How do I always verify that you are who you say you are? And that you have a right to be here. And you have a right to do whatever it is you're trying to do. And in the context of data centric security, it fits very nicely into the zero trust picture as obligations and ways of enforcing and thinking about verification of you when you get to the actual object itself. So, zero trust or any security architecture is all about layering, right? Multiple layers is going to be best. I'm going to ask, I'm going to basically verify that Dana is who he says he is and has access to this network. And then I'm going to do the same check at the application. And then in this case, I'd be doing some of the similar checks at the data object itself. So it really fits in very nicely. Almost like a Russian doll approach there. With the data being the inside doll.
Dave Bittner: What about onboarding? For organizations that decide they're going to take this on and have a lot of data that they now have to, as you say, tag and put in these virtual envelopes. What's that process like?
Dana Morris: Yeah. I think, usually, the hardest part is actually the process of defining the classifications, or the attributes that you want to use for the organization. And this is moving beyond the government, right? I work with a lot of banks as well. And the banking industry, financial services in general, thinks a lot about classification. Now, they're thinking about it both from a security perspective and just from a risk perspective. Certain classifications of data have not just greater security requirements, but also have higher risk profiles. And so how do I track that? So the onboarding is kind of in two phases. Right? One is sort of day forward, so putting in place facilities so that data can be classified as it's being created and shared. That can be manual, as a user doing it. Google Workspace, for example, has added a concept called labels that allow users to do this in Google Drive with Docs, Sheets, and Slides. Or you can use, you know, automatic classification tools. Whether that's machine learning or even just simple statistical modeling. Categorization and classification is actually not a very hard problem to solve in modern day. And so another way you could do it is to start to use those kinds of solutions to auto tag and classify data.
Dave Bittner: For the folks who've gone through this and had success here, what does that look like on the other side, in terms of their experience?
Dana Morris: The other side of the sharing equation, you mean?
Dave Bittner: Well, in other words, they're up and running and it's a part of their everyday operations.
Dana Morris: Beyond the security benefit, I think that risk piece is the interesting part. Which is greater visibility into how is your most sensitive data being used? And by whom? It's really hard to do that without those classifications, because you're then looking at data that Dana owns or Dave owns. And that's one or two people out of maybe tens of thousands. But if I have a consistent classification of PHI, for example, then I can have- I can use that classification to then actually do analytics and see like where is my PHI data being used? And then where's it being shared? So that visibility part actually is one of the best benefits that most organizations see from that piece. The security comes along and it's certainly really important, but ultimately it comes down to that risk analysis, and then, you know, just agility. The idea that you could quickly make a decision to entitle Dave to PHI, for example, and he just instantly gets access to large amounts of data that he couldn't access previously? That agility piece is a really interesting benefit and it's one of the ones that the government and the defense intelligence base is certainly betting on. But also, we see that with banks and financial services as early adopters. So that's been a big success for them is the ability to grant access to data that couldn't be seen previously in a really quick way.
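The visibility benefit, asking where all data with a given classification is being shared instead of asking what one person owns, can be illustrated with a short Python sketch. The log format, the field names, and the example.com-style destinations are invented for illustration and assume that each access event already carries the object's classification tags.

```python
from collections import Counter

# Hypothetical access log; each event records the tags on the object touched.
access_log = [
    {"user": "dana", "object": "chart.pdf", "tags": {"PHI"}, "shared_with": "clinic.example"},
    {"user": "dave", "object": "claims.csv", "tags": {"PHI", "PII"}, "shared_with": "billing.example"},
    {"user": "dave", "object": "cards.csv", "tags": {"PCI"}, "shared_with": "billing.example"},
    {"user": "dana", "object": "scan.png", "tags": {"PHI"}, "shared_with": "clinic.example"},
]


def shares_by_destination(log, tag):
    """Count how often objects carrying `tag` were shared with each destination."""
    return Counter(event["shared_with"] for event in log if tag in event["tags"])


def users_touching(log, tag):
    """Return the set of users who accessed any object carrying `tag`."""
    return {event["user"] for event in log if tag in event["tags"]}


print(shares_by_destination(access_log, "PHI"))
print(sorted(users_touching(access_log, "PHI")))
```

Both queries filter on the tag, not on the owner, so they keep working as the number of users grows.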
Dave Bittner: I would imagine that that visibility also has benefits on the regulatory side, to have the data of who gets to see what, to be able to demonstrate that.
Dana Morris: Absolutely. I think demonstrating who can see it, and then thinking about things like retention, and eDiscovery, and all the things that you- all the obligations you have relative to the data you own. Knowing that you have a really good classification where you can pretty precisely say this is all of the PHI data that I have. Structured and unstructured. And here's all the people who have access to it. And the systems that have access to it. And the places it's being shared, I think that visibility is critical if you really want to keep up with sort of modern compliance regimes.
Dave Bittner: What are your recommendations for organizations to get started here? What's the best path?
Dana Morris: I think the first thing is before you even get to data centric security or even the security pieces of it and the enforcement, it's really starting on that journey of agreeing on a classification system. And then thinking about where you can apply it. And starting to actually classify data. There's a lot of work on that in the structured side. I think a lot of database vendors, a lot of organizations certainly do a good job of classifying structured data. But the unstructured world is definitely a little less mature. I think that's probably the biggest opportunity. So again, I'd go back to the Google Workspace example. I love what the Google team is doing there with labels, make it easy for a user to apply a sensitivity label to a document. Microsoft is doing similar in their Office Suite. And I think that's probably the first place to start, agree on, you know, have your working group agree on what your classification system and scheme and tagging is going to be. And then start figuring out ways to roll that out to the application. And then as you've got that in place, I think you can take the next step into great, now how am I going to use that to make decisions about access and then for security?
[ Music ]
Rick Howard: We'd like to thank Bill Newhouse from the US National Cybersecurity Center of Excellence, NCCoE, and Dana Morris, Senior Vice President of Product and Engineering at Virtru, for helping us get our arms around this data centric model. And we'd like to thank Virtru for sponsoring the show. This has been a production of the CyberWire and N2K. And we feel privileged that podcasts like CyberWire-X are part of the daily intelligence routine of many of the most influential leaders and operators in the public and private sector. As well as critical security teams supporting the Fortune 500 and many of the world's preeminent intelligence and law enforcement agencies. N2K's strategic workforce intelligence optimizes the value of your biggest investment. People. We make you smarter about your team while making your team smarter. Learn more at n2k.com. Our senior producer is Jennifer Eiben, our sound engineer is Tre Hester. And on behalf of my colleague, Dave Bittner, this is Rick Howard, signing off. Thanks for listening.
[ Music ]