The relationship between organizations and their data.

Transcript

Stephen Cavey: So the temptation to store anything and keep it forever is very high. And it's only because of regulation that's coming in now that companies are having to reconsider what they do with people's data.

Dave Bittner: Hello, everyone, and welcome to "Caveat," the CyberWire's privacy, surveillance law and policy podcast. I'm Dave Bittner. And joining me is my co-host, Ben Yelin, from the University of Maryland Center for Health and Homeland Security. Hello, Ben.

Ben Yelin: Hello, Dave.

Dave Bittner: On today's show, Ben has the story of lawyers being denied entry to Madison Square Garden as the result of facial recognition software. I've got the story of some law professors worried about ChatGPT. And later in the show, Stephen Cavey from GroundLabs is here to discuss organizations, their data and compliance. While this show covers legal topics and Ben is a lawyer, the views expressed do not constitute legal advice. For official legal advice on any of the topics we cover, please contact your attorney.

Dave Bittner: All right, Ben, we got some good stuff to share this week. Why don't you start things off for us here?

Ben Yelin: Yeah. We have some fun stories. The first one from The New York Times, entitled "Lawyers Barred by Madison Square Garden Find a Way Back In" by Kashmir Hill, this one is just a fun, interesting and fascinating story. So several months ago, Madison Square Garden in Manhattan brought in facial-recognition software, ostensibly for security reasons.

Dave Bittner: Yeah.

Ben Yelin: You want to keep out people who have criminal records, are on a terrorist watch list, etc. So anybody entering for a Knicks game, a Rangers game, some type of concert, performance, their face is going to be read by the scanner. And if they're red-flagged, they can be denied entry into Madison Square Garden. Turns out, they're not just using it for criminals and terrorists. They're using it against one of society's other least popular groups of people. And those are, of course, attorneys.

Dave Bittner: (Laughter).

Ben Yelin: So there are a lot of people who have lawsuits that implicate Madison Square Garden in one way or another. They give a couple of examples here. Somebody is involved in a lawsuit with one of the ticket reselling companies about something, and Madison Square Garden is on the opposite side of the suit. And they don't want this lawyer in their facilities. They have a stated reason, which I think is actually somewhat of a proper justification, that they don't want these lawyers to engage in improper discovery away from the normal judicial process. So you can go through discovery. It's all very well-documented in the court case. You make an appointment. You have a stenographer. You're getting written records. What they don't want is people randomly going to a Rangers game and, you know, as part of the lawsuit - maybe there's a lawsuit that Madison Square Garden has been negligent in upkeep of its restrooms. And you have the lawyer going in and taking notes. They don't want that to happen. So a bunch of lawyers have been denied entry to these events. Prior to facial recognition, this would have been nearly impossible. You could have people's names on some sort of list, but they're not checking people's ID, generally, when you enter into a sporting event, and it would be very time-consuming to go over, you know, a list of banned attorneys...

Dave Bittner: Right.

Ben Yelin: ...On a piece of paper going to a Knicks, Rangers game or a concert.

Dave Bittner: Yeah.

Ben Yelin: So it's much easier now that you have facial recognition software. So what are the lawyers doing? Well, they're fighting back with a weapon they know best, and that is lawsuits.

Dave Bittner: I was going to guess false mustaches, but (laughter)...

Ben Yelin: False mustaches would have been a great idea. I think there could be an entire industry in fake facial hair. Does not work as well for women, I've found.

Dave Bittner: Well, yeah. Never know.

Ben Yelin: But, you know, you could...

Dave Bittner: Don't judge, Ben. Don't judge.

Ben Yelin: I will not judge. And, you know, you might want to try some other possible disguises, some type of Groucho Marx, you know...

Dave Bittner: Right. Sure.

Ben Yelin: ...One of those eyebrows, nose and glasses things.

Dave Bittner: That would work.

Ben Yelin: Yeah. But instead of that, the lawyers are suing Madison Square Garden. And the cause of action is based on a New York state law dating back to the 1940s. What's really entertaining about this is, basically, the problem in the 1940s was that Broadway theaters were denying entry to theater critics who they thought would give negative reviews of their performances. This sounds like...

Dave Bittner: (Laughter).

Ben Yelin: This sounds like the alternate plot of "The Producers."

Dave Bittner: Right (laughter).

Ben Yelin: Yeah. Of course, they didn't have facial recognition. So in that sense, it was a - there was an actual list, and people - or, you know, it would just be manual recognition.

Dave Bittner: Yeah.

Ben Yelin: You know, you have a producer standing at the door and saying, I'm not letting this critic in.

Dave Bittner: Right. And a Broadway theater has much fewer attendees than Madison Square Garden, for example.

Ben Yelin: Right. So they were actually - the lawyers were actually successful. Of course, they hired the most high-profile attorneys, and they were able to gain an injunction that lifted a ban under this anti-discrimination law that prohibits, quote, "wrongful refusal of admission" to "places of public entertainment or amusement." So under this injunction, Madison Square Garden can refuse to sell lawyers tickets, but it can't refuse entry if those lawyers buy tickets from a secondary source or, you know, they get their firm's tickets, and they show up. What's fascinating about this - or there are two things that are fascinating about this. One is one of the plaintiffs brought the court order, and his face was scanned upon entry to a Wizkid concert, and he was still denied entry. They were basically like, yeah, that's great - your lawsuit. Awesome. But they still turned him away. The second thing that's really entertaining about this is that the statute only applies to entertainment and amusement. And according to other court precedent in New York state, that does not include sporting events.

Ben Yelin: So there are different rules now at Madison Square Garden for performances and for sporting events, so for performances, this statute, according to this judge, at least in a preliminary sense, does apply. You can't wrongfully deny entry. But for sporting events, yeah, it's - you can still have that face read and knock people out and tell them they have to leave.

Ben Yelin: This is a fun, entertaining story. You know, I think the serious element of it is you wonder how this facial recognition software could be weaponized in ways that are a little more nefarious or discriminatory. Right now, it's being used against an unsympathetic group of people, and that's lawyers. But you could understand a context where they were removing people, you know, based on demographic characteristics. You are not allowed to discriminate based on race, and in New York state, sexual orientation, gender identity, etc.

Dave Bittner: Yeah.

Ben Yelin: But they could come up with arbitrary reasons to ban people and can enforce it with facial recognition software, which I think - you know, the fact that that would be legal is somewhat concerning. So I think it's kind of incumbent upon New York state and other state legislatures to try and clarify proper uses of facial recognition technology where if you're going to deploy it in these event spaces, then you have to exclude people for justifiable reasons, like being on a terrorist watch list or somebody with a history of violent crime rather than, I just don't want this person to come into my facility, and I have the technology to prevent it.

Dave Bittner: Mmm hmm. So does MSG's rationale here hold water, or do we think that this is sort of cloaking a - basically comes down to being a jerk move?

Ben Yelin: Oh, it's a total jerk move. But I think putting aside this amusement entertainment statute, it is well within their power to deny entry to people.

Dave Bittner: Right.

Ben Yelin: I mean, this is a private company. It's not a public space. They can deny entry to anyone for any reason besides being part of one of the protected classes in civil rights laws.

Dave Bittner: I see.

Ben Yelin: So it is a place of public accommodation, meaning anybody who buys a ticket can go there.

Dave Bittner: Right. But I can't tell you you can't come to this sporting event because you are Black.

Ben Yelin: Exactly. You can't do that. But you can say, I'm not going to let you in because I have a personal vendetta against you, or you know, I can't let you in because the Knicks are playing the Celtics, and you have a Boston ZIP code in your billing - you know, your billing address. So I think they definitely have the right to do it. I just think the concerning point from a policy perspective is this would - there would have been ways around any other type of ban that wasn't enforced by facial recognition software.

Ben Yelin: So we've tried to see this, and there are situations in other sporting contexts where the home team will try to exclude fans from purchasing tickets on a secondary market if they have a ZIP code from the visiting team's general area. And it's just much harder to enforce that because you can have - you can use credit cards that have billing addresses from a different location. You can have somebody else buy the tickets. With facial recognition, there are no workarounds. And beyond that, we know the limits of facial recognition technology. It's not as good at recognizing people's faces for people of color. And there are other difficulties as well. So it's just you have that type of enforcement mechanism - I think it introduces new concerns that just didn't exist prior to this technology.

Dave Bittner: Is Madison Square Garden making any indication that they're going to back off of this?

Ben Yelin: I don't think so. You know, I think we're going to see more cases like this because Madison Square Garden, despite this temporary injunction, still is flexing their legal muscles. I mean, they're still able to institute this ban on adversarial attorneys as it relates to sporting events. And I think they see that as being to their advantage. So I don't think they're going to back off of this. You know, one of the reasons they're able to get away with this is because everybody hates lawyers.

Dave Bittner: (Laughter).

Ben Yelin: I think, you know, if it was - even like a...

Dave Bittner: I love you, Ben.

Ben Yelin: All right, so you love me, and I respect that. But, you know, beyond me being a human being...

Dave Bittner: Right.

Ben Yelin: ...I think people dislike me because I'm a lawyer, and everybody hates lawyers.

Dave Bittner: Yeah.

Ben Yelin: So it would be one thing if they were banning people from competitive arenas. Let's say, you know, we have a really unique way of putting on a show here at Madison Square Garden. So if you're working at the Prudential Center in Newark, you know, we're not going to let you in because we have some trade secrets on how we run our, you know, pregame hockey show or whatever.

Dave Bittner: Yeah.

Ben Yelin: I think that wouldn't have been as acceptable in the court of public opinion. The fact that it's lawyers - I think very few people are going to be sympathetic. And that's one of the things they say at the end of this article. Lawyers may not be the most favored class, says Alan Greenberg, who is with a law firm representing a fan who sued Madison Square Garden, but it could be expanded to any other class of individuals - perhaps a class that is more or that is less disfavored than attorneys. So I think that's certainly one of the reasons they're getting away with this.

Dave Bittner: Lawyers are not a protected category.

Ben Yelin: That's right.

Dave Bittner: All right. Boy, that's interesting, isn't it?

Ben Yelin: It's such a fun story. And it's like the person that they're profiling here, like, you just look at him. You know that he's an attorney. He's wearing the - you know, an outfit with really nice clothes and holding a coffee cup. And it's like, am I really supposed to feel sorry for this guy?

Dave Bittner: Like, if you called up central casting and said, send over an attorney, it would be this guy.

Ben Yelin: Exactly.

Dave Bittner: Yeah.

Ben Yelin: Exactly. And you just look at him, and you're like do I really want to go to court to fight for this person...

Dave Bittner: Yeah.

Ben Yelin: ...To attend Knicks and Rangers games, you know?

Dave Bittner: (Laughter) Right, right.

Ben Yelin: But I think there are broader issues here. It does relate to facial recognition technology itself. And I think there might be especially future justification to limit the use of facial recognition technology to circumstances defined by a legislature. I don't think this would be something that could be done at the federal level 'cause I think that would be intruding on states' powers, especially as it relates to regulating private companies. But I think it's certainly something that could be done at the state legislature level.

Dave Bittner: Interesting. All right. Well, my story this week comes from Reuters. This is an article written by Karen Sloan. And this centers on some law professors - of which we have one in the room - being fearful of ChatGPT's rise to prominence here. So we've been talking about ChatGPT - everybody's been talking about ChatGPT.

Ben Yelin: Topic of the hour.

Dave Bittner: Yeah. And, of course, it's been of concern to professors, teachers all over the place because of the possibility for students to use it. And it seems to be quite good at writing and coming up with - I think - I said - put a thing in it that said solve this math problem and show your work. And it did it, you know? So (laughter)...

Ben Yelin: That's - yeah, I mean...

Dave Bittner: Right.

Ben Yelin: ...That's most of elementary school mathematics right there.

Dave Bittner: Right. Exactly. So this caught my eye because I thought this would be an interesting discussion for the two of us because you are, indeed, a law professor.

Ben Yelin: And somebody who panicked when I saw ChatGPT 'cause I thought, oh, God, my students are going to use this.

Dave Bittner: Yeah.

Ben Yelin: And this is going to be a problem.

Dave Bittner: Right. So what do you think here, Ben? Are your students going to use this, and is this going to be a problem?

Ben Yelin: I think they're going to try. I think it's going to be less useful for law students than students in other professional schools or even undergraduate students for a couple of reasons. One is that in law school, most of the exams are based on hypotheticals that the professor writes. So I never got an opportunity to be a fiction writer except in designing law school exams.

Dave Bittner: Oh.

Ben Yelin: So I come out with a hypothetical, and the students have to apply the law to those facts. I think that's just a little too specific to be picked up and recognized by ChatGPT. So that's...

Dave Bittner: Have you tried? Have you tried any of your test questions?

Ben Yelin: I have. And the answers have just not - so I've done this a couple of times, actually, and the answers are just very superficial. Like, they'll state what the relevant law is, but they don't do a good job of applying the law to the facts. So in my class, that's like a B or C exam.

Dave Bittner: OK. Still passing. Still passing (laughter).

Ben Yelin: It is still passing. On that note, by the way, they had - this article had a couple of law professors who put some of the multi-state bar exam questions into ChatGPT, and ChatGPT got 50% of them correct as opposed to 68% for human exam takers. So that is a difference. It's not that big of a difference. And, you know, ChatGPT is in its early - we're kind of in the beta period of ChatGPT...

Dave Bittner: Right.

Ben Yelin: ...Where it's still finding its legs. Like, a year down the line, maybe they're going to get better scores than students. I think this is going to be a race between students and ChatGPT on the one side and then academic institutions and other technological partners on the other lines - on the other side to try to ban the use of ChatGPT on campus or have exams or other law school exercises take place offline where students don't have any internet access. I've heard a couple of suggestions on how to cope with ChatGPT for law students. One of them is a professor that had forced students to write a paper disconnected from the internet as enforced by the school's firewall or their IT department.

Dave Bittner: Yeah.

Ben Yelin: And then they could make revisions - the professor would read the first draft, then they could make revisions with the use of a network. But, you know, that might be hard to scale because there are a lot of take-home assignments in law school.

Dave Bittner: Let me - so having never been to law school, help me understand here. What references do law students have at their disposal in the normal course of taking an exam, if any?

Ben Yelin: Many things. So generally, the professors decide what type of access the students have during the exam. So, for example, we get three choices. It's either a closed-book, closed-note exam - only the mean professors do that...

Dave Bittner: (Laughter).

Ben Yelin: ...Where you basically have to go by memory. Now, that is how the bar exam works. You...

Dave Bittner: Oh.

Ben Yelin: ...Do not have access to the network. You couldn't bring any books or materials.

Dave Bittner: Right.

Ben Yelin: You had to memorize everything. And I had a couple of those exams in law school. The way I like to administer the exam is what I call the CTRL-F exam, where you can have notes on your computer or the PowerPoint slides from class, but you can't have access to a network. So that would be acceptable in a world of ChatGPT because you're not going to know the questions ahead of time, so you can't type them into this chatbot AI and have it spit out answers. You have to use the notes that were already on your computer. The third option - and this is something that's, I think, being used more, especially as we've moved to virtual education, just 'cause it's harder to enforce limits on network use. So the third option is just completely open. You have open internet access. I always warn my students that it could do them more harm than good because you have access to treasure troves of information. If you wanted to know a treatise on a particular case we studied in the class, you would have access to it. You would have access to the case itself. Legal online law libraries are really great. So cases can lead you to secondary sources, and you could really develop a good analysis. Your only limit is time, and it is pretty time-consuming to go through Westlaw or LexisNexis and try and glean all that information manually.

Ben Yelin: The concern with ChatGPT is you skip those steps. You have open networks. You type the hypothetical into the chatbot, and it's able to give you at least a satisfactory answer. And I think that's something that we need to be concerned about as professors and as academic institutions. Prior to ChatGPT, I think the thinking was we would be able to - there's anti-plagiarism software, for one. And I can always tell when a student is plagiarizing because you kind of learn what the student's voice is. And when they copy and paste, and it's in a different font, and it doesn't sound like them...

Dave Bittner: Right.

Ben Yelin: ...You know, I have a pretty good idea that they're cheating. ChatGPT - the way it's programmed is it's supposed to sound - it's supposed to reflect the input, which is what's out there on the internet. And it's supposed to sound human. I mean, it's supposed to be as if you're actually chatting with an expert on the topic that you're studying. So...

Dave Bittner: And you can - I'm just thinking in real time here that, you know, because you can ask ChatGPT to write in a particular style, theoretically - and I haven't tried this - but I could say please word your answer in the style of this particular Supreme Court justice.

Ben Yelin: Right. You can have, like, write a Scalia dissent about this hypothetical. And, you know, I think they would absolutely be able to do that.

Dave Bittner: Yeah.

Ben Yelin: There are limits on ChatGPT now. I think, as the technology develops, those limits will start to fall by the wayside. And I think law schools are going to have to come up with policies. I think this will manifest itself in, you know, revised honor code provisions saying that you can't use this type of technology for take-home assignments or exams. And if you do, you know, even though it would be really hard to find out - it would be really hard to enforce - you could subject the students to sanctions, discipline, etc. So I think we're not that far - probably days, if not weeks, from law schools implementing that type of policy. And then it'll just be up to the technology to keep up with whether professors can figure out how the answer was drafted by ChatGPT. I saw that a law student somewhere tried to - or did come up with a technology that would allegedly determine whether the answer was drafted by a chatbot.

Dave Bittner: Yeah.

Ben Yelin: So maybe that's the solution us law professors are looking for here.

Dave Bittner: I just wonder if this is the first step in a change in the way that we handle a bigger - higher-level changes are coming because if a tool like ChatGPT is going to be a regular part of a professional lawyer's toolkit, then why deny the students that? In other words, how long are we going to pretend like professionals rely on memorizing everything, you know? That's just not the way of the world anymore.

Ben Yelin: I agree with you. I mean...

Dave Bittner: We all have - all - we have - all have access to all the world's knowledge, 24/7, in our pockets with our little, portable supercomputers. And to pretend like you need to memorize everything, I just - it just strikes me as being unrealistic. And I wonder, at what point do we acknowledge that perhaps it's not productive.

Ben Yelin: Right. And I think that's happened in previous iterations of technology. It's like, all right, force students to actually go to the library and find the law books and - to reference these cases. Westlaw and LexisNexis take off, and it's like, well, why would we deny students to have this opportunity to get those cases at their disposal in an instant without having to go to a physical law library? Even something like Wikipedia - I tell my students, never quote Wikipedia directly. Don't trust what it tells you. But if you want a colloquial summary of a Supreme Court case and you want links to the case itself and to good secondary sources, why not use Wikipedia? It's useful for that purpose.

Dave Bittner: Right.

Ben Yelin: And why should you deny yourself that knowledge? I mean, I think as long as we can prevent students from actually taking exams by using ChatGPT, I think it's important to integrate it as a tool 'cause it is a pretty useful tool.

Dave Bittner: Yeah.

Ben Yelin: I think that's going to be a long-term process, how to balance having it as a tool versus not using it to cheat. And I think we've kind of mastered that balance through other methods of technology, and I think we'll just have to go through the same process here.

Dave Bittner: I will admit that some of my - let's just call it bitterness here is from having grown up in an era where teachers said to us, particularly in math class, you know, well, you can't have a calculator during the test because you're not always going to have a calculator with you. Ha.

Ben Yelin: Guess what, Mrs. So-and-so? I have a calculator in my pocket, and I can use it.

Dave Bittner: That's right. Again...

Ben Yelin: Yeah.

Dave Bittner: Once again, I have a supercomputer in my pocket with access to all the world's information, and it will show my work. So you were wrong (laughter).

Ben Yelin: You were wrong, Teacher. Yeah. I mean, I think they misled us.

Dave Bittner: But we're going to teach you how to think. Oh, yeah, I know.

Ben Yelin: I guess.

Dave Bittner: Meanwhile, back in the real world...

Ben Yelin: Yeah.

Dave Bittner: You know.

Ben Yelin: I guess you need to know how to type in the numbers on the calculator, you know, to be fair.

Dave Bittner: But - yeah, yeah.

Ben Yelin: Yeah.

Dave Bittner: I mean, you need to know what questions - it's that whole old thing about the difference between knowledge and wisdom is...

Ben Yelin: Right.

Dave Bittner: ...You know, knowledge is knowing the thing. Wisdom is knowing when to apply it. So there's something to that. But I also think - I guess what I'm concerned is that we don't inadvertently end up with a false gap between how things are happening in the real world through the advance of technology and the kinds of tasks we're putting our students on to - are we certifying them for a world that no longer exists?

Ben Yelin: I completely agree. Yeah. I mean, we shouldn't deprive law students of technology that will be at their disposal when they become lawyers.

Dave Bittner: Right.

Ben Yelin: So as long as they're not using it to cheat, you know, I think it would be wrong to deprive them of some of those advantages. I think you're right on.

Dave Bittner: But what is cheating, right? But what is - do we - you know what I'm saying?

Ben Yelin: That's an existential question.

Dave Bittner: Well, but it - I think it's a serious one. To what - how do we define what constitutes cheating today? I don't know that it's - I guess that line has gotten fuzzier in my mind.

Ben Yelin: Well, I think we should handle this the way lawyers handle it and put...

Dave Bittner: (Laughter).

Ben Yelin: ...Have a high-price conference, put together some academics with earned or unearned credentials.

Dave Bittner: Sure.

Ben Yelin: And they can solve this problem for us with some type of treatise or a law review article. So I'm sure the smartest people in the room will get together and come up with a good solution for us.

Dave Bittner: All right. Fair enough. All right. Well, we will have links to all of today's stories in the show notes. And, of course, we would love to hear from you. You can email us. It's caveat@thecyberwire.com.

Dave Bittner: Ben, I recently had the pleasure of speaking with Stephen Cavey. He is from an organization called GroundLabs. And our discussion centers on data and companies' data compliance. Here is my conversation with Stephen Cavey.

Stephen Cavey: I think we brought it upon ourself. If you look back in the past, we were relatively unregulated. We could collect data, as much of it as we liked, and do as much with it as we could. In fact, the more data you collected, the better because it means we could get more insights about our customers, figure out their likes and their wants and their needs and market to them better, sell to them better and perhaps even deliver better service. Unfortunately, you know, during those times of the Wild West, we were overcollecting (ph) data significantly to the point where any data that we could get about a customer was data that we wanted without really putting much thought around how we were going to manage and secure and look after that data on a long-term basis and then what rules or regulations might come about that would require us to do things with that data.

Stephen Cavey: And I think that's why you now see - what? - over 130 countries now have implemented modern privacy laws to protect the rights of individuals. And that's largely because businesses were given the opportunity to self-regulate for the longest time. And they didn't do a great job at it, which I think explains why you see so many data breaches out there that have been happening every year, year on year on year. And there's no signs of that letting up. And so as a result, countries have had to step in and implement very difficult laws, as I know you'll appreciate and have talked about many times on this program. And the penalties are continuing to increase around the world. I think in Australia now, we've just seen a significant increase in the penalties down there in reaction to recent data breaches that have been happening. And this is the common theme. What do you need to do to get businesses to take note of these new regulations with regards to people's data and upholding privacy? And so the penalty framework is one of those ways. But there's so much more that we can also be doing as an industry and as a community to be looking after the identities of people and the data we collect about them.

Dave Bittner: What sorts of things do you think we should be doing here to be more effective, to be better stewards of our customers' and clients' data?

Stephen Cavey: Well, I think we have to start the overcollecting, this whole - not start, stop the overcollecting. And as a result, what we should be doing is only collecting exactly what we need. There's a common term called data minimization in our space. And that's really about, what's the data that you absolutely must collect in order to deliver the product or service that you provide? And then making sure that you only collect that and then managing that data well. You know, I think the other problem that's been happening is we've collected so much data about individuals, we're storing it everywhere. And then we're very quickly losing track of where it's ending up across our organization, across our environment. And so we've - you know, just to layer that on, IDC comes out with statistics now measuring the amount of data that's being created out there. And as many people can appreciate, the volume of data that organizations have now is exploding.

Stephen Cavey: Gone are the days where we all had 5- and 10-megabyte quotas on our mailbox, and we weren't allowed to send mail when we would hit that quota. Now we are given substantially more space to store within, in some cases unlimited space depending on your cloud provider. And so the temptation to store anything and keep it forever is very high. And it's only because of regulation that's coming in now that companies are having to reconsider what they do with people's data. And so minimizing the data - but I think more fundamentally - and this is certainly an area that GroundLabs is out there specializing in - is knowing where your data is.

Stephen Cavey: I think that is, by far, the biggest challenge for organizations. We survey a lot of companies out there and a lot of group professionals, and we generally find about 70% of respondents will answer the simple question or give a response to the simple question, do you know where your data is? The 70% will say, no, we don't know where our data is, where all of our data is being stored. And so I think that really brings home the reason why we're seeing so many data breaches happen out there in the world because very often, the data that is stolen from the organization or lost by the organization is data that they didn't know they were storing in the first place. And so I think that comes back to the fact that when you - particularly in today's world, when you think about the movement of staff that's been happening between organizations and how difficult it is to keep your people in today's day and age, the institutional knowledge that you've built up within the business about data - about where it is, about what the business is doing with it and about how it's being handled - it's very easily lost.

Stephen Cavey: And so you have a current generation of staff that are aware of the current processes as they happen today but not necessarily the processes that were happening last year or three years ago or five years ago. And yet that data is still being stored somewhere in the business. And if that data relates to individuals - customers, employees and other scenarios where it's highly sensitive information - it needs a very high level of protection and it needs to be monitored to make sure that it's staying safe and it needs to be managed to make sure that it's not being kept for longer than is needed. Then you have a real challenge on your hands. And then you layer on the further problem of the number of locations where we store data. Now we're storing data both on premise and in the cloud. We're storing it in SaaS providers. We're storing it in databases. We're storing it as files that are synchronized both on servers and desktop.

Stephen Cavey: And so the overall challenge of understanding what data you have, where it is and, of that data, what is sensitive, what is high-risk data versus what is low-risk data that doesn't need all of the right controls and processes around it, it's an incredibly difficult task that businesses are facing. And so with all of these new regulations coming down the line, that's certainly making this problem even more challenging to deal with. So, Dave, just to - I think to summarize, you know, minimize the data you're collecting and then the data that you do hold, start to understand where all of it is ending up within the organization - where all of it exists, where it's stored, and how much of it that you have. Only then can you start to begin the journey of protecting that data and then monitoring that data to make sure it stays protected on an ongoing basis.

Dave Bittner: Could you share with us some insights as - in your experience, what leads to this kind of data sprawl? I mean, is this a natural side effect of people just going about their day-to-day business?

Stephen Cavey: It is. I mean, people need to do their jobs, right? And following procedures and processes and having the best written policies on what you should be doing or what you're meant to be doing with individuals' data is absolutely the right thing to have in place, and you must have it in place. However, ensuring that the individuals are following it is the other half of the challenge. You know, going back a long time now, but when I was in the hot seat running data security for an organization that I worked in, you know, I was rolling out these policies and educating the team on what was good practice, what was inappropriate practice and what we need to be doing going forward. And yet, I was still challenged by individuals. You know, take the simple email example. Back then, using email to send files full of personal and sensitive data was still very commonplace. In fact, it's still done today in some places. But we just, you know, started the process of saying, right, well, you know, here's - here are alternative ways that you can get these sorts of files to your third party that you need to exchange this data with legitimately. Don't - please don't send it through email.

Stephen Cavey: And we found that one of the people we were working with complied with that request and didn't send email using the company's email server. And so they instead used their personal email account to send those sensitive files to the third party because that third party was insisting that that data continue to be sent via email. And in their view, they were just doing their job, and they had to do what their customer was asking for. And, you know, again, very simple example, but that's what we're working with here. Individuals want to stay productive. Unfortunately, security is very often seen as an inhibitor to doing business or slowing things down and, you know, people will try and get around that. And so I think this is where education plays a very big role, Dave - teaching the individuals across an organization the importance of protecting an individual's data.

Stephen Cavey: In other words, when you see the private details of a person, whether it's a customer or whether you're in HR looking at an employee's data, knowing that that is very sensitive and the organization has been trusted to look after that data and the individual providing that data makes a very big assumption that the organization has all the right controls and all the right processes to look after that data. And so making sure the awareness of protecting information, protecting personal data, sensitive data, confidential data across the organization is embedded into the culture of the organization. It's not something that's just simply taught in a classroom where you go to security training and everybody looks at the ceiling for an hour and then walks out going, great, glad that's over. Let's get back to doing our jobs again. You know, there's so many great things going on out there in organizations today to really ingrain a culture of security - particularly data security - to make sure that this sort of information is being better handled. But we're still a long way there yet, Dave. There's still plenty of good work that we can all do.

Dave Bittner: Can you give us an idea, what happens when a company decides to engage with an organization like yours and they're looking to go through the data discovery process? How does that work? Is it a matter of, you know, crawling through everything they have? I suspect it's a little more sophisticated than that.

Stephen Cavey: You're right, Dave. I think if we go back in time, traditionally, there were very large assumptions made about where data was in the organization. And again, it very much explains why we were seeing - and continue to see, even today - so many data breaches happening because there are so many organizations, even now, that continue to do a manual approach or what we call an assumptions-based approach. You make conclusions about where data is by going and asking the business and then taking that information and putting together some sort of a manual data map that shows where the sensitive assets - where the sensitive data assets are across the organization. In today's world, those who do embrace using data discovery are what we call an evidence-based approach - or they're taking on an evidence-based approach. So rather than making assumptions about where data is, they're saying, right, let's remove all assumptions and go back to the beginning and start to collect evidence about where data is and where it is not.

Stephen Cavey: And unfortunately, for some organizations, it does take suffering a data breach to completely change that culture and that thinking to then have them go from an assumptions-based approach to an evidence-based approach. And so the problem that we see with data discovery in the past was that the organizations that were willing to do it would still make an assumption about where they think the problems are and only focus their data discovery efforts on those parts of the business where they expect data to be found. And the other areas of the organization where no data was expected would not be looked at. And, you know, given that we work with organizations of all sizes, we've seen and heard absolutely everything where data in the thousands - I mean, to the millions to the hundreds of millions of records had been found in locations that were completely unexpected, where there were very big assumptions made that there will be no data found. And then it uncovers the craziest of scenarios that were just completely off the radar of the current generation team.

Stephen Cavey: So the recommendation when you're embarking on knowing where your data is and going on a process of data discovery inside every possible location you can, you do have to remove those assumptions. Start at one end of the organization and go to the other and basically leave no stone unturned, and bring everything into scope for your data discovery exercise until you can prove that it's out of scope. And even then, it doesn't hurt to then do some periodic sampling over a multi-year period of areas in the business where there was nothing found in the past and then going back and revalidating that those parts of the business continue to not store data of interest, whether it's, you know, personal data or sensitive data, whether it's highly confidential data or company IP trade secrets and so on. Critical data or data of interest comes in so many different forms now. And, you know, I think coming up with a methodology on what is valuable and what is not valuable to the business is equally important when you're going into a data discovery strategy.

Stephen Cavey: Because apart from looking for obvious things like certain types of personal data or data of customers, there are various other types of data that you would find very interesting if you knew what was there. And so looking for as many different types of data as you can in different stages and making sure you search across every area of storage, whether it's on premise, in the cloud, whether it's file servers, whether it's on the desktop, on those endpoints because again, that's a big assumption that we often find companies make is that because they use file servers, because they use cloud storage, they assume that there's no data of interest on those endpoints because it's all synched back up to the cloud when in actual fact, the way data synchronization works, you effectively still have local copies of data that individuals work with. So let's say you have someone from HR who's working with employee files day in and day out, contains highly sensitive information, and they're opening that up on their laptop, you're having artifacts of that data still ending up on those endpoints.

Stephen Cavey: And unless you're identifying that and then having a process of making sure that gets cleaned up systematically, then you're having these unknown risks exist out there, particularly in a world of working from home, where, you know, a larger number of employees have laptops now than ever before. And those laptops are traveling. They're in the back of taxis, they're in - on the metro system, they're in the back of an Uber, and they're getting lost or they're being misplaced. And there is data of interest on those devices. So just - I think the main point here is take all the assumptions off the table, use evidence to find out where all that data really is being stored and go from one end of the organization to the other and get a true picture of how much data the organization is storing, what types of data it's storing, and then you can start to make real decisions about the types of security controls and processes that need to be implemented to truly protect and ensure that data stays safe. And if you get that right, then complying with any subsequent privacy laws or regulations - doesn't matter how many of them you have to deal with - that process becomes significantly easier if you get the fundamental basics right, as opposed to taking a regulation-based approach where you only follow the regulation, you tick the boxes as they need to be ticked and you do nothing else. Unfortunately, that's been the common practice of the past.

Dave Bittner: I'm curious, do you ever cross paths with people who don't want to know, who think that perhaps, you know, if they're - if they have to stand in front of a group of regulators, that claiming ignorance is going to be their best strategy?

Stephen Cavey: It's funny, Dave, I often say there's three reasons why people change from the assumption-based approach to the evidence-based approach. I alluded to one of those reasons before. No. 1 is a data breach. That's the biggest reason, the biggest motivator that you'll find a company will move from using assumption to make their data security decisions to actually requiring evidence. The second is regulation. So they have a - you know, they have a deadline, a compliance deadline, and they're probably working with an external assessor of some sort. And that external assessor has guided them or encouraged them to use an evidence-based approach.

Stephen Cavey: And then the third, which I think answers your question, is what I call a change of guard. So we have a new CISO or a new head of privacy, a new head of IT security, a new person who's in charge and responsible for this problem of securing data. And they're asking their current generation team that they've just inherited, where are we storing data, how much data do we have and what do we do with it? And the answers that they're getting back from that team are not the answers that they feel very confident in. And so they decide, let's embark on doing an evidence-based approach because everything I'm hearing from the business I'm not convinced is really the case. And ultimately, it's myself that is on the line here, should there be a problem, and I will have to answer to that. And so they use - they effectively use this opportunity of coming into the business brand-new, not holding onto any legacy of the past about decisions they might have made and then going ahead and doing a complete end-to-end, evidence-gathering approach. And so - pardon me - that is very often the baseline of where we see people move away from not wanting to know because you're right, Dave. In the past, there were a lot of people that would rather not know, and ignorance was bliss. And unfortunately, it took a horrible event like a data breach to have to change that entire circumstance.

Dave Bittner: And I suppose, I mean, the peril is much greater, given the regulatory regimes that we have around the world now.

Stephen Cavey: You're exactly right. It's taken regulation to really push businesses to try and understand this problem far better than they ever have, to the point now where - you know, we've seen the rise of the chief information security officer. Now we're seeing the rise of the chief privacy officer, as well as the rise of the chief data officer. And we're finding, particularly in mid-to-large organizations, those three roles are having to work together very closely and are pivotal to ensuring this data security set of requirements are being met. And they have to work very closely together to make sure that the business can continue to trade and evolve and grow whilst also ensuring that the data is being kept secure and that the compliance and regulations are being met correctly because, you know, to the previous point that you asked about, Dave, in the event of a data breach and you have to face the regulator, ignorance will not get you through.

Stephen Cavey: In fact, it's the other way around. The more ignorant you are, the heavier the fine or penalty that they will levy upon you, versus if you show that you were actually putting controls in place, you were making an effort, you were attempting to understand the regulation and take the right steps towards complying with that regulation, and yet something unfortunate still happened, that at least puts you in a stronger position to argue before the regulator that you were doing some good things. Unfortunately, a bad event still happened. Perhaps you had a control that failed. And that is very much the case. It is much harder to defend than it is to attack. It's far easier to be the attacker than it is the defender. And the regulator understands that. But what they're looking for are organizations that are taking the right steps to look after people's data as opposed to, you know, taking an ignorance-based approach.

Dave Bittner: Ben, what do you think?

Ben Yelin: That was really interesting. I like his approach of you have to do an inventory of all the private data that you have, and that should guide your company's practices of complying with regulations and figuring out how many resources you need to devote to protecting data. It's about how big your risk is. I mean, I look at it as kind of, like, an insurance company approach. And I think that's better than just kind of blanket acceptance of broader regulations or instituting practices that aren't a fit for your organization. So...

Dave Bittner: Yeah.

Ben Yelin: ...I enjoyed the interview.

Dave Bittner: Yeah. All right. Well, our thanks to Stephen Cavey from Ground Labs for joining us. We do appreciate him taking the time.

Dave Bittner: That is our show. We want to thank all of you for listening. The "Caveat" podcast is proudly produced in Maryland at the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. Our senior producer is Jennifer Eiben. Our executive editor is Peter Kilpe. I'm Dave Bittner.

Ben Yelin: And I'm Ben Yelin.

Dave Bittner: Thanks for listening.

HOST(S):

Dave Bittner is a security podcast host and one of the founders at CyberWire. He's a creator, producer, videographer, actor, experimenter, and entrepreneur. He's had a long career in the worlds of television, journalism and media production, and is one of the pioneers of non-linear editing and digital storytelling.

Ben Yelin, JD, is the Program Director for Public Policy & External Affairs at the University of Maryland Center for Health and Homeland Security, where he consults public and private entities on homeland security, cybersecurity and emergency management policy. He is also an adjunct faculty member at the University of Maryland Francis King Carey School of Law, where he teaches courses on electronic surveillance and the Fourth Amendment.

Schedule: Wednesdays

Creator: CyberWire, Inc.