Research Saturday 4.25.20
Ep 132 | 4.25.20

Contact tracing as COVID-19 aid.

Transcript

Dave Bittner: [00:00:03] Hello everyone, and welcome to the CyberWire's Research Saturday, presented by Juniper Networks. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.

Dave Bittner: [00:00:25] And now a word from our sponsor, Juniper Networks. It goes without saying that we are in an ever-changing world and the change keeps going faster and faster. This ever-accelerating pace is not new, but we find ourselves in an environment where we must respond to this change at the same speed as it comes at us. But we, as we all know, have a hard time keeping up. For security professionals, the need to keep up is essential. Juniper Connected Security is responding to what is happening in the market, the convergence of infrastructure, and traditional security, and this puts Juniper in a unique position to solve customers' needs. Connect with Juniper during a virtual summit on May 14th, 2020. To learn more, visit summit.juniper.net. That's summit.juniper.net. And we thank Juniper Networks for sponsoring our show.

Dave Bittner: [00:01:19] Thanks to our sponsor, Enveil, whose revolutionary ZeroReveal solution protects data while it's being used or processed – the 'holy grail' of data encryption. Enveil delivers privacy-preserving capabilities to enable critical business functions. Organizations can securely derive insights, cross-match, and search third-party data assets without ever revealing the contents of the interaction or compromising the ownership of the underlying data. What was once only theoretical is now possible with Enveil. Learn more at enveil.com.

Mayank Varia: [00:01:57] We started with a group of three of us at Boston University – professors Ari Trachtenberg, Ran Canetti, and myself.

Dave Bittner: [00:02:05] That's Mayank Varia. He's a research associate professor of computer science at Boston University. The research we're discussing today is titled, "The PACT Protocol Specification."

Mayank Varia: [00:02:19] Soon after the social distancing protocols were put into place, after all of the stay-at-home orders were put into place, we started thinking about what might happen in a future world where those orders eventually start to become lifted – subject, of course, to public health guidance – and people start moving about in the world again, and what might be done, as a society, from a health perspective, to try to reduce the risk of the spread of the coronavirus in this future world. And based on our reading, and talking to people in the healthcare space, epidemiologists, et cetera, it sounds like, from a healthcare perspective, there's sort of two categories of approaches to dealing with the spread of disease.

Mayank Varia: [00:03:06] One approach is a general quarantine, which is the state of the world right now – everybody stays at home. And the other is a targeted quarantine, where if you identify people who are susceptible, who have been diagnosed or likely to have coronavirus, you can ask them to self-quarantine, rather than the entire population. And one of the challenges with the coronavirus specifically is that the disease is something that you could potentially have and be able to transmit before you even realize, in the sense before you're symptomatic. And so the question is, how can you find a method to learn as early as possible whether you might be at higher risk of having acquired the coronavirus?

Mayank Varia: [00:03:52] And one approach that the medical community already uses is something called "contact tracing," which is that if somebody is diagnosed with coronavirus, then they, together with medical health professionals, will try to identify all of the people with whom they have come into close contact. And that is a very important process – a great thing that the medical community is doing. But just like everything else in the healthcare space nowadays, they're becoming stressed. They're hitting up to capacity with a lot of the people who are reporting being diagnosed with this disease. So, the question that we had that started off our work was, if there is a world where we want some kind of automated system that can supplement the existing manual contact-tracing efforts, what could an automated computerized system look like, and how could we use our own specific speciality of security and privacy research to ensure that if there is an automated contact-tracing system, it is as privacy-preserving as possible, and also has as strong of integrity as possible against the threats of people uploading spurious information? So, how can we ensure privacy and authenticity of any information in any type of automated contact-tracing system?

Dave Bittner: [00:05:19] Hmm. So, PACT stands for "Private Automated Contact Tracing." Let's dig into the protocol together. Again, start at the high level – what are you all proposing here?

Mayank Varia: [00:05:34] So, at a high level, what we're proposing is a system that would allow people, through their cell phones – or any other kind of electronic device that they may carry with them, like a wearable, et cetera – to identify, in a privacy-preserving manner, when they come into close contact with somebody else and their smartphone or wearable device or whatnot. So, to use – rather than tracking, say, absolute location – so rather than using GPS or something that would measure where everyone is actually moving around town, which is very sensitive data that's very privacy invasive and is kind of overkill for this question – instead, we just want to know if two people come into close contact, they want to somehow exchange some piece of information, some random number that they can use to identify that singular encounter.

Mayank Varia: [00:06:26] So, when two phones or two devices come into close contact, close proximity within the CDC and World Health Organization guidelines, they exchange a random number, a long random number that just identifies that single contact and has nothing to do with anything else in the world. It does not have to do with their names or phone numbers or any other kind of identifier. And it doesn't have to do with any future or prior encounter with other people. And the reason for exchanging these private random numbers is so that if one of these two people from this close contact – if one of the two people is later diagnosed with the coronavirus, then they can report in a private way to the other person that this event has occurred. So they can share this private number, and that will be an indicator to the other person in that contact event that I have now come into contact with somebody who has later been diagnosed with the coronavirus. And so then you can take the appropriate healthcare precautions, such as self-quarantining to see if symptoms develop, et cetera.

Dave Bittner: [00:07:35] So, on the technical side, you're proposing using low-power Bluetooth?

Mayank Varia: [00:07:42] Yes, that's right. So, the intent is to use any kind of short-range radio system that only operates over small distances. A popular example that is commonly available in a lot of different consumer electronics like smartphones is Bluetooth. Another potential option would be something like NFC, but there are some challenges with NFC in the sense that it's not always available for applications to use on all consumer devices, and also its range may be in fact too short. There are other signals like Wi-Fi radios or even cellular connections, but they have very long distances over which they communicate. And so, the goal is to try to find a ubiquitous radio that's already in a lot of consumer devices that operates at approximately the scale of the recommended guidance for what constitutes close proximity. That said, Bluetooth radios operate at a longer distance than the recommended guidance for recording close contacts, and so there's a lot of ongoing research by both our team and many others around the world to try to figure out how you can estimate whether the Bluetooth signal is strong enough to indicate close proximity – say, within two meters, as per much of the usual healthcare guidance – and for a sufficiently long period of time.
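The proximity question raised here can be illustrated with a small sketch. It uses the textbook log-distance path-loss model to turn a received signal strength (RSSI) into a rough distance, then checks whether a device stayed in range long enough. This is not the PACT team's method, which the interview describes as ongoing research. The calibration value of -59 dBm at one meter, the path-loss exponent of 2.0, the five-minute duration, and all function names are assumptions chosen for illustration.

```python
def estimate_distance_m(rssi_dbm: float,
                        tx_power_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Rough distance from RSSI using the log-distance path-loss model.

    tx_power_dbm is the assumed RSSI measured at 1 meter (a per-device
    calibration value); path_loss_exponent is about 2 in free space.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))


def is_close_contact(samples,
                     max_distance_m: float = 2.0,
                     min_duration_s: float = 300.0) -> bool:
    """samples: list of (timestamp_seconds, rssi_dbm), sorted by time.

    Returns True if some unbroken run of in-range samples spans at least
    min_duration_s. Both thresholds are illustrative assumptions.
    """
    run_start = None
    for timestamp, rssi in samples:
        if estimate_distance_m(rssi) <= max_distance_m:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= min_duration_s:
                return True
        else:
            run_start = None  # the other device moved away; start over
    return False
```

Real signal strength varies with phone model, body position, and obstacles, so any deployed estimate would need calibration well beyond this model.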

Dave Bittner: [00:09:07] Now, there are three main components here that we're dealing with. You've got your chirping layer, your tracing layer, and your interaction with medical professionals. Can we go through each of those one at a time and explain what's going on?

Mayank Varia: [00:09:22] Sure. So, at the first layer, when two devices come into close proximity, they send – each device will send each other one random number. Actually, your device, I should say, is sending out a random number all of the time. We call this random number a "chirp." It's just a temporary one-time-use number that you're sending out, your device is transmitting all the time. Because you don't know necessarily whether anybody else is within close proximity to hear it, so your device will send this out all the time. But the number will change. Every single time you send, it will be a different number, so as to prevent anybody from doing any kind of long-term tracking of your location. So it's not a persistent identifier for you across time and space. It's just a one time number, this chirp. And if somebody happens to be in close proximity to you, then they will record that information. And similarly, their device would also be sending you a different chirp or random number that your device would be recording. So that's the first stage, when two people come into close contact.
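The chirping layer described above can be sketched in a few lines. Each broadcast is a fresh random value with no link to the previous one, and whatever a device hears is kept only on that device. The 16-byte length and the `Device` class are assumptions for illustration; the interview says only that the number is long and random.

```python
import secrets

CHIRP_BYTES = 16  # assumed length; the interview only says "a long random number"


class Device:
    """Hypothetical model of one phone or wearable."""

    def __init__(self):
        # Chirps heard from nearby devices, stored locally only.
        self.heard = set()

    def broadcast(self) -> bytes:
        # A new unpredictable value every time, so successive chirps
        # cannot be linked to each other or to the sender.
        return secrets.token_bytes(CHIRP_BYTES)

    def receive(self, chirp: bytes) -> None:
        self.heard.add(chirp)
```

Because `secrets.token_bytes` draws from the operating system's cryptographic random source, an observer who records one chirp learns nothing about the next.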

Dave Bittner: [00:10:28] And these numbers are being stored locally?

Mayank Varia: [00:10:32] These numbers are purely being stored locally. They're not transmitted anywhere. And in fact, we're working to make sure that that information is stored, protected, using encryption to be protected at rest, so that even your own device does not even have those numbers until the second stage occurs. So even you do not remember your own metadata until the second stage occurs, which is if one of the two people involved in that particular contact is later diagnosed with COVID-19, then they, together with their certified healthcare professionals, there's a process by which they can upload information to a publicly accessible database. It's a public database, so they're uploading information that is not personally sensitive to them. It's not like their name or anything like that. It's uploading information that would allow the other person from that interaction to realize that they have come into contact with someone who is diagnosed with COVID-19.

Dave Bittner: [00:11:35] Hmm. So, let's say I'm someone who's just been going about my business and have not been diagnosed with having COVID-19. My device is gathering – it's collecting chirps, it's generating chirps – and then so someone that I've crossed paths with has been verified as being infected. If all of my information is being stored on my phone, how is that information – that's being stored locally – how is that going to interact with the larger database for me to be notified?

Mayank Varia: [00:12:09] So, in terms of you being notified, let me describe sort of a few iterations of how that system might work. So I'll start with a simple example that is more or less the flavor of what we're going for, and then I'll add some extra features to get extra privacy and integrity protections. So, the simplest version of – the simplest way to understand how the system works is, all of these devices are just sending out these chirps, these completely random temporary numbers. And if you happen to come into contact with somebody who later is diagnosed with COVID-19 they, together with their medical professionals, could upload to a database the random number, the chirp that you had. So, they sent you a chirp in the past, which your device recorded – just locally, purely on your own device – and then when they later get diagnosed with COVID-19, they upload that chirp to the database, and then you can just download a copy of that entire database of all people's chirps of the diagnosed patients, and you can compare it against your own local database. So, this would allow you just to do a match by saying, OK, I just want to do an equality check of, does anybody's chirps from the uploaded database match with my local device?
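The simple version of the check described above amounts to a set intersection computed entirely on the user's own device. The sketch below assumes chirps are byte strings and that the published database has already been downloaded; the function name is hypothetical.

```python
def exposure_matches(heard_locally: set, published: set) -> set:
    """Return the chirps this device heard that also appear in the
    downloaded database of chirps from diagnosed patients.

    The comparison runs locally, so the device never tells the
    server which chirps it has heard.
    """
    return heard_locally & published


# Example: two chirps heard, one of them later published.
heard = {b"chirp-from-alice", b"chirp-from-bob"}
database = {b"chirp-from-alice", b"chirp-from-someone-else"}
matches = exposure_matches(heard, database)
```

A non-empty result means at least one recorded contact was later diagnosed. As the next part of the interview notes, this simple version has size and spoofing problems that the actual design addresses.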

Mayank Varia: [00:13:27] So that's, like, a way to do this check. But there are some potential concerns here. For instance, the size of this dataset is going to be very large, so just from the standpoint of downloading this, you know, like, everybody is chirping these random numbers all the time, and so if they have to send out a lot of them, that could potentially be an issue. Also, a potential actor, a bad actor might upload somebody else's chirps. So, somebody who you never came into close proximity with might actually try to upload the chirps of someone with whom you did come into close proximity, which would generate panic and false alarms.

Mayank Varia: [00:14:05] So, the actual system that we have in order to resolve these issues effectively uses a one-way function – like a cryptographic hash that can be computed only in the forward direction, but not the backward direction. And in order to – these chirps are actually generated as a cryptographic hash of some random number. So the person that you interacted with actually has a number that they choose for that interaction that is not the one that they send to you. They actually only send you a cryptographic hash of this number in their own phone. And then, if they later are diagnosed with COVID-19, they upload the pre-image to that, to the chirp. So they upload some – it's basically a proof that they were the person who generated the chirp in the first place, that only the phone and the device that actually generated the chirp has this information. So that sort of addresses some of these kinds of trolling attacks or scaring attacks where people might try to scare other people that they haven't actually come into contact with.
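The one-way-function idea can be sketched as follows, under the description given in the interview: the sender keeps a secret, broadcasts only its hash, and publishes the secret (the pre-image) if later diagnosed. SHA-256, the 32-byte secret, and the function names are assumptions; the PACT specification itself may derive chirps differently.

```python
import hashlib
import secrets


def new_secret() -> bytes:
    """Secret value kept on the sender's device (assumed 32 bytes)."""
    return secrets.token_bytes(32)


def chirp_from_secret(secret: bytes) -> bytes:
    """The value actually broadcast: a one-way hash of the secret."""
    return hashlib.sha256(secret).digest()


def matches(heard: set, published_secrets: list) -> set:
    """Hash each published pre-image and keep those that equal a
    chirp this device recorded."""
    hashed = (chirp_from_secret(s) for s in published_secrets)
    return {c for c in hashed if c in heard}
```

Someone who merely overheard a chirp cannot produce its pre-image, so uploading the overheard chirp itself yields no match on anyone's device. That is the property that blunts the trolling or scaring attacks mentioned above.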

Mayank Varia: [00:15:11] And finally, we wanted to make sure that our algorithm, our protocol, the system that we created, this PACT system, provides people with full autonomy and choice for, if they are diagnosed with COVID-19, that they have full control and choice over what they choose to upload. So, for instance, maybe they have a particularly sensitive event that they're going to – something that they don't want any information ever recorded or any information ever chirped – so, the specification calls for a snooze operation to be built into the application so that somebody who is walking around and decides that they want to momentarily pause the system can do so. Additionally, even later, even after the fact, if they have sent out some chirps and then there's a particular subset that they choose they don't want to upload, they can choose not to upload some fraction of these chirps, these random numbers, and that procedure by which they choose what to upload and what not to upload also has privacy protections, in the sense of nobody else will know your decision as to, you know, how to protect your own healthcare information. So, how to protect the information about what you choose to disclose to others.
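The snooze operation and after-the-fact selective upload might look like the sketch below. It reuses the hash-of-a-secret idea from earlier in the interview. The `ChirpLog` class, the explicit timestamps, and the time-window format for withheld chirps are all illustrative assumptions, and the sketch does not attempt the privacy protection of the selection itself that the speaker mentions.

```python
import hashlib
import secrets


class ChirpLog:
    """Hypothetical record of the secrets behind chirps a device has sent."""

    def __init__(self):
        self.snoozed = False
        self.sent = []  # list of (timestamp_seconds, secret)

    def emit(self, now: float):
        """Return a chirp to broadcast, or None while snoozed.

        While snoozed, nothing is generated and nothing is recorded.
        """
        if self.snoozed:
            return None
        secret = secrets.token_bytes(32)
        self.sent.append((now, secret))
        return hashlib.sha256(secret).digest()

    def secrets_to_upload(self, withhold=()):
        """Secrets the user agrees to publish after a diagnosis.

        withhold is a sequence of (start, end) time windows whose
        chirps the user chooses not to disclose.
        """
        return [secret for t, secret in self.sent
                if not any(start <= t < end for start, end in withhold)]
```

Withheld secrets never leave the device, so contacts from those windows receive no notification.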

Dave Bittner: [00:16:27] To be clear, there is no location data being saved along with any of this?

Mayank Varia: [00:16:33] There is no location data being saved within the application itself. So, it means that people with whom you have never come into contact will have no idea what these random numbers correspond to. Now, the people with whom you have come into contact could potentially remember that I received a chirp, say, you and I came into close contact, so I could remember I came into contact with you at this stage at this time. Now, the application will not store that. The application is not going to remember any of that. It purposely does not have an interest in keeping this kind of metadata around, but potentially a bad actor who's trying to use this system in order to track people's movements – they might try to do that. So, a bad actor might try to remember the time and location and person associated with the particular chirp.

Mayank Varia: [00:17:24] Which is why we have several mechanisms to try to limit the ability for any of the information in the system – even by potentially malicious actors – to be used to re-identify any of your tracking movements. So if a bunch of people decide to try to get together and try to reconstruct your movement history, they will not be able to do so, because – well, so, first, for people who do not have COVID-19, who never upload anything to the database, they will not be able to track your movements, because you're just sending out totally random numbers at every single time – all of your chirps have nothing to do with any other chirp.

Mayank Varia: [00:18:01] If you are a person who is diagnosed with COVID-19 and upload something to the dataset, the situation is a little bit more complicated, because the mere fact that you're informing other people that you have contracted the coronavirus could potentially – just that one fact, independent of anything else about how the system works – tell them some information about who it is, right? So they know it's someone – that somebody with whom they've come into close contact has acquired the coronavirus. So, for instance, if you and I have been, you know, self-quarantining for fourteen days or more, and then, say, this particular interview was a face-to-face interview, like, this is our only time that we've ever come into close contact with anybody in the last several weeks. And then later on, you get an alert from any kind of system – automated, manual, whatever – any kind of system that says that you have come into close proximity with someone who has the coronavirus, then you would know that it was me. Not because of anything about the details of how the system works or doesn't work, but because the actual system itself – the actual concept of providing this idea of contact tracing itself – could potentially reveal information, especially if you have not come into contact with many people. You can infer who it might have been.

Mayank Varia: [00:19:23] And that's an important thing, a message I want to get across, and it influences the decisions on how and when to deploy any kind of contact-tracing system, and the scope for which we may want any of this kind of technology to be used. So, just like any other kind of technology, it's important to design the system with an eye for protecting against mission creep, to make sure that it's actually targeted towards solving an important current epidemic. And then afterward, the system should gracefully go away. Which, in some sense, the system we've built has this nice, graceful degradation property – the system that, I should say, we've built, and many other academics have proposed very similar systems to ours – they have a very nice, graceful degradation property in the sense that no information is provided about anything related to any kind of contacts or interactions unless or until somebody who has later contracted the coronavirus uploads to this dataset the information associated with the chirps that they've sent. Which means in a future world – hopefully soon – in which the coronavirus epidemic is behind us, in which the disease has gone away, then in that world, there would be no information ever uploaded to the database ever again, and then the system would naturally no longer be revealing anything about anybody's movements.

Mayank Varia: [00:20:52] So, there is information that is revealed to the contacts of people who've contracted coronavirus – namely, the mere fact that you have come into contact with someone who has the coronavirus. And that information could potentially be something that will somewhat reduce the healthcare privacy of people who have the coronavirus, but only to their immediate contacts, only to the people that they've come into close proximity with, and not to the general public, not to any kind of remote system that wants to try to, like, learn this in aggregate in large scale for the entire population – just to the contacts with whom you've come in to close proximity.

Dave Bittner: [00:21:36] Help me understand, because it seems to me like in terms of fighting a disease like this, for the folks who are trying to track its progress, how it makes its way through populations, location data is going to be very helpful for them. So, is a system like this relying on sort of a secondary location-reporting system, where presumably if I were to find out that I'd been in contact with someone, my next step would be to speak with my doctor and then perhaps my doctor would be the one to report to the powers that be that, hey, we have a case here and here's where it is.

Mayank Varia: [00:22:15] Yes, that's right. So two sort of comments here. First of all, within our system, the actual event of uploading these chirps to this public registry is something that you have to do in concert with a certified health authority, and in the PACT specification, we go into some level of detail as to how to ensure that only information that is certified by a public health authority ever gets uploaded to this dataset. Which is mostly to try to restrict the ability of the system to be used for these kinds of trolling or scaring attacks where people upload spurious information.
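One way to picture the requirement that uploads be certified by a health authority is a registry that rejects anything lacking a valid authorization tag. The sketch below uses an HMAC from the Python standard library as a stand-in. A real deployment would more plausibly use digital signatures so that the registry never holds the authority's key, and the interview does not say which mechanism the PACT specification uses. The `HealthAuthority` and `Registry` classes are hypothetical.

```python
import hashlib
import hmac
import secrets


class HealthAuthority:
    """Hypothetical certifying authority (HMAC stand-in for signatures)."""

    def __init__(self):
        self._key = secrets.token_bytes(32)

    def authorize(self, upload: bytes) -> bytes:
        # Tag binding the authority's approval to this exact upload.
        return hmac.new(self._key, upload, hashlib.sha256).digest()

    def verify(self, upload: bytes, tag: bytes) -> bool:
        # Constant-time comparison avoids leaking tag bytes via timing.
        return hmac.compare_digest(self.authorize(upload), tag)


class Registry:
    """Hypothetical public database that accepts only certified uploads."""

    def __init__(self, authority: HealthAuthority):
        self.authority = authority
        self.entries = []

    def submit(self, upload: bytes, tag: bytes) -> bool:
        if not self.authority.verify(upload, tag):
            return False  # spurious or uncertified upload is dropped
        self.entries.append(upload)
        return True
```

An upload with a missing, forged, or mismatched tag never reaches the public database, which is the restriction on spurious information described above.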

Mayank Varia: [00:22:56] But I think your question also gets to an even more important point, which is that this system is meant not to be some kind of substitute or replacement or technology replacing anything about the existing healthcare system. It's only meant to solve a small part of the response to the coronavirus epidemic, which is to help certified health professionals to do the impressive work that they're already doing in terms of helping individual patients to, you know, to treat them, and to help get a better understanding of the spread of the disease. So this is not meant to replace any aspect of what's currently happening. It's meant to provide more information to you as a person who's maybe concerned about whether you've come into contact with someone with the coronavirus, and to your healthcare professionals, to your personal physician, to make more informed decisions about your own healthcare. So, it needs to work in conjunction with the way that everything else, the rest of the – both the personal healthcare responses, like your personal physician, and public health response at a nation level or state level, et cetera.

Dave Bittner: [00:24:07] Now, we've seen quite a bit of attention in the media lately that Apple and Google announced a collaborative effort of doing something that to me sounds similar to what your efforts are here. Is that indeed the case? And how do you see these varying systems that have been proposed sort of meeting together to have – you know, I guess the ideal situation would be to have one standard system that can interoperate?

Mayank Varia: [00:24:35] Yes, I agree with you that that is the ideal goal here. So, yeah, to answer your question about the recent news about Apple and Google's work, their joint endeavor in the space, it's very encouraging news and their specification – at least at the level of the information that is currently publicly available – their specification is largely very similar to our approach for the PACT team, and to the approaches of many other researchers throughout the world who have proposed very similar systems. There are some very small, slight differences between our system and some of the other researchers' systems and the Apple-Google thing. They are largely trying to solve the same problem of automated contact tracing, and have largely the same privacy and integrity guarantees. 

Mayank Varia: [00:25:25] Which is very encouraging. So, when we first started this project back three or four weeks ago, our intent at the time was, you know, let's try to build, like, let's try to design the strongest possible privacy protections that we can into a system that's also incredibly simple to understand – so that it's easy for the world to understand the kind of privacy protections it provides – and very easy to build, because we want to build on such short timeframe. So, you know, that was our initial focus, and so we were focused on sort of ease of development and deployment and understanding. And now with this news that Apple and Google are working to build the same thing, I agree with you that the goal should be not to – you know, the last thing we want to do is to confuse or fracture the base of people who, you know, might just download different applications that are not compatible with each other. That would be a bad idea. It would not achieve the public health outcomes that we want if we walk past each other and I'm chirping using protocol A, but you're chirping using protocol B, and we don't understand each other or we're looking at different data in different databases. That would be a problem.

Mayank Varia: [00:26:36] So, at this point, we're looking to see what is it that we can do to assist the ongoing efforts by others, you know, technology companies and other researchers and governments in other countries who are all looking at sort of building and deploying these applications. So we're looking to see how we can provide even stronger systems that provide even greater security and privacy and integrity goals. So maybe not for the initial version, because the initial versions, we wanted the protocol to be so simple that it was easy to build in short order to address the current epidemic. But sort of now, we're thinking sort of looking ahead, what is it that we can do to provide even stronger assurances?

Mayank Varia: [00:27:13] Furthermore, we're looking at, can we build a prototype application of the same kind of thing as the Apples and Googles and other countries' governments are building, so that we can understand what are the potential pitfalls that occur when the rubber meets the road, when you actually implement this thing? And how does the specification hold up when it's running on a complicated device like a smartphone, which has other sensors, which has other programs running at the same time? So we're trying to make sure we can try to understand that as soon as possible so as to provide guidance to the teams that are building this out for production to provide as soon as possible information to them about the kinds of concerns that they should think about when implementing. Because if, you know, one of the lessons we've learned over and over again in the security and privacy community is you can have an idea that looks all well and good on paper, but until and unless you implement it, you don't know necessarily where there's some subtle issue that could go wrong – some kind of side-channel attack, some kind of potential implementation flaw to be wary of, et cetera. We think that our best role going forward is to try to see proactively what are the kinds of issues that other folks might run into as they build and deploy and maintain these kinds of systems, so that we can provide actionable guidance to them.

Dave Bittner: [00:28:34] Do you have any sense for what kind of timeline you're on in terms of testing this and making it available?

Mayank Varia: [00:28:42] So, I think that there are already research prototypes of this software available by – both our team is actively working on this, on building a prototype right now, and I think there are other research teams around the world that already have some open-source software deployments. It's sort of our view that having one common spec for the world is a useful goal to move toward for the reasons that you were asking previously – for interoperability reasons. But there can be value in software diversity – of different implementations of this spec, not necessarily all deployed in practice, but all ready to be deployed in practice. In case there is any issue identified with one, it's good for all of us to be trying to build independent implementations so that we can understand better where this could go wrong, because of the tight timeline constraints to get this idea out there, you know, it's sort of better to maybe – given the state of the coronavirus epidemic – it can be better for us as a society to be sort of, you know, quote unquote, "wasting time" in parallel – by having many people do work in parallel in order to save time in sequence, to get the ideas out there as soon as possible. That's our perspective.

Mayank Varia: [00:29:58] I think that based on my current understanding of how other technology companies and academic research teams are moving forward, there are active field trials – like, small-scale experiments using this technology – in use today and starting to be spun up around the world. And my understanding as to the technology companies like Apple and Google is that they may have something built into either their operating systems or an application that they have built for download on their app stores, maybe within a timeframe of, say, a month or so to build and test that.

Dave Bittner: [00:30:39] That's Mayank Varia from Boston University. The research is titled, "The PACT Protocol Specification." We'll have a link in the show notes.

Dave Bittner: [00:30:48] Thanks to Juniper Networks for sponsoring our show. You can learn more at juniper.net/security, or connect with them on Twitter or Facebook.

Dave Bittner: [00:30:56] And thanks to Enveil for their sponsorship. You can find out how they're closing the last gap in data security at enveil.com. 

Dave Bittner: [00:31:04] The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. Our amazing CyberWire team working from home is Elliott Peltzman, Puru Prakash, Stefan Vaziri, Kelsea Bond, Tim Nodar, Joe Carrigan, Carol Theriault, Ben Yelin, Nick Veliky, Gina Johnson, Bennett Moe, Chris Russell, John Petrik, Jennifer Eiben, Rick Howard, Peter Kilpe, and I'm Dave Bittner. Thanks for listening.