Research Saturday 4.13.19
Ep 81 | 4.13.19

Establishing software root of trust unconditionally.

Transcript

Dave Bittner: [00:00:03] Hello everyone, and welcome to the CyberWire's Research Saturday, presented by Juniper Networks. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.

Dave Bittner: [00:00:26] And now a word about our sponsor, Juniper Networks. Organizations are constantly evolving and increasingly turning to multicloud to transform IT. Juniper's connected security gives organizations the ability to safeguard users, applications, and infrastructure by extending security to all points of connection across the network. Helping defend you against advanced threats, Juniper's connected security is also open, so you can build on the security solutions and infrastructure you already have. Secure your entire business, from your endpoints to your edge, and every cloud in between with Juniper's connected security. Connect with Juniper on Twitter or Facebook. And we thank Juniper for making it possible to bring you Research Saturday.

Dave Bittner: [00:01:13] And thanks also to our sponsor, Enveil, whose revolutionary ZeroReveal solution closes the last gap in data security: protecting data in use. It's the industry's first and only scalable commercial solution enabling data to remain encrypted throughout the entire processing lifecycle. Imagine being able to analyze, search, and perform calculations on sensitive data - all without ever decrypting anything. All without the risks of theft or inadvertent exposure. What was once only theoretical is now possible with Enveil. Learn more at enveil.com.

Virgil Gligor: [00:01:53] So, for the past fifteen years or so - and most importantly since about 2008 onwards - we noticed that malware - that's malicious software - got placed into the firmware of device controllers.

Dave Bittner: [00:02:11] That's Virgil Gligor. He's a professor at Carnegie Mellon University and a member of their CyLab Security and Privacy Institute. The research we're discussing today is titled, "Establishing Software Root of Trust Unconditionally."

Virgil Gligor: [00:02:23] That includes network interface cards inside your laptop or your desktop. It includes DMA devices - that's direct memory access devices. It includes disk controllers. It includes systems management boards. And this malware - which started being discovered around 2008 - got to be fairly difficult to spot. In fact, that kind of malware could not be detected by any antivirus or anti-malware program that runs on your machine. And the reason for that is very simple. The antivirus or anti-malware program that runs on your machine has to communicate with these peripheral device controllers themselves.

Virgil Gligor: [00:03:15] So, in effect, anti-malware programs have to communicate with malware. And malware - as was pointed out around 2015 by the head of the global research and analysis team at Kaspersky, Costin Raiu - malware can always reply positively, namely, "the firmware update was done completely, no problem," and the like. So it's extremely difficult to detect the presence of malware on these devices. And in fact, in 2015, this fellow at Kaspersky suggested that we needed a reliable test to detect such malware in the firmware of the peripheral devices. And such a test did not exist.

Dave Bittner: [00:04:01] Hmm.

Virgil Gligor: [00:04:01] So what Kaspersky pointed out had been quite clear to us at Carnegie Mellon for quite some time. In fact, there was research done here by some of my former colleagues, around 2010, 2011, which identified this problem. So, essentially, we were keenly aware of the problem. In fact, the US government was also keenly aware of it. And the problem became worse and worse over time, as opposed to better and better.

Dave Bittner: [00:04:31] Hmm.

Virgil Gligor: [00:04:31] And the reason for that is very simple. The placement of malware on the firmware of these peripheral devices became more pervasive. That is, it became a problem of the supply chain, among other attack vectors. So, malware can come to you, an end user, on your device shrink-wrapped. Now, in the supply chain, there are multiple points where this malware could be placed, and people identified over time these vectors for placing malware on these devices in the supply chain. The supply chain is just one example to show that, in fact, it's very easy for experts - not for mere mortals like us - but it's very easy for experts to actually introduce this malware once they have control of a supply chain.

Dave Bittner: [00:05:24] So, what we're talking about here is establishment of this thing called "root of trust." Take us through - what does that mean?

Virgil Gligor: [00:05:32] So, that means that the person who wants to carry out the malware-detection test, and the malware-replacement test, has to attach an external device. And this external device, which we call a "verifier," initializes the firmware of the peripheral device controllers and the memory of your computer - the volatile memory, the primary memory, not the disk. So, essentially, if the verifier can verifiably initialize the firmware of these devices, then, clearly, malware disappears. The problem is that it's very difficult to figure out that, in fact, the initialization of the firmware was done correctly.

Dave Bittner: [00:06:17] Hmm.

Virgil Gligor: [00:06:18] So, root of trust essentially has two phases. One, initialize all the flashable firmware of your peripheral device controllers and initialize your memory, and then test that the initialization was done correctly. If it's done correctly, then malware clearly, by definition, disappears, because your initialization does not contain malware. And unfortunately, if this test cannot be done correctly with very high confidence, then you don't know whether the malware disappeared.

Virgil Gligor: [00:06:52] So, essentially, the test that we produced is the test that unconditionally tells the verifier that in fact everything was initialized correctly, and there is no malware on the computer - in the system state. This is, by the way, before your computer boots.

Dave Bittner: [00:07:10] Ah, I see. And that's really the trick here, is this notion that it's unconditional?

Virgil Gligor: [00:07:14] Correct. And it happens before you boot the operating system and before you install the operating system. Now, "unconditionality" here means that the test requires no secrets; it requires no special trusted hardware modules, like Trusted Platform Modules or Software Guard Extensions from Intel and others, or hardware security modules - so, no trusted hardware modules - and no bounds on the adversary's computing power. So, this notion of unconditionality here is extremely strong. It hasn't been encountered in security before this.

Dave Bittner: [00:07:54] All right, well, let's dig in here. Describe to us, I guess as much as you can put it in layman's terms, how are you achieving this?

Virgil Gligor: [00:08:02] Essentially, what I'm doing here is initializing the device controllers and the primary memory with a particular computation. And this computation, unlike many others that were proposed in the past, is optimal in space and time, meaning it cannot take fewer than a particular number of words in a particular memory, or less than a particular number of time units, and it will take no more than that number of words or time units. So, "optimality" means that your lower bounds equal your upper bounds.

Virgil Gligor: [00:08:42] So, if you can find such a test, where optimality is concrete - meaning it's specified in terms of real quantities: number of words, units of time, processor cycles - then you can count on the fact that once the computation has executed, there is nothing else that could execute faster in that memory, in the processor registers, or in the processor memory. So, essentially, it's this notion of concrete optimality that enabled us to do this test. This notion of concrete optimality did not exist in theory, in computational complexity. In fact, all the notions of optimality people had were asymptotic, which could not be used.
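
[Editor's note: To make the idea concrete, the paper's construction builds on Horner-rule evaluation of randomized polynomials. The Python sketch below is only a loose illustration of the two checks this implies for the verifier - correct value and arrival within an exact time bound - and every name in it is hypothetical, not the paper's notation.]

    # Horner-rule polynomial evaluation: schematically the kind of
    # computation the construction builds on (a loose illustration, not
    # the paper's exact randomized-polynomial family).
    def horner_eval(coeffs, x, mod):
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % mod  # one multiply-add per coefficient
        return acc

    # Concrete optimality means the verifier knows the running time t_opt
    # exactly: even a correct result that arrives later than t_opt (plus a
    # small measurement slack) is rejected, because only extra,
    # unaccounted-for code could cause the delay.
    def accept(result, elapsed_ns, expected, t_opt_ns, slack_ns):
        return result == expected and elapsed_ns <= t_opt_ns + slack_ns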

Dave Bittner: [00:09:31] Walk us through what happens when you boot up a system or prepare to boot up a system that would be using your method.

Virgil Gligor: [00:09:39] OK, so when you boot up a system, obviously there is a certain amount of code which runs in the system which you cannot trust. So you really don't know that your bootloader is trusted, because it may not be. So essentially, you boot your computer, you have your bootloader. Then the bootloader responds to verifier commands. It has to respond, because otherwise the verifier detects right away that there is a problem. So it responds to the verifier commands. The verifier asks the bootloader to initialize the memory of your system - the primary memory of your system - and to initialize the memories, or to re-flash the memories of the device controllers.

Virgil Gligor: [00:10:24] So, in fact, you notice that the bootloader no longer talks to the disk itself. It only talks to the disk controller, for example. Now, this initialization is performed with the computations that I just described, which are space-time optimal. And once the bootloader completes the initialization, it responds to the verifier and says, look, I'm done. What do I need to do next? Then the verifier challenges these computations, which were already initialized, to run in the particular number of words which were initialized, plus in the amount of time which is the limit - the lower bound - for the computation.

Virgil Gligor: [00:11:10] And if the results of the computation come back correct and in the specified times, the verifier can conclude that there is no malware in the system - that, in fact, the system state, the memories, contain only the values which were initialized. At that point, once the verifier concludes that, the verifier can actually start a boot process under which trustworthy software is loaded on the machine and the boot of the operating system off the disk can complete. So, roughly, these are the steps that one goes through in this test. And please remember that this test is done before the system runs. In other words, it cannot be done in the middle of a computation on your system, for example. Consequently, it's not done all that frequently.
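
[Editor's note: A minimal sketch of the verifier's side of that exchange, assuming a hypothetical link object for the external connection and taking the expected result and optimal time as given (a later sketch shows how the verifier might obtain them). None of these names come from the paper.]

    import time

    def establish_root_of_trust(link, expected_result, nonce,
                                t_opt_ns, slack_ns):
        # Phase 1: the untrusted bootloader initializes primary memory and
        # re-flashes every device controller with the chosen computation.
        link.send(("init", nonce))
        if link.recv() != "init-done":
            return False  # the bootloader must respond, or it is suspect

        # Phase 2: challenge the initialized computation and time the reply.
        link.send(("challenge", nonce))
        start = time.monotonic_ns()
        result = link.recv()
        elapsed_ns = time.monotonic_ns() - start

        # Accept only a correct result within the optimal time bound; only
        # then is it safe to load trustworthy software and boot off the disk.
        return result == expected_result and elapsed_ns <= t_opt_ns + slack_ns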

Virgil Gligor: [00:12:03] Now, it is entirely possible that between two such tests, malware is placed on your machine surreptitiously. Well, the second time you do that test, you are able to unconditionally get rid of that malware.

Dave Bittner: [00:12:18] So, when the system prepares to reboot, let's say, that's when the test will happen, and the malware that has been installed in the meantime will be detected.

Virgil Gligor: [00:12:27] Correct. That's exactly what happens. Now, the reason why this is called "root-of-trust establishment" is because, essentially, the root of trust in the system really comprises the contents - the chosen contents - of the system state. And the system state is basically the content of the memories that we have been talking about, and the processor registers.

Dave Bittner: [00:12:53] So, help me understand - how do you establish your baseline? How do you establish - when you're doing your initial testing - that the system is clean to begin with?

Virgil Gligor: [00:13:02] Well, so the first question is how do we establish that the results that came back from the test were correct? How can the verifier tell that the results are correct?

Dave Bittner: [00:13:12] Hmm.

Virgil Gligor: [00:13:12] So, that's the first question. And that turns out not to be a major problem, in the following sense: the verifier has a specification of the machine. So, whoever built the verifier and constructed the test has a specification of the system under test. And therefore, the verifier can obtain the correct results, which are produced separately, either on a simulator of the machine or on a copy of the machine that does not contain malware. Either way would work. So essentially, the verifier has the right results in hand - both in terms of the computation result and the timing. So that's essentially what's necessary for the test to succeed.
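
[Editor's note: In code terms, the verifier might compute its "right results in hand" ahead of time from a malware-free reference - a simulator built from the machine's specification, or a known-clean copy of the machine. The Simulator class below is purely hypothetical, a placeholder for whichever reference is used.]

    class Simulator:
        """Hypothetical malware-free model of the device, built from its
        complete and correct specification."""
        def __init__(self, spec):
            self.spec = spec
        def run_challenge(self, nonce):
            raise NotImplementedError  # evaluate the computation on the nonce
        def cycles_elapsed(self):
            raise NotImplementedError  # optimal running time, in cycles

    def golden_values(spec, nonce):
        sim = Simulator(spec)
        expected = sim.run_challenge(nonce)  # correct computation result
        t_opt = sim.cycles_elapsed()         # correct (optimal) timing
        return expected, t_opt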

Dave Bittner: [00:13:59] So, what happens - you know, I can imagine during the normal lifecycle of a system, that changes are made. Hardware can be added or taken away or updated, firmware could be updated, and so on...

Virgil Gligor: [00:14:11] Yes.

Dave Bittner: [00:14:11] ...How do you then re-establish that those changes made along the way were for good and not evil?

Virgil Gligor: [00:14:18] Yes. So, what this test addresses is this extremely difficult problem of detecting malware - of detecting unknown, unaccounted-for content in the firmware. We are not so much interested in the higher levels - what happens with an operating system which is buggy. We are interested in this very difficult area.

Virgil Gligor: [00:14:41] As you point out, malware in these peripheral device controllers and their firmware can be inserted at all points during the system lifetime, because of updates. And these updates could come from companies like, for example, ASUS in Taiwan. As you may have noticed two days ago, they update their systems, and those updates may very well contain updates to the firmware.

Dave Bittner: [00:15:09] Hmm.

Virgil Gligor: [00:15:09] Which is clearly the case with supply-chain updates. So, essentially, the scenario posited is absolutely credible and practical. Essentially what happens is, after such updates, you have to bring your system down and perform this external test. Now, this is obviously not trivial at this point, but it's a necessary step to detect that your firmware updates were done completely and correctly - that, in fact, no malware, no unaccounted-for content, was placed in your firmware.

Virgil Gligor: [00:15:47] And by the way, when I say "unaccounted-for content," what I mean is that, often, the firmware in the device controller is not fully utilized. There are sections of the firmware that may contain code which is not updated - that, for example, reformats partitions of the disk, let's say. And this is a huge problem. You have to actually re-flash and retest the entire firmware, and not leave out any hidden part of the firmware.

Virgil Gligor: [00:16:22] And that, of course, brings us to the question - do you, the tester, do you, the verifier, have the complete and correct specifications of your peripheral device controllers? And again, without complete and correct specifications, the test cannot be done.

Dave Bittner: [00:16:38] Hmm.

Virgil Gligor: [00:16:39] The test, by the way - which has to be done upon all supply-chain updates, or all updates carried out by the operating system - depends on two things, fundamentally. One is correct device specifications. The second is randomness in nature. In other words, we have to be able to collect random numbers - true random numbers, from nature. Not pseudo-random numbers, but true random numbers. Pseudo-random numbers, again, assume that your adversary is bounded - their power is bounded. We don't assume that. So, the verifier has to have correct device specifications, it has to have a source of true random numbers, and of course it has to have the correct results in hand before the test is started. With that, the test can be carried out, at least in principle, unconditionally.
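
[Editor's note: For the randomness requirement, the verifier would draw its challenge nonces from a physical entropy source rather than from a pseudo-random generator. A minimal sketch, assuming the verifier host exposes a hardware TRNG at the Linux device /dev/hwrng - an assumption about the host, not a requirement named in the paper.]

    def true_random_bytes(n: int) -> bytes:
        # /dev/hwrng feeds raw bits from a hardware entropy source; a PRNG
        # (e.g., Python's random module) would only be safe against a
        # computationally bounded adversary, which this test does not assume.
        with open("/dev/hwrng", "rb") as rng:
            return rng.read(n)

    nonce = int.from_bytes(true_random_bytes(32), "big")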

Dave Bittner: [00:17:26] I suppose, you know, one of the elements here is that you have to trust your supplier that the specifications they're giving you are accurate...

Virgil Gligor: [00:17:34] Correct.

Dave Bittner: [00:17:35] ...Could your system be useful in detecting if those specifications don't align with what is actually being delivered?

Virgil Gligor: [00:17:43] My first gut reaction is to say no.

Dave Bittner: [00:17:46] OK.

Virgil Gligor: [00:17:46] My system - my test - is not oriented towards that. It does depend on correct specifications. And let me say at the outset that that is generally a huge problem.

Dave Bittner: [00:18:00] Hmm.

Virgil Gligor: [00:18:00] And it's a huge problem not just for my test - it's a huge problem for computing in general, in practice. It's a huge problem for reliability. No reliability problem - if we forget about security - can be solved without complete and correct specifications. No security problem can be solved without complete and correct specifications, and no cryptography problem can be solved without correct and complete specifications, just to mention a few areas.

Dave Bittner: [00:18:27] Yeah.

Virgil Gligor: [00:18:27] In other words, you have to know the specification of your devices.

Dave Bittner: [00:18:31] I see.

Virgil Gligor: [00:18:31] And by the way, that's not a condition. (Laughs) It's a fact of life.

Dave Bittner: [00:18:35] Right. (Laughs) I see. The work that you're doing here - my understanding is that this is still in the theoretical stage. Is that correct?

Virgil Gligor: [00:18:42] Correct. So far this has been in the theoretical stage, but, by the way, we've had quite a bit of experience with these types of tests. For example, in 2015, we published a paper identifying the fiction that was prevalent in previous tests which people had tried, mostly here at Carnegie Mellon, where we'd thought about this problem for a long time. So, we took an introspective look at root-of-trust establishment, and we realized that there was a lot of fiction. So, we have a lot of experience with tests in practice, but not of the kind that I came up with recently, and published recently.

Dave Bittner: [00:19:25] I see. So what's next? How does this go from the theoretical and be turned loose in the real world?

Virgil Gligor: [00:19:31] The next step would be to implement it on real devices, and there is a variety of devices on which this could be used, by the way. Microcontrollers that control medical devices. Microcontrollers for embedded real-time systems, such as, for example, weapons systems, if one cares about defense. So, all sorts of microcontrollers, which are relatively simple devices, in the sense that their specifications are extremely well-known. Laptops are complicated devices. The specifications of some of their devices are known only to the device producers and to government agencies that take the time to discover those specifications. And I don't mean government organizations only in the United States; I mean government organizations around the world.

Virgil Gligor: [00:20:21] So, the next step is the first small step - handle the problem for device controllers that have updateable firmware, and make sure that you can show that there is no possible malware - which we can with our test - and then move on to more complex devices. Devices with multiple processors, devices with all sorts of features which we can handle, at least theoretically - like caches, pipelining, virtual memory, SIMD operations. Those are the steps that we anticipate. Nevertheless, the first hurdle - namely, showing that this is indeed possible - was passed. Because it wasn't clear that this test really was possible. Everything else failed before, so...

Dave Bittner: [00:21:06] Well, congratulations to you and your collaborators. It seems as though you may be onto something important here.

Virgil Gligor: [00:21:13] Thank you very much. At least from an intellectual point of view, this was an achievement. But how fast and how soon we can actually materialize this in practice remains to be seen.

Dave Bittner: [00:21:28] Our thanks to Virgil Gligor from Carnegie Mellon's CyLab Security and Privacy Institute for joining us. The research is titled, "Establishing Software Root of Trust Unconditionally." We'll have a link in the show notes.

Dave Bittner: [00:21:42] Thanks to Juniper Networks for sponsoring our show. You can learn more at juniper.net/security, or connect with them on Twitter or Facebook.

Dave Bittner: [00:21:51] And thanks to Enveil for their sponsorship. You can find out how they're closing the last gap in data security at enveil.com.

Dave Bittner: [00:22:00] The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technology. The coordinating producer is Jennifer Eiben, editor is John Petrik, technical editor is Chris Russell, executive editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.