
Root access to the Great Firewall.
Dave Bittner: Hello, everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems, and protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Daniel Schwalbe: Basically, a data leak happened in September of this year, so a few months ago, which was an unprecedented amount of very specific details on how the Great Firewall actually works. And of course, when such a data leak is exposed to the public, it's always worth having a look. And my research team was kind of chomping at the bit. They're like, can we look at this? And we said, yeah, of course, with all the prerequisite precautions. Sometimes these dumps might be booby-trapped or otherwise. But this appeared to be genuine and a treasure trove of information about something that generally has been kept very secret.
Dave Bittner: That's Daniel Schwalbe, Head of Investigations and CISO at DomainTools. The research we're discussing today is titled "Inside the Great Firewall." [ Music ]
Daniel Schwalbe: There's not a whole lot of public information about the Great Firewall and how it does its things. A lot of research has been done just trying to empirically figure it out. But in this particular situation, the, you know, over 500 gigabytes of internal data about the infrastructure and how it's organized, etcetera, was leaked, and we dug into the data in order to write about it.
Dave Bittner: Well, can you give us some insights on how you start digging into a data set that is that large? How do you go about it?
Daniel Schwalbe: Yeah, that can certainly be overwhelming. We, you know, first took a high-level look at like, okay, what files are there? There were like, you know, diagrams and text specifications. So you cluster those into kind of one category. And then, you know, whenever there were, you know, particular outlines of like human interaction, like this is who controls it, etcetera, you put them in a different bucket, and then you start, you know, going through them. You will have to do a little bit of keyword searching. We intentionally didn't use any like LLM tools because we didn't want to further proliferate information. But we have some of our own tools we can, you know, feed information to and then do a quick analysis to kind of hone in on what are the sort of large chunks, the human part of it, the technical design, and then, potentially, what that could actually mean in terms of the real world.
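[ Editor's note: The bucketing and keyword search described above can be sketched roughly as follows. This is a minimal illustration only, not the team's actual tooling. The bucket names, file extensions, and keywords are all assumptions made up for the example. ]

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extensions and keywords, chosen only for illustration.
DESIGN_EXTENSIONS = {".vsdx", ".png", ".pdf", ".md", ".txt"}
PEOPLE_KEYWORDS = ("operator", "admin", "contact", "roster")


def bucket_for(path: str) -> str:
    """Assign a file to a coarse bucket using only its name."""
    p = PurePosixPath(path.lower())
    # Keyword match first: files that look like they describe people or roles.
    if any(word in p.name for word in PEOPLE_KEYWORDS):
        return "people"
    # Otherwise fall back to the extension: diagrams and text specifications.
    if p.suffix in DESIGN_EXTENSIONS:
        return "technical-design"
    return "unsorted"


def triage(paths):
    """Group a list of file paths into buckets for manual review."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[bucket_for(path)].append(path)
    return dict(buckets)


if __name__ == "__main__":
    sample = ["docs/topology.png", "hr/operator_roster.txt", "bin/blob.dat"]
    for bucket, files in triage(sample).items():
        print(bucket, files)
```

[ A first pass like this only narrows down where an analyst should start reading. The sorting into categories that matter still happens by hand. ]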
Dave Bittner: Well, let's dig in here. From your research, how do you describe the overall architecture of the Great Firewall?
Daniel Schwalbe: I'm actually, this might be controversial, but I'm actually quite impressed. I used to work for an organization that basically, while it wasn't an ISP, ran a carrier-grade network. So I've struggled with how to do, you know, security on a 100-gigabit link, and that is no small feat. Granted, my experience was probably about 10 or so years ago, so technology has come a long way. But even back then, trying to do any kind of like just anti-malware inspection real-time in the traffic was a huge undertaking and cost, you know, millions and millions of dollars in equipment to be able to do that properly. Now, in a state-sponsored situation, such as in China, the funding is less of an issue, but the sheer scale of the infrastructure is quite impressive. The fact that they figured out how to build this, if you will, digital wall that any connection, you know, sourced from the mainland in China has to go through, and then, but also to map it out where there is central control. Which of course is important if you want to, you know, block certain types of information from leaving the country. So you have to have a central point of command and control, but then it can still be spread out to regions, and it gives regional governments some level of insight and blocking ability as well. And the fact that they managed to design this at scale that large, and it's fairly effective, I'm actually quite impressed.
Dave Bittner: Yeah, the report talks about things like the traffic secure gateway and Deep Packet Inspection. Can we dig into some of those details and how they work?
Daniel Schwalbe: Yes. A lot of the technologies being used are what's been used as, you know, regular cybersecurity best practices for years. So basically, what Deep Packet Inspection means is -- the way the internet is designed, when information is transmitted from one point to another, it gets chunked into little datagrams called "packets." And the idea is if one or two of them get lost on the way, you can either ask for a retransmission or it's not important because you can make it up from the context. So that gives these small packets that travel over the internet that contain the information. Now, Deep Packet Inspection essentially means in real time, you intercept this particular packet, you peek inside and glean what information might be included inside, whether there's like a malware hash or, you know, is it particular destinations and sources that are talking to each other, it's all very interesting. But doing that at speed to not slow the internet down significantly where somebody might, you know, become suspicious, or if it's a customer complaint, like why is my connection so slow? Doing that at scale is important, but it gives you an idea what two points on the internet are talking about with each other. Of course, there's things like encryption that make this a lot harder, but there are other techniques you can use in order to get an idea of what -- even if the HTTP connection is encrypted, you can still get an idea of what the source is trying to reach on the outside internet and make blocking decisions based on that.
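[ Editor's note: As a rough illustration of "peeking inside" a packet, the sketch below parses the IPv4 and TCP headers of a single raw packet and checks its payload against a keyword list. It assumes an unencrypted IPv4/TCP packet. The keyword list, the function names, and the demo packet are hypothetical. It says nothing about how the Great Firewall implements inspection, and real systems do this in specialized hardware at line rate. ]

```python
import socket
import struct

# Hypothetical keyword list, for illustration only.
BLOCKED_KEYWORDS = (b"example-blocked-term",)


def inspect_packet(packet: bytes) -> dict:
    """Parse one raw IPv4/TCP packet into addresses, ports, and payload."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (version_ihl & 0x0F) * 4  # IHL is in 32-bit words
    if packet[9] != 6:  # protocol field: 6 means TCP
        raise ValueError("not a TCP segment")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])

    tcp = packet[ip_header_len:]
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    tcp_header_len = (tcp[12] >> 4) * 4  # data offset is in 32-bit words
    return {
        "src": (src_ip, src_port),
        "dst": (dst_ip, dst_port),
        "payload": tcp[tcp_header_len:],
    }


def should_block(packet: bytes) -> bool:
    """Return True if the payload contains any listed keyword."""
    payload = inspect_packet(packet)["payload"]
    return any(keyword in payload for keyword in BLOCKED_KEYWORDS)


def build_demo_packet(payload: bytes) -> bytes:
    """Build a minimal IPv4/TCP packet (checksums left at zero) for testing."""
    tcp_header = struct.pack("!HHIIBBHHH", 12345, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    total_len = 20 + len(tcp_header) + len(payload)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, 6, 0,
        socket.inet_aton("10.0.0.1"), socket.inet_aton("192.0.2.10"),
    )
    return ip_header + tcp_header + payload


if __name__ == "__main__":
    demo = build_demo_packet(b"GET /example-blocked-term HTTP/1.1\r\n\r\n")
    print(inspect_packet(demo)["dst"], should_block(demo))
```

[ The parsing itself is cheap. The hard part described above is doing it for every packet on a very fast link without adding noticeable delay. ]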
Dave Bittner: Yeah. In the research, you all talk about this notion of fingerprinting encrypted traffic. Unpack that for us. You talk about invisible identifiers. Why does that matter?
Daniel Schwalbe: Yeah. So of course, privacy on the internet is certainly important to me, and I think a lot of people start caring about that a lot more as of late. And so basically, the internet wasn't really designed with encryption in mind. The early days, everything was transmitted in clear text, so you wouldn't really be concerned about somebody maliciously intercepting your traffic to see what was going on. Well, later on, we added some of those layers, and one of them that's very popular is the secure HTTP protocol, HTTPS, which uses TLS encryption, Transport Layer Security. Basically, you connect with a web server, you exchange pieces of information, and an encrypted tunnel between your computer and the web server is created where all the data with that particular site is being exchanged, but outside observers would not be able to tell who it is that you're talking to and what information you're exchanging. Extracting the information inside the encrypted tunnel is much harder because the cryptography is pretty strong, and so doing that on the fly is still not trivial. There are entities around this world that probably can do it, but at scale is, you know, very difficult. So what you still want to know is who might a particular user on your network that you might be concerned with, you know, or have other thoughts about, you want to know who they're talking to on the outside. Part of what TLS encryption, the protocol introduced, is the ability to obfuscate what virtual server you're talking to. And what I mean by that is, on the internet, you might have a web server that has an IP address, but it could answer for multiple domains. So for example, you know, we have domaintools.com, but it could also answer for something like domaintools.net, etcetera. So from a strictly network connection, all you're seeing is this computer reached out to this IP address, but we don't know what the domain that is loading might be associated with that.
And so there are techniques. You can do a de-obfuscation essentially by fingerprinting certain sites, by looking at the data that the browser sends, etcetera, you might be able to glean information off of one specific website out of the potential, you know, dozens that could be present on a particular IP address, what that website is, which then gives you a good idea of what might this particular user be up to.
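[ Editor's note: One widely documented piece of metadata that can reveal which site on a shared IP address a client wants is the Server Name Indication (SNI) extension. The client sends it unencrypted in the TLS ClientHello unless Encrypted Client Hello is in use. The sketch below pulls the SNI hostname out of a single ClientHello record. It is an illustration of the general idea only. It is not a claim about the specific fingerprinting methods in the leaked material, and the helper that builds a demo ClientHello is a simplified assumption, not a complete handshake message. ]

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from a TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:        # record type 0x16 = handshake
            return None
        pos = 5                      # skip record header (type, version, length)
        if record[pos] != 0x01:      # handshake type 0x01 = ClientHello
            return None
        pos += 4                     # skip handshake type and 3-byte length
        pos += 2 + 32                # skip client version and random
        pos += 1 + record[pos]       # skip session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len        # skip cipher suites
        pos += 1 + record[pos]       # skip compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:        # extension 0 = server_name
                # list length (2), name type (1), name length (2), name
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                  # truncated or malformed input
    return None


def build_client_hello(hostname: str) -> bytes:
    """Build a minimal ClientHello carrying only an SNI extension (demo only)."""
    name = hostname.encode("ascii")
    server_name = struct.pack("!BH", 0, len(name)) + name
    sni_data = struct.pack("!H", len(server_name)) + server_name
    extensions = struct.pack("!HH", 0, len(sni_data)) + sni_data
    body = (
        b"\x03\x03" + bytes(32) + b"\x00"          # version, random, empty session id
        + struct.pack("!H", 2) + b"\x13\x01"       # one cipher suite
        + b"\x01\x00"                              # one compression method (null)
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    print(extract_sni(build_client_hello("domaintools.com")))
```

[ When the hostname is hidden, an observer is left with indirect signals such as packet sizes, timing, and handshake parameters, which is where fingerprinting comes in. ]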
Dave Bittner: So we're talking about looking at metadata then?
Daniel Schwalbe: Yes.
Dave Bittner: Got it. Now, one of the things that your research highlights is that this is not a static thing, that this system has adaptive capabilities. Can you explain that to us?
Daniel Schwalbe: Yes. I mean, anything at that size and scale has to be modular. You can't rely on, you know, basically a single technology here. If there's a failure or something, then, you know, the whole internet goes down for a particular country. That wouldn't be very practical. I mean, for better or worse, the internet drives commerce around the world. And as we've seen here in the States from recent cloud provider outages, if one of them goes offline for a few hours a day, a large part of the population is having a bad day. So the ability to sustain a functioning internet is highest priority. So fault tolerance to a degree has to be there. And so the way that it seems to be designed based on the information in the dump is that the modularization of it means that certain parts could potentially be, you know, instructed to carry out one action, where another part is completely unaffected. Or if there might be a regional, you know, protest movement or something, the administration of that particular region could say, we're going to block any and all, you know, mentioning of the following keywords, etcetera. But that might not necessarily be applied globally to the entire thing. So a different part of the country might not even be aware this is happening because otherwise that might give an idea. You would want to control information specifically within the country from one point to the other. You also have to be concerned, what do entities within your network talk to each other, hey, something's going on over here. And by having this modular design that's pushed pretty far to the edge, you know, down to the regional government, and the ability to affect blocking there is very central to the strategy that they're employing.
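[ Editor's note: The idea of central rules that apply everywhere plus regional rules that apply only locally can be sketched as a small layered policy. This is a conceptual illustration under assumptions. The class names, region names, and keywords are hypothetical and are not taken from the leaked documents. ]

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PolicyNode:
    """One administrative layer holding its own keyword block list."""
    name: str
    blocked_keywords: Set[str] = field(default_factory=set)


@dataclass
class LayeredPolicy:
    """Central policy applied everywhere, plus optional per-region additions."""
    central: PolicyNode
    regions: Dict[str, PolicyNode] = field(default_factory=dict)

    def is_blocked(self, region: str, text: str) -> bool:
        # Start from the central rules, then add the rules of this region only.
        rules = set(self.central.blocked_keywords)
        node = self.regions.get(region)
        if node is not None:
            rules |= node.blocked_keywords
        lowered = text.lower()
        return any(keyword in lowered for keyword in rules)


if __name__ == "__main__":
    policy = LayeredPolicy(
        central=PolicyNode("central", {"global-term"}),
        regions={"region-a": PolicyNode("region-a", {"local-protest"})},
    )
    # The regional keyword is blocked in region-a but not in region-b.
    print(policy.is_blocked("region-a", "news about local-protest"))
    print(policy.is_blocked("region-b", "news about local-protest"))
```

[ Keeping regional rules separate means a change in one region does not touch traffic elsewhere, which matches the fault isolation and local control described above. ]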
Dave Bittner: We'll be right back. Another thing the investigation mentions is you all refer to it as a state industrial censorship complex with vendors and telecom carriers and regional nodes and central policy hubs. What part do these various folks play and how significant is that for the maintenance and evolution of the system?
Daniel Schwalbe: Yeah, it's an excellent question. From what we can glean from the dump, from the data, is that basically any entity that provides internet access to end users within the country is by hook or crook conscripted into helping this effort. Like there's no opting out. You want to do business in China as an internet service provider, you agree to participate in this scheme. That's the only way it works. Same thing is with mobile providers. You know, they're still, in a way, internet service providers, even though they provide telephony as well. But basically, a large part of the population accesses the internet from mobile devices. So wherever that gets routed before it hits, you know, the open internet, it has to be in there as well. And so internet service providers play a key role. Manufacturers of hardware that helps, you know, to route the internet, you know, transmit the traffic, etcetera, those all ideally have to be optimized for that purpose. And there are a number of manufacturers in the country that, it appears, based on the information that was leaked, are actively cooperating and building hardware specifically that is beneficial to the type of network inspection at high rates that is needed to sustain this operation. So now we've got internet service providers, we've got hardware manufacturers, you know, various different entities that are in the chain of bringing internet access to an end user wherever it may go. And because of the power of the state -- and you're not going to do business in China without explicit approval of the state apparatus -- they can exercise this control over the various pieces in order to make this all work. If we were to try to do something even remotely close, let's say in the United States, because ISPs are independent entities, it would be very difficult to compel them to do so. Same with hardware manufacturers.
They all have regular customers who probably object vigorously to a hardware manufacturer basically building in a better way to sniff the traffic. It's been attempted various different ways, but unless you have the full control end-to-end over the infrastructure, it would be almost impossible to pull off. But based on the information in the dump, it sure appears like they've done a pretty good job at getting that all working.
Dave Bittner: So help me understand here, are there global providers of these sorts of things, hardware, services, that are -- are they making custom versions for the Chinese market?
Daniel Schwalbe: So to my knowledge, it's focused on the actual Chinese manufacturers. You know, Huawei is one of them, of course, that's been in the news off and on over the years, but there's several others. I don't believe that there are, you know, outside China-based manufacturers that do very specific modifications for the country. In order to be able to sell there, you may need to take some notes from the regime, but there's also a lot of companies who just simply opt to not, you know, sell in the market because they don't want to be forced to introduce, you know, potential backdoors or additional hardware in things. That's not to say that you couldn't buy particular hardware on the open market and then modify it for your own purposes after the fact. But at scale, it would basically require a manufacturer to cooperate. And there's enough of the technology and know-how within the country that they can lean on their domestic manufacturers pretty strongly without having to involve, you know, foreign companies.
Dave Bittner: Well, given this information, how does this affect countermeasures, things like VPNs or proxies or those sorts of circumvention tools? Do they work?
Daniel Schwalbe: Yes, yes and no. It certainly used to be much more of a cat-and-mouse game where -- because, you know, anything that large, there's going to be potential small loopholes or, you know, flaws in the design that you can exploit given enough time. And so certain VPNs, a certain way of tunneling, etcetera, has been possible. And if it gets detected and figured out how it's done, then it gets blocked. So you kind of keep moving. However, the specific technical details that were released in this data dump will actually give, you know, individuals or entities who want to enable more unfiltered access for, you know, people in the country, they might be able to use that to do an even better job at circumventing things. Because the specific technical details of how VPNs are detected, how a certain activity or patterns are detected that then cause downstream blocking or being flagged for further review or something, that's been made public in the dump and could absolutely be used as a blueprint on how to do a better job circumventing. We haven't seen much of that yet, but it's only been a couple of months, so I suspect it's coming.
Dave Bittner: Suppose I'm on an enterprise security team or maybe a global threat intelligence team. Is there anything in this data dump that helps inform how I might work with or monitor traffic from China?
Daniel Schwalbe: Yes, I would definitely think so. You know, it depends on the level of sophistication of the entity and also their, you know, threat model. But there's enough technical information in there that would give you a pretty good idea, especially if you're, you know, seeing web connections coming from mainland China, what those look like. They're all going through the Great Firewall. So it gives you a better idea about, is something going through the firewall or did somebody find a temporary way to basically circumvent it or get around it? Because the pattern and the fingerprint of stuff that's coming in are likely just slightly different enough that, with this additional information of what to look for, you might be able to tell the one activity from the other.
Dave Bittner: You mentioned at the outset that you were impressed by what you saw in this information. How so? How did it surprise you?
Daniel Schwalbe: Just the sheer scale. I mean, we knew the thing existed and there's been, you know, some research, external research, that's been done on it just by probing the various defenses, etcetera. There was never any specific information. Everything was basically assumptions based on observations, etcetera. But to actually have the documentation that appears to be legitimate, that's important to say, to have the documentation and see things like, yep, I thought this is how they were going to do this. Oh no, this is completely different than, you know, maybe I would have thought. I'm not a network engineer, so I'm not saying like my design would have been, you know, the world's greatest. But I've been doing this for 25 years, I've seen enough designs where I'm like, yeah, the faster the traffic, the bigger the bandwidth, the much more challenging this becomes. And so the -- like I guess me being impressed was how to actually force this into being at the scale that it is and it working as reasonably well as it appears to be. That's the impressive part.
Dave Bittner: Yeah. What do you suppose this does to the future here, I mean, this information being revealed? Certainly I would imagine the powers that be in China aren't happy about this. Do you suspect that there'll be any sort of pivoting here, or is this a system, you know, it's a battleship that's hard to turn on a dime?
Daniel Schwalbe: Yeah, I think it's probably somewhere in the middle. Absolutely, it's a big operation that, just to, you know, completely, you know, start from scratch and throw away all of the old paradigms, that's not going to work. Or if so, it would take a really long time and a big investment. I would certainly feel very concerned for whomever leaked that data. I know there's a hacktivist group that, you know, took credit for it, and they certainly published it. But just looking at the specific, you know, data contained in the dump, this almost had to have been somebody with pretty good access on the inside. This was, in my professional opinion, this wasn't like a smash-and-grab hack, the hack where they found, you know, an open file share somewhere and downloaded this information. Whoops, it wasn't properly locked down. I don't believe so. This appears to be, you know, some kind of inside job or possibly a disgruntled, you know, employee somewhere in the machine that had access to enough of this information. It could be that it was aggregated on some system that got compromised and it wasn't really meant to be leaked. But again, given the specificity and the combination of the files in the data leak, it sure smells like it was somebody, you know, with extreme internal knowledge and access to be able to pull all these files together. I would be very concerned for that person, and I hope they're going to be okay. I think there will be some evaluation of current techniques. We also are not 100% certain how current the information is. Some of it appears to be very current because it talks about stuff that, you know, in the timeline can be placed. But it's also possible that there are additional technologies already being deployed that were not captured by the information in the leak. So it's going to be interesting to see what potential countermeasures the operators of the Great Firewall might be taking as a result of this.
To my knowledge, we haven't seen anything very obvious, but this is also something you'd probably want to do low and slow as to not give away that you're already taking countermeasures. [ Music ]
Dave Bittner: Our thanks to Daniel Schwalbe from DomainTools for joining us. The research is titled "Inside the Great Firewall." We'll have a link in the Show Notes. And that's Research Saturday brought to you by N2K CyberWire. We'd love to know what you think of this podcast. Your feedback ensures we deliver the insights that keep you a step ahead in the rapidly changing world of cybersecurity. If you like our show, please share a rating and review in your favorite podcast app. Please also fill out the survey in the show notes or send an email to cyberwire@n2k.com. This episode was produced by Liz Stokes. We're mixed by Elliott Peltzman and Tre Hester. Our executive producer is Jennifer Eiben, Peter Kilpe is our publisher, and I'm Dave Bittner. Thanks for listening. We'll see you back here next time. [ Music ]
