
The hidden cost of data hoarding.
Dave Bittner: Hello everyone, and welcome to the CyberWire's "Research Saturday." I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems, and protecting ourselves in a rapidly-evolving cyberspace. Thanks for joining us. [ Music ]
Aurora Johnson: We looked at China's data breach and leak ecosystem, and we discovered that there's a lot of interesting differences between the Chinese-speaking cybercrime world and the Russian and English-speaking cybercrime worlds. We found that their cybercrime ecosystem depends a lot more on persistent access, often persistent insider access, directly to data sources, and that they often siphon off this data and sell it on the black market. [ Music ]
Dave Bittner: In today's sponsored industry voices "Research Saturday", we speak with Kyla Cardona and Aurora Johnson from SpyCloud. The research is titled, "China's Surveillance State is Selling Citizen Data as a Side Hustle." [ Music ]
Kyla Cardona: So, as a security researcher, I'm curious by nature, so.
Dave Bittner: That's Kyla Cardona.
Kyla Cardona: You know, when I went on platforms like BreachForums and other illicit platforms on Telegram that we know of on the Western, Russian, European side, I would see small bits of Chinese data, and I was curious. I was like, there has to be more where this came from. And so I took some clues, and I did some deep diving, and I uncovered a very different cybercrime ecosystem, in that they prefer firsthand data, or fresh data, rather than data that is hacked or leaked. They prefer that because they say that it's directly from the source. So they have two major exfiltration methods: SDK, which is mostly backend permissions on apps, and DPI, which is deep packet inspection, which is done through major telecom centers in China, like China Unicom, China Mobile, and China Telecom. So there are insiders on both ends of that spectrum that exfiltrate data daily, allegedly, is what they say, and that's the data that is sold, traded, and also funneled into these shegong ku's, or SGKs, which are these look-up queries that are public and private. So when we compare it to the Western, European, Russian side, which mostly consists of hacked or leaked data breaches from malicious cyber actors, the Chinese one is different, because those actors prefer data directly from the source, and they call the databases that we call data breaches secondhand data, because they question the credibility of that kind of data, because it is, you know, from a hacking method or a penetration tool. And they also have this obfuscation tactic, so if they were to breach a website, then they would name it by the industry rather than the actual website.
We believe that this is an obfuscation tactic because, in order to preserve their access to that website, they don't want to name it. They'd rather name it by industry than by the actual website itself, which is the opposite of what you see people doing on BreachForums for hacked and leaked data breaches, because those are usually named by the website itself.
Aurora Johnson: I was just going to add on to that last point, which is that on forums like BreachForums and some of the Western forums that we track, oftentimes the actors will name breaches not even for the entity that they hacked, but for the entity that has the most interesting data within the dataset. And there's just, like, a strong culture of boasting about the breaches that they have, and it's the opposite in China. They're trying to be stealthy, and like Kyla said, it's likely to preserve insider access in at least some cases.
Dave Bittner: You know, that's interesting. Aurora, with the research that you all have done here, how big a window do you suppose you have into this network?
Kyla Cardona: So we track hundreds of Chinese-language cybercrime channels on Telegram, some of which have tens of thousands of accounts that are in them, but I think this is just a window into the overall Chinese cybercrime threat ecosystem, and we've only scratched the surface in looking at these actors.
Dave Bittner: I see. Well, let's talk a little bit about the terminology and language. The research mentions some Chinese slang, things like pantsless data. How do these colloquialisms shape the way that these data leaks are advertised and traded in these specific communities? Kyla?
Kyla Cardona: Yeah, so pantsless data is a homophone for dragging the library, which in Chinese slang means that you're hacking someone's databases. So these terms are really important to understand the cybercrime ecosystem, because you can't find this data without using those terms: pantsless data, SDK, DPI, MD5. MD5 is another term that they use for anything that is cracked, or anything you're trying to crack. So any hashed passwords, whether it's SHA-256, MD5, SHA-512, all of those things fall under MD5. So the slang is very important in understanding the ecosystem, I'd say, and it's used in all aspects of the platforms that they use. So whether we see them on X, on Telegram, once recently on Bluesky, the only way to kind of understand the ecosystem is if you know what these terms mean, because these are what they use every day in this entire cybercrime ecosystem.
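As an aside on the point that "MD5" is used as a catch-all label for hashed passwords: MD5, SHA-256, and SHA-512 are in fact distinct algorithms with different digest lengths. The following is a minimal Python sketch, not taken from the research; the sample input is a made-up placeholder, and it only shows how the three digests differ in size.

```python
import hashlib

# Hypothetical sample input, used only for illustration.
sample = b"example-password"

# Compute a hex digest with each of the three algorithms named in the interview.
digests = {
    name: hashlib.new(name, sample).hexdigest()
    for name in ("md5", "sha256", "sha512")
}

# Hex digest length is twice the digest size in bytes.
hex_lengths = {name: len(value) for name, value in digests.items()}

for name, length in hex_lengths.items():
    print(f"{name}: {length} hex characters")
```

So a listing labelled "MD5" in these channels may contain digests of any of these lengths. Length alone is only a rough hint at the algorithm, since different algorithms can share an output size.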
Dave Bittner: Yeah, it's an interesting insight. You kind of - you have to know the lingo to be able to read what's happening behind the scenes, I guess.
Kyla Cardona: Absolutely.
Dave Bittner: Yeah. When you look at the trends here, I mean the methods that these Chinese actors are using to exfiltrate and sell the data, how do their methods compare to some of the other global cybercriminal operations that you all are familiar with?
Kyla Cardona: It's very different, because I'm not aware of Western, or European, or Russian actors truly exfiltrating data every day from insider access or backend permissions and then funneling it into a shegong ku, which I don't believe anybody is safe from. These data leak channels and those shegong ku's have data on everyone from prominent Chinese CCP members, APT members, and PLA members all the way to ethnic minorities, so everyone can be on those shegong ku's. There are some Chinese cybercrime actors that post on their channels that they will not post any information about people in special departments, those in government positions, or state-owned enterprises. So while there are those that do have protections for people in the government, there are some shegong ku's, or SGKs, that do not have the same type of exceptions, and they will post about Chinese officials, people in the government, PLA members, APTs, pretty much anyone and everyone.
Dave Bittner: Aurora, anything to add there?
Aurora Johnson: Yeah, so I think Kyla really hit the nail on the head, but I think the main difference is, there's much more emphasis on maintaining persistent access and posting data every single day from that same access. And I would say in the West, a lot of times it's more about hacking as much data as you possibly can and leaking it all. So for example, we see ransomware data leaks, where they're very loud about having access to networks. They lock and encrypt the entire network, and then they'll post literally sometimes, you know, 40 terabytes of data from a single intrusion. In China, they're a lot more focused on having persistent access over time, and posting every single day, new data from that access. That also gives us challenges as SpyCloud, because when we're looking at that data, we have to use different strategies to try and parse it, and understand it, and understand what's there.
Dave Bittner: That's a really interesting insight. I mean, is it fair to say that these Chinese actors assume that they're going to have ongoing access, whereas, you know, a Western actor, it seems like more of a smash and grab kind of thing, you know? We don't know how long we're going to be in here, so we have to get everything we can as quickly as possible. Is it just a different assumption there?
Aurora Johnson: Yeah, I think it's just a different mindset for how they approach data access. So, like Kyla said, a lot of times either they have persistent insider access and they're trying to preserve that access and utilize it for as long as they can, or sometimes they even have persistent access to a data source because, in some cases, like in the SDK method, they've developed a software development kit for mobile applications that have persistent access into all kinds of individuals' phones, and they have elevated permissions so that they're able to just continuously extract data from those phones, and that data is then in turn being sold on the black market. And we see a lot of it show up on just, like, Telegram channels.
Dave Bittner: I see. The research mentions that Telegram and X, formerly Twitter, are platforms that are actively used for these activities. Why are these platforms so effective? Why do we think that these threat actors choose those places? Kyla?
Kyla Cardona: I believe that threat actors choose those places because Telegram isn't under heavy surveillance, as WeChat and QQ are. I have seen articles in our research where people are talking to these data brokers and hackers on WeChat and QQ, and they want to take the conversation to Telegram, and on X they use that more as an advertisement platform, where they can advertise their SDK and DPI data to lead back to their Telegram channels. And in these Telegram channels, they're either private or public, and this is where they, you know, just upload a bunch of data that is exfiltrated freshly or, you know, daily, and this is where they also sell data, and I believe Aurora can talk more on the payment methods that they take for shegong ku's and even buying data outright from these SDK and DPI methods.
Aurora Johnson: Yeah, I don't think they're not on the Chinese apps, but I think that we do see some of them take measures to avoid the surveillance inherent in some of the Chinese technology apps. So for example, Telegram, they use Telegram a lot, and that's blocked by the Chinese government, and has been blocked since 2015, so they have to use a VPN to access it outside of China. And similarly, they'll use cryptocurrency to try and do their transactions. So we've seen some actors accept payments on things like Alipay, or like, other Chinese payment apps, but we see most of them accept payment in USDT, which is the abbreviation for the Tether cryptocurrency. It's a cryptocurrency that's tethered to the value of the U.S. dollar, so it has a very stable value. And we see a lot of them using that as the main method to accept payment. [ Music ]
Dave Bittner: We'll be right back. [ Music ] You know, one thing that the research mentions is insider cooperation within telecommunications companies or even the device manufacturers. Can you share some insights on that?
Kyla Cardona: Yeah, I believe that, and Aurora can speak more about this in detail, they have both the incentive and the motivation to provide that insider access because of the average annual salary in China, and because they make these advertisements so enticing, and, you know, they have protections. So first they mention how much you can make a day, which is about 10,000 yuan, though that depends on customers' orders for this recruitment. And it can go even higher than that, and they also talk about protections for withdrawing the funds that you've made through, like, cryptocurrency mixing, and I think Aurora can speak more to that.
Aurora Johnson: Yeah. Yeah, just like Kyla said, we do see a lot of recruitment for insiders, particularly in government positions in the public security bureaus, or at large financial institutions like the public banks, and then also particularly in the big three telecommunications companies, China Unicom, China Mobile, and China Telecom. And they're often using that insider access to siphon data off, and they can make a lot of money. We've seen ads say that individuals that do a lot of queries or have a lot of contracts can make up to 10,000 yuan per day, and in some cases up to 70,000 yuan. And then to put that in perspective, I have a number that we pulled. One moment.
Dave Bittner: Uh huh.
Aurora Johnson: And then to put that in perspective, the median annual after-tax income per capita in China was 33,000 yuan last year. That's around $4500 in U.S. dollars. So if you can make about a third of that annual per capita income in a single day doing insider queries at your workplace with, as some of these data brokers say, minimal risk of being fired or caught, that is an enticing offer.
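The comparison above can be checked with a few lines of arithmetic. This is a minimal Python sketch using only the figures quoted in the interview (10,000 and 70,000 yuan per day, 33,000 yuan per year, roughly $4500); the variable names are illustrative, and the exchange rate is simply the one implied by those two quoted numbers, not an official rate.

```python
# Figures as quoted in the interview.
median_annual_income_yuan = 33_000
median_annual_income_usd = 4_500
advertised_daily_yuan = 10_000
advertised_daily_max_yuan = 70_000

# Share of a year's median income earned in one day at each advertised rate.
share_typical = advertised_daily_yuan / median_annual_income_yuan
share_max = advertised_daily_max_yuan / median_annual_income_yuan

# Exchange rate implied by the two quoted income figures (yuan per U.S. dollar).
implied_rate = median_annual_income_yuan / median_annual_income_usd

print(f"10,000 yuan/day is {share_typical:.0%} of the median annual income")
print(f"70,000 yuan/day is {share_max:.0%} of the median annual income")
print(f"Implied rate: {implied_rate:.2f} yuan per USD")
```

On these numbers, the 10,000 yuan figure works out to roughly 30 percent of the median annual income, and the 70,000 yuan figure to a little over two years of it.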
Dave Bittner: Yeah, that could be quite a payday. How do you rate the technical sophistication of these actors, when you compare them to, you know, to other groups that you've researched? Where do they stand?
Kyla Cardona: Sorry, that's a very hard question, but they have this word, they call themselves crawlers, so those are the tech people that are on these platforms. So when you look at a Chinese data leak channel, there are salesmen and there are crawlers, which are the tech people, and the tech people mostly seem to use Python web crawlers to try to get data from foreign websites, and they seem to have some technical knowledge or expertise. But it's hard to say, because the data that they mostly exfiltrate, or most of their targets in these cybercrime ecosystems, seems to be domestic data from the SDK and DPI methods. But they do show some overseas data from, like, overseas Chinese in America, overseas Chinese in the U.K., overseas Chinese in Thailand. So they do have overseas data, but that data seems to be posted less consistently compared to the SDK and DPI data that they post constantly. And while they seem to have some sort of technical knowledge of, you know, hacking tools, it doesn't seem very invasive in the way that, like Aurora said earlier, they grab all the data, you know, the smash and grab type of thing. They seem to want to maintain persistent access and be stealthy, so it's hard to say what the extent of their technical expertise is. They do have some technical expertise when it comes to hacking some websites, but it doesn't seem like it's their focus.
Dave Bittner: I see. So can you share some of the challenges that you all face when it comes to tracking these specific threat actors? You know, particularly given some of the linguistic and cultural nuances here?
Aurora Johnson: I think one of the main things is, as you said, just trying to understand Chinese, [laughter], but then also specifically the slang terms that they use, and being able to use that to understand what they're talking about. So I think that's, like - I think that's one of the main ways that we've been able to track these actors and find a lot of them talking about these different data breaches, is just understanding different Chinese slang terms.
Dave Bittner: One of the quotes that you all shared in the research was from a China-based blog, and it said, the data you leave on the internet knows you better than your mother. I think that's an evocative statement, [laughter], don't you think?
Kyla Cardona: Yeah. It came from a shegong ku article, and you know, the data that's posted out there is very invasive, and when it comes to a shegong ku lookup or an SGK lookup, you can find so many things about people that you wouldn't normally find on something like, you know, Whitepages or Fast People Search. Like, this information contains everything from hacked databases from the Western side, because they are also reposting those on the Chinese data leak channels, but also a lot of information, like, domestic to China. So there could be bank account numbers, passwords, emails, account numbers for securities investments. So there's a lot more information on these free SGK lookups than you could imagine. Even hotel room booking records.
Aurora Johnson: And to kind of explain the SGKs a bit further, essentially they are databases of hacked, leaked, and breached data that they maintain, and anyone can, either for free, oftentimes, or by paying a very small amount, equivalent to maybe 1 U.S. dollar, do a lookup on anyone based on their phone number, their national ID number in China, their name, their email address, or a bunch of different indicators that they offer searches with, and then you can get back all kinds of data, which includes things like account information for all kinds of different apps, bank account information, financial records, and hacked data, which sometimes includes passwords. Then, oftentimes they also - these same SGKs will have advertisements for different private lookups that they're able to do. These cost significantly more, usually, than the, you know, lower-level lookups, but those are often done directly by insiders querying the networks at their workplaces. So those private lookups might include things that they're able to do like facial recognition searches, GPS tracking of an individual, like, phone and call records and texting records, and also government records. So things like your social security data or business registrations that you've done, any government records. Oh, arrest records as well, and things that police departments might have. So, with those types of insider data, you can kind of see how they are able to really get a comprehensive view of an individual for a relatively low cost, just by buying it in China, and I don't think that the exact same thing is necessarily possible in a lot of other places.
Dave Bittner: No, it's really fascinating research, and you know, for me as I was reading through it, I felt like I had to recalibrate my notion of privacy. You know, it's easy for us to complain here in the U.S. that there are many ways that our privacy is being, you know, violated by various tech companies, but this is a whole different level than I think how - certainly how it was framed in my own mind. When you all look at the research that you've done here, what do you hope people take away from it? What are the take-homes here?
Kyla Cardona: I just want them to realize that the Chinese cybercrime ecosystem is vastly different from the Western, Russian, European side, because there are different components in the Chinese cybercrime ecosystem, and a lot of it focuses on fresh and high-quality data that's directly from the source. So you have, you know, some Chinese cybercriminals exfiltrating this data, and being an insider to this data, and selling this data, but also on the other side, aggregating the data even more by putting it into their own collection of leaked or hacked databases called SGKs and selling, you know, access to those for even more money, aside from data orders from SDK and DPI methods. So it's very different in that way, as well as even if the Chinese were to hack some sort of website, they're going to try to maintain their stealth to maintain their access, and that's why they name it by industry rather than the actual website name or company name, and that is scary to me because, you know, it's hard to figure out, you know, who was hacked if they just name it by industry, unless you really take a closer look at the data. And while, you know, all of this is really scary, because there seems to be no privacy, it's very much a double-edged sword when it comes to all the data collected by the CCP mandate because, you know, while it is collected for and by the CCP, it can also be used against them, and in some cases it has, when we have found in our research that there are bank account numbers on CCP members, even passwords and email addresses on them as well.
Dave Bittner: Aurora, any final thoughts?
Aurora Johnson: Yeah, I definitely agree with everything that Kyla mentioned. I think a lot of the surveillance state rhetoric around, you know, China collecting data on its citizens really focuses on individuals most targeted by the state, like ethnic minorities, but we can see that this robust leaked and hacked data industry in China poses privacy risks across all groups of Chinese people, including high-ranking CCP officials, and also APT actors who do contract work for the Chinese government. I think while this is a huge privacy concern for everyone in China, and also people not in China that interact with the technology ecosystem, use Chinese apps, et cetera, it also, in some cases, can be a valuable source of data for Western cybersecurity researchers, because you can find a lot of data on the advanced persistent threat actors that are hacking United States critical infrastructure in these databases, and use that to track them.
Dave Bittner: Oh, that's interesting. Alright, well, before we wrap up, I would - what I really want to do is kind of go back to the beginning. There's one little thing that I think we're missing in our conversation today, and that's a really nice introduction. So now that we're all a little comfortable and we've gotten the butterflies out, let's just take a minute, and we're going to pretend like we're just starting out here, so we get a nice introduction to the segment. And Kyla, I'm going to start out with you, and I'm going to ask you this. So, can you explain to us how this first came on your radar and how you all decided to pursue this line of research?
Kyla Cardona: Yeah. This first came on my radar because I was interested in looking at the Chinese cybercrime ecosystem, because I saw some little bits of leaks of Chinese data on BreachForums and other forums, so I was curious to find out where they came from. And when I translated the word data into shuju, I was able to pivot off of that and find a bunch of Chinese data leak channels just based on that word, and also Chinese characters. And from there I was able to uncover SDK, DPI, and MD5, which are the most prominent keywords that they use in English when it comes to describing the Chinese data leak channels. [ Music ]
Dave Bittner: Our thanks to Kyla Cardona and Aurora Johnson from SpyCloud for joining us. The research is titled, "China's Surveillance State is Selling Citizen Data as a Side Hustle." We'll have a link in the show notes. We'd love to know what you think of this podcast. Your feedback ensures we deliver the insights that keep you a step ahead in the rapidly-changing world of cybersecurity. If you like our show, please share a rating and review in your favorite podcast app. Please also fill out the survey in the show notes or send an email to cyberwire@n2k.com. We're privileged that N2K "Cyberwire" is part of the daily routine of the most influential leaders and operators in the public and private sector, from the Fortune 500 to many of the world's preeminent intelligence and law enforcement agencies. N2K makes it easy for companies to optimize your biggest investment, your people. We make you smarter about your teams while making your teams smarter. Learn how at n2k.com. This episode was produced by Liz Stokes, were mixed by Elliott Peltzman and Tre Hester. Our executive producer is Jennifer Eiben. Our executive editor is Brandon Karpf. Simone Petrella is our president, Peter Kilpe is our publisher, and I'm Dave Bittner. Thanks for listening. We'll see you back here next time. [ Music ]