Package manager repository malware detection
Dave Bittner: [00:00:03] Hello everyone, and welcome to the CyberWire's Research Saturday, presented by Juniper Networks. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Dave Bittner: [00:00:26] And now a word about our sponsor, Juniper Networks. Organizations are constantly evolving and increasingly turning to multicloud to transform IT. Juniper's connected security gives organizations the ability to safeguard users, applications, and infrastructure by extending security to all points of connection across the network. Helping defend you against advanced threats, Juniper's connected security is also open, so you can build on the security solutions and infrastructure you already have. Secure your entire business, from your endpoints to your edge, and every cloud in between, with Juniper's connected security. Connect with Juniper on Twitter or Facebook. And we thank Juniper for making it possible to bring you Research Saturday.
Dave Bittner: [00:01:13] And thanks also to our sponsor, Enveil, whose revolutionary ZeroReveal solution closes the last gap in data security: protecting data in use. It's the industry's first and only scalable commercial solution, enabling data to remain encrypted throughout the entire processing lifecycle. Imagine being able to analyze, search, and perform calculations on sensitive data, all without ever decrypting anything - all without the risks of theft or inadvertent exposure. What was once only theoretical is now possible with Enveil. Learn more at enveil.com.
Robert Perica: [00:01:53] So, the idea behind supply chain attacks is that an attacker abuses a typical deployment vector such as an update mechanism, a third-party software download, or perhaps infecting a package repository in the hope that an unsuspecting developer might install a particular component and thereby infect his or her own machine.
Dave Bittner: [00:02:13] That's Robert Perica. He's a threat analyst and reverse engineer at ReversingLabs. The research we're discussing today is titled, "SupPy Chain Malware - Detecting malware in package manager repositories."
Robert Perica: [00:02:26] So, in this way, these components are really widespread as they affect a multitude of users.
Dave Bittner: [00:02:33] So what are some of the popular places that are repositories for these sorts of things?
Robert Perica: [00:02:39] Package repositories typically imply PyPI, RubyGems, npm, and so on and so on. But supply chain attacks are not related only to package repositories. They can infect, for example, in the CCleaner case, a popular third-party software distribution repository.
Dave Bittner: [00:02:57] And so, the notion here is that rather than creating software from scratch, folks can go use these components, these building blocks, and plug them into their own projects?
Robert Perica: [00:03:07] Right.
Dave Bittner: [00:03:08] So, let's go through the work that you did here. Walk us through the analysis that you performed.
Robert Perica: [00:03:14] With supply chain attacks becoming more and more popular, we were interested in how hard it would be to find an example of such an attack that was still in the wild. Since there are several types of supply chain attacks, we opted to survey package repositories first. So first in line for review was PyPI, and we modeled a couple of YARA rules based on publicly available reports and previous incidents, and then ran the entire PyPI repository through our Titanium Platform processing engine to evaluate the rules. In the end, a couple of packages stood out, and after manual review we confirmed that they were related to previous PyPI incidents but had not been removed in the cleanup action.
Dave Bittner: [00:03:54] Now, one of the things you pointed out here is that typosquatting is a common tactic.
Robert Perica: [00:03:58] Yes, and package repositories can get infected in a couple of ways. One of them is for malicious actors to add additional code to known, widely used packages. But this is hard to achieve, because popular packages usually go through an extensive review process, for example, on GitHub, through pull requests and so on. However, in package repositories such as PyPI, people can upload or submit their own packages with typosquatted names without any review process. So for example, you type "djanga" instead of "django," and you get a mistype, like a typosquat, and you rely on the unsuspecting user who will mistype the name and install the malicious package.
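The typosquatting idea described above ("djanga" versus "django") can be illustrated with a minimal Python sketch. It is not the method used in the research. The list of popular names, the function name `likely_typosquat_of`, and the 0.8 similarity threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical shortlist of well-known package names; a real check would
# need a far larger list.
POPULAR_PACKAGES = ["django", "requests", "numpy", "urllib3", "setuptools"]


def likely_typosquat_of(name, popular=POPULAR_PACKAGES, threshold=0.8):
    """Return the popular package that `name` closely resembles, or None.

    An exact match is the legitimate package itself, so it is not flagged.
    The threshold is an assumed value, not a tuned one.
    """
    candidate = name.lower()
    if candidate in popular:
        return None
    best_match, best_ratio = None, 0.0
    for known in popular:
        # ratio() is a similarity score between 0.0 and 1.0
        ratio = SequenceMatcher(None, candidate, known).ratio()
        if ratio > best_ratio:
            best_match, best_ratio = known, ratio
    return best_match if best_ratio >= threshold else None
```

Calling `likely_typosquat_of("djanga")` returns `"django"`, while the correctly spelled name returns `None`. A similarity score like this produces false positives for legitimately similar names, so it is only a first filter before manual review.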
Dave Bittner: [00:04:41] And what did you see in terms of the frequency of people falling for this?
Robert Perica: [00:04:45] It's a pretty common tactic. It's an extremely common tactic when it comes to URL typosquatting, like redirecting to malicious URLs and so on. But people mistype all the time. Like, they try to install all the packages from the requirements.txt file through pip, but they forget to include the .txt file, the .txt extension, and then you actually say, OK, pip install the requirements package. And if the requirements package has a malicious component within it, it will get installed. So yes, this is - I'd say that this would be a common vector.
Dave Bittner: [00:05:19] And so, for the person who accidentally downloads the misspelled version of this, what can they expect to happen?
Robert Perica: [00:05:26] In this case, malware gets downloaded and installed, but since it's not invoked from the setup.py script during the pip installation, it won't get executed out of the box. Though it can be run as an executable file, or by importing the malicious module and invoking the malicious function, the malicious package will not run by itself. The function itself contains an IP address which has been offline for quite some time, and from that IP address, the malicious function downloads the second stage and persists it as a hidden file, modifies the .bashrc file to be executed on every terminal or shell open, and that's basically it. We don't have any information about what the second stage actually is or how widespread it really is.
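The persistence step described above (a hidden file that is launched from .bashrc) suggests a simple check that a developer could run against a shell startup file. The following Python sketch is only a heuristic. The two patterns and the function name `suspicious_rc_lines` are assumptions and are not taken from the research.

```python
import re

# Heuristic patterns (assumptions for illustration):
#  1. a dot-file under ~, $HOME or /tmp used directly as a command
#  2. a download piped straight into a shell
SUSPICIOUS_PATTERNS = [
    re.compile(r"(?:^|[;&|])\s*(?:~|\$HOME|/tmp)/\.[\w.-]+"),
    re.compile(r"\b(?:curl|wget)\b.*\|\s*(?:ba)?sh\b"),
]


def suspicious_rc_lines(rc_text):
    """Return (line_number, line) pairs that match one of the heuristics.

    Blank lines and comment lines are skipped.
    """
    hits = []
    for number, line in enumerate(rc_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if any(p.search(stripped) for p in SUSPICIOUS_PATTERNS):
            hits.append((number, stripped))
    return hits
```

A line such as `~/.cache_helper &` is reported, while `source ~/.bash_aliases` is not, because there the dot-file is an argument rather than the command. The check would typically be fed the contents of `~/.bashrc`, and any hit still needs a manual look.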
Dave Bittner: [00:06:10] Can you walk us through - what was the process like when you ran this through your own engine to do the analysis? What was going on there?
Robert Perica: [00:06:18] We modeled our detection rules based on previous incidents, and we focused on the entire PyPI package repository. The dataset contained around 1.6 million files. That amounts to around 2.6 terabytes of different files. And we essentially just plugged those YARA rules into our engine and ran the entire sample set. The entire run took a bit less than a day, and at the end of the day we had a bunch of matches on different rules that we plugged in. And then we essentially manually reviewed them and found the offending sample.
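The workflow described above (run a set of rules over every file, then review the matches by hand) can be sketched in plain Python. This example deliberately uses regular expressions from the standard library as a stand-in for YARA, and it has nothing to do with the Titanium Platform. The rule name, its patterns, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical rule set: a rule matches when ALL of its byte patterns occur.
# These patterns are illustrative and are not the rules from the research.
RULES = {
    "downloads_and_edits_bashrc": [
        re.compile(rb"urlopen|urlretrieve"),
        re.compile(rb"\.bashrc"),
    ],
}


def match_rules(data, rules=RULES):
    """Return the sorted names of rules whose patterns all occur in `data`."""
    return sorted(
        name
        for name, patterns in rules.items()
        if all(p.search(data) for p in patterns)
    )


def scan_tree(root, rules=RULES):
    """Map each file under `root` that matches a rule to the rule names."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            matched = match_rules(path.read_bytes(), rules)
            if matched:
                results[str(path)] = matched
    return results
```

`scan_tree` reads each file fully into memory, which is fine for a small directory of unpacked packages but would need streaming and parallelism for a dataset of the size mentioned here. As in the interview, the output is a list of candidates for manual review, not a verdict.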
Dave Bittner: [00:06:55] So, in terms of a percentage of what you found here, is this a relatively infrequent occurrence?
Robert Perica: [00:07:03] Yes, this is an infrequent occurrence. This is not something that is commonly done, due to the hardness factor - how hard it is to achieve something like this - although, this script is extremely simple and I expect to see much more of such attacks in the future.
Dave Bittner: [00:07:21] When you say that the script is simple, which script are you referring to?
Robert Perica: [00:07:24] I'm referring to the actual malicious components being dropped, so the setup.py script with malicious communication and persistence.
Dave Bittner: [00:07:33] I see. So, in terms of this being discovered, as you mentioned, there's no real mechanism for things like this to be scanned when these projects are uploaded, so it's really up to folks like you and other people to report them.
Robert Perica: [00:07:47] I suspect that on the package repository side, perhaps some kind of a review process might be implemented, although due to the size of the entire repository and the fact that many of the people working there are volunteers, I doubt that that will happen at great scale. One of the ways they can process such an amount of files is to buy a platform like the one we have and continuously process all the files. And of course, on the developer side, you actually have to check what you are downloading, what you're installing and so on.
Dave Bittner: [00:08:21] So, in terms of best ways for folks to protect themselves against this sort of thing, what do you recommend?
Robert Perica: [00:08:26] When you're installing new packages, you could be on the lookout for suspicious network connections and transfers not initiated by pip itself. You can also be careful about what you type and how you type it. It would be great if there was a way for public repositories to enforce some kind of content checks like the continuous processing efforts. However, that's probably not applicable.
Robert Perica: [00:08:50] And on the developer side - for example, in large software companies and so on - some kind of an approval of used modules would be nice. We haven't covered, for example, what other types of files we found in the entire sample set. So one would expect, for example, that a Python package repository contains mostly Python files and perhaps text files. However, we found a couple of executable files for Windows, Linux, macOS, and so on, and we didn't expect to find such things there, for example.
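The "approval of used modules" idea mentioned above could, for instance, take the form of an allowlist check on a requirements file. The Python sketch below is one possible shape for such a check. The allowlist contents and the function names are hypothetical, and the line parsing is intentionally simplified (it does not handle URLs or local paths).

```python
import re

# Hypothetical allowlist a team might maintain; the names are illustrative.
APPROVED = {"django", "requests", "numpy"}


def requirement_name(line):
    """Extract the normalized project name from one requirements line.

    Returns None for blank lines, comments, and pip options such as "-r".
    """
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    if match is None:
        return None
    # PEP 503 normalization: lowercase, runs of "-", "_", "." become "-"
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def unapproved(requirements_text, approved=APPROVED):
    """Return the sorted project names that are not on the allowlist."""
    names = (requirement_name(line) for line in requirements_text.splitlines())
    return sorted({n for n in names if n is not None and n not in approved})
```

With this allowlist, a file containing `django==2.2` and `djanga` reports only `djanga`. A check like this could run in continuous integration before any `pip install` step.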
Robert Perica: [00:09:22] One example is a package that can be used to compare files and see the differences between them. And as a testing sample set, it includes a variety of executable and non-executable file formats. So our engine, when it scanned all those files, it identified them and we found, like I said, a bunch of executable files, even additional archives, document files, and so on.
Dave Bittner: [00:09:44] So, what's going on there? Is it hiding a different type of file from what people are expecting to see to try to throw people off the trail?
Robert Perica: [00:09:54] This isn't related to malicious packages we found...
Dave Bittner: [00:09:56] I see.
Robert Perica: [00:09:56] ...This is related to the entire package repository. One of the packages which was scanned but was not malicious was this - like, a file compare package, which included, as its own test dataset, a large amount of executable files for Windows, for Linux, for OS X, and so on, to be used to just check, like, to do a sanity check if the comparisons work as they should. We didn't expect to find executable content apart from Python scripts in PyPI.
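Identifying executables and archives inside a package, as discussed above, is commonly done by looking at the first few bytes of each file rather than its extension. The Python sketch below shows the idea with a handful of well-known signatures. It is not the identification logic of the engine used in the research, and the table and the function name `identify` are assumptions.

```python
# Well-known leading byte signatures. The list is deliberately short; the
# universal Mach-O magic is omitted because Java class files share it.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable (Linux)"),
    (b"MZ", "PE/DOS executable (Windows)"),
    (b"\xfe\xed\xfa\xce", "Mach-O executable (macOS)"),
    (b"\xfe\xed\xfa\xcf", "Mach-O executable (macOS)"),
    (b"\xce\xfa\xed\xfe", "Mach-O executable (macOS)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O executable (macOS)"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
]


def identify(header):
    """Return a label for the file type implied by `header`, or None.

    `header` is the first bytes of a file, for example the result of
    reading 8 bytes from a file opened in binary mode.
    """
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    return None
```

A Python source file returns `None` here, so anything that does get a label in a Python package is worth a second look, even though, as the file compare example shows, it is not necessarily malicious.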
Dave Bittner: [00:10:30] Our thanks to Robert Perica from ReversingLabs for joining us. The research is titled, "SupPy Chain Malware - Detecting malware in package manager repositories." We'll have a link in the show notes.
Dave Bittner: [00:10:42] Thanks to Juniper Networks for sponsoring our show. You can learn more at juniper.net/security, or connect with them on Twitter or Facebook.
Dave Bittner: [00:10:53] And thanks to Enveil for their sponsorship. You can find out how they're closing the last gap in data security at enveil.com.
Dave Bittner: [00:10:59] The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technology. The coordinating producer is Jennifer Eiben. Our amazing CyberWire team is Stefan Vaziri, Tamika Smith, Kelsea Bond, Tim Nodar, Joe Carrigan, Carole Theriault, Nick Veliky, Bennett Moe, Chris Russell, John Petrik, Peter Kilpe, and I'm Dave Bittner. Thanks for listening.