In this episode of the CyberWire’s Research Saturday we are joined by Jordan Wright, Senior Research and Development Engineer at Duo Security. He’s the author of the research report, “Phish in a Barrel,” which describes his work gathering and examining thousands of phishing kits from around the web.
Dave Bittner: [00:00:03] Hello everyone, and welcome to the CyberWire's Research Saturday. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Dave Bittner: [00:00:23] I'd like to tell you a little bit about our sponsor, Cybrary, the people who know how to empower your security team. Cybrary is the learning and assessment tool of choice for IT and security teams at today's top companies. They deliver the kind of hands-on training fifty-five percent of enterprises say is the most important qualification when they're hiring. And once you hire, you want to retain. And Cybrary helps there too, because seventy percent of employees say professional development is a big reason for staying on board. Visit www.cybrary.it/teams and see what they can do for your organization. Not only is it effective - it's affordable too, costing just about a twelfth of what legacy approaches to training would set you back. So contact Cybrary for a demo. That's www.cybrary.it/teams, and tell them the CyberWire sent you.
Jordan Wright: [00:01:20] So I have quite a bit of a background in terms of phishing. It's always been a hobby of mine and an area of interest.
Dave Bittner: [00:01:26] Jordan Wright is a Senior Research and Development Engineer at Duo Security. He's the author of the research report, "Phish in a Barrel."
Jordan Wright: [00:01:34] My first exposure to phishing kits was a couple years ago, whenever I was doing some just high level independent research around phishing in general, and I came across a phishing kit almost by accident. I found the kit, downloaded it, analyzed it, and realized maybe this is something that could be looked at at scale. You know, if we did this for thousands of phishing URLs, not just a one-off kind of approach. But as time goes, you know, it always was kind of put on the backburner in terms of projects. And whenever I was looking at areas to investigate for Duo, this came up and really just struck an area of interest. I knew this was something that I wanted to look at. We had the resources and time to look at it, which resulted in this project.
Dave Bittner: [00:02:23] So, before we dig into the actual research, can you give us a sense for the landscape? I mean, what is the state of phishing these days?
Jordan Wright: [00:02:30] Sure. Phishing is absolutely on the rise. There was a report given out quarterly last year, during 2016. There's an organization called the Anti-Phishing Working Group, and they consist of multiple organizations who all come together to share research, share data, and to try to combat phishing as a whole. And so they release quarterly reports that indicate the state of phishing and how often phishing sites are being seen. And what we found in 2016, the number of unique phishing sites seen in a given quarter, we broke the record. And we broke that record twice in 2016. First in Q1, and then in Q2.
Jordan Wright: [00:03:13] To give you an idea of the kind of scale that we're talking about, in Q2 there were over 460,000 unique phishing sites seen, just in that quarter. To kind of break that down into a different number, that's over 5,000 unique phishing sites seen per day. It's clear that we need to start thinking in terms of at-scale approaches to mitigate phishing. We need to analyze this as a bigger problem than just trying to hit one-off phishing sites and try to keep up and play whack-a-mole every day. So these numbers are not only growing, but they're growing to a level where we're having to start taking different approaches.
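The per-day figure follows from simple division. A quick check, assuming the quarter in question is April through June (91 days) and treating 460,000 as a lower bound:

```python
# Back-of-the-envelope check of the per-day figure quoted above.
sites_in_quarter = 460_000        # "over 460,000" unique sites in Q2 2016
days_in_quarter = 30 + 31 + 30    # April + May + June (assumed quarter boundaries)

sites_per_day = sites_in_quarter / days_in_quarter
print(f"{sites_per_day:.0f} unique phishing sites per day (lower bound)")
```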
Dave Bittner: [00:03:50] Take us through how a typical phishing kit works.
Jordan Wright: [00:03:54] To give a bit of background on why phishing kits are even important at all, we have to remember this is a business. For attackers, like any business person, their goal is to maximize the return on investment. And so the entire idea around phishing kits is how can I make my phishing campaigns as efficient and cheap as possible? Because if I can do that, and I can start harvesting credentials, then my return on investment is higher.
Jordan Wright: [00:04:19] And so what they'll do is they'll start by figuring out what site do I want to spoof. Let's say it's Facebook or Office 365 or Gmail - you name it. They'll figure out the site that they want to clone. They'll download local copies of all that site's resources. This includes the HTML, the images, the stylesheets, everything that they need to host a local copy of that website. And then they'll change the login form to point to a script that they control. Typically, this is a short PHP script that does nothing more than collects those credentials and it - almost ironically - emails them to the attacker saying, I received these credentials from this phishing campaign.
Jordan Wright: [00:05:02] After an attacker has all these resources in the script, they'll bundle these up together into a zip file, and then they'll figure out, where do I want to host my next phishing campaign? But this zip file is the phishing kit. This has everything they need to run the campaign. So they'll look out and they'll find, let's say typically a compromised CMS instance, like a WordPress instance, and they'll exploit an out-of-date plugin or out-of-date theme to get access to upload their phishing kit onto that server. So they'll upload the zip file, they'll extract the files, and then they have a working phishing site on this hacked website. From there, they'll send out emails pointing to their new website, and they're off to the races.
Dave Bittner: [00:05:47] Now, the hacked website that they load their files onto - would the person running that site even be aware that this phishing kit might be living on their site?
Jordan Wright: [00:05:56] It really depends. It depends a lot on the monitoring that they have enabled. Traditionally, what would likely happen is, after phishing campaigns and after phishing emails are starting to be sent out, the abuse reports would start rolling in. Security companies would start detecting these sites and trying to shut them down. They'll send notices to the registrar, who in turn will let the person running and operating that website know so that they can try to go and clean up those efforts.
Dave Bittner: [00:06:21] And the people who have fallen victim to this - they might not even know that they've given up their credentials.
Jordan Wright: [00:06:27] You're absolutely right. And this is where phishing kits can be really sneaky. So after I've put in my credentials, the last trick up a phishing kit's sleeve is that it's going to redirect me to the legitimate website. Because at this point, it has my credentials. It doesn't need anything else. And so, by redirecting me to the legitimate login form, as a user, I just feel, I guess I put in my credentials wrong. I guess I must have done something different, or the site didn't work, there was an error. But either way, now if I look up in the address bar or the URL bar, I see the legitimate website. I don't think anything happened.
Dave Bittner: [00:07:03] So there's not even any sense that something's gone wrong and I have a good feeling I've gotten where I wanted to go, and meanwhile my credentials have been sent off to the bad folks.
Jordan Wright: [00:07:13] Exactly. You know, this catches people just trying to live their daily lives, trying to do daily business, and so they would just chalk this up to say, I guess something went wrong, I'll log in again and move forward.
Dave Bittner: [00:07:25] So let's go through - how did you start tracking down these phishing kits?
Jordan Wright: [00:07:29] So it all starts with knowing what to look for. And this came from my previous research into this individual phishing kit, which is knowing some different tricks and kind of relying on attackers being lazy in leaving these kits behind. Because that's really the whole reason this research was possible, is that whenever these files are extracted, they don't always delete the original zip file, and that's what we're targeting. If we can download that zip file, we can analyze the code inside of it, including the email address that these credentials are being sent to, as well as what information is being collected.
Jordan Wright: [00:08:08] And so we started by trying to figure out what are the best ways that we can track down this zip file? And there are two ways that we came across. The first is looking for what we call directory indexes, or directory listings. In web servers, it's commonly the case that they'll say, if you request a URL that ends with a slash - so indicating a folder - I would know what page to serve you. Say that's index.html or index.php. Because I'm presuming that that file is going to be present in every folder. If it's not, which is commonly the case with these phishing kits, web servers can fall back and say, I'm just going to give you a listing of all of the contents in the directory.
Jordan Wright: [00:08:54] This includes all the file names that I have in this folder. One of these file names would be the zip file. So that makes it really clear and easy for us to say there's a phishing kit, even if it's not the same name as the extracted contents. You know, let's say they call their phishing kit "Office 365 phishing.zip." In the folder it's just - in the URL you would just see "Office 365." That's a really quick way for us to for sure get phishing kits if they're left on the server.
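The directory-listing approach can be sketched roughly as follows. This is a minimal illustration, not the code from the research: the `ZipLinkParser` class, the `find_zip_links` helper, and the sample listing page are all hypothetical, and real listing pages vary by web server.

```python
from html.parser import HTMLParser


class ZipLinkParser(HTMLParser):
    """Collects href values that end in .zip from an HTML page."""

    def __init__(self):
        super().__init__()
        self.zip_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".zip"):
                self.zip_links.append(value)


def find_zip_links(listing_html):
    """Return every .zip link found in a directory listing page."""
    parser = ZipLinkParser()
    parser.feed(listing_html)
    return parser.zip_links


# Hypothetical listing, shaped like a typical auto-generated index page.
sample = """<html><head><title>Index of /files</title></head><body>
<a href="../">Parent Directory</a>
<a href="Office365/">Office365/</a>
<a href="Office365%20phishing.zip">Office365 phishing.zip</a>
</body></html>"""

print(find_zip_links(sample))
```

Note that the archive is found even though its name differs from the extracted folder, which is the advantage of this method described above.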
Jordan Wright: [00:09:24] But directory indexing and directory listing is configurable, and it's not always available. In our research, we found that it was available about 23 percent of the time. So a good amount, but it's not 100 percent reliable. So we had to look at another method which is, again, relying on attackers being lazy in naming the zip file the same name as the extracted folder. So if they named their zip file "Office365.zip" and then they unzip those files into "Office 365," the folder, all we have to do is just work our way up the URL replacing every slash with ".zip" and then if that phishing kit is there, we can download it.
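The second method, walking up the URL and swapping each trailing slash for ".zip", might look something like the sketch below. The function name `candidate_zip_urls` and the example host are assumptions for illustration, and the rule for spotting a trailing file name (a final segment containing a dot) is a simple heuristic, not necessarily what the researchers used.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_zip_urls(phishing_url):
    """List the .zip URLs to try, from the deepest folder up to the root."""
    parts = urlsplit(phishing_url)
    segments = [s for s in parts.path.split("/") if s]

    # Heuristic: a final segment such as "login.php" is a file, not a folder.
    if segments and not parts.path.endswith("/") and "." in segments[-1]:
        segments = segments[:-1]

    candidates = []
    while segments:
        path = "/" + "/".join(segments) + ".zip"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()  # move one folder up
    return candidates


for url in candidate_zip_urls("http://example.com/wp-content/Office365/login.php"):
    print(url)
```

Each candidate would then be requested in turn; a successful download of a zip archive means the kit was left behind.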
Dave Bittner: [00:10:09] And so you gathered up quite a number of URLs. Take us through that part.
Jordan Wright: [00:10:14] We did. So we sourced our URLs from two different places. These are both community-driven feeds where anyone can go and submit a phishing URL to these feeds, which then in turn work with different security companies to try to shut them down. The first is called PhishTank, and they're run by OpenDNS, and the second is called OpenPhish. So we took both of these feeds and we watched them for a month. And over the course of a month we analyzed over 66,000 phishing - possibly phishing - URLs. I say "possibly" because anyone can upload any URLs they want. So it's not a guarantee that all of these are phishing, but a majority of the time they are. So after we analyzed all 66,000 of these URLs, we downloaded over 3,200 unique phishing kits.
Dave Bittner: [00:11:05] What's some of the data that you were able to gather from all of those unique phishing kits?
Jordan Wright: [00:11:09] That was the next step. We had this huge corpus of data and we needed to figure out, what does it mean? What's the significance? What can we learn from it? And this is where we started digging in. The first interesting thing that we found was that attackers are pretty good at evading, or at least are trying to evade, detection from security companies. The way this works is that it's a cat-and-mouse game. Attackers will stand up a new phishing site, they'll send out emails, and they know that security companies are always looking to locate their phishing site and to shut it down. And so it's to their advantage - remember, this is all return on investment - for them to try to keep their phishing site available and up as long as possible.
Jordan Wright: [00:11:51] So there's a couple of things that they'll do to try to keep that level of persistence. The first is that they'll use a file called an ".htaccess" file. This is something that is specific to the Apache Web Server, and it's a file that lets administrators tell Apache, here are the connections that I want you to allow or deny. And you can use - you can do this based on any number of interesting attributes, like the user agent or the IP address or the domain that they're claiming to come from.
Jordan Wright: [00:12:25] And so attackers will use these .htaccess files to put in the information about security companies. They'll say, I want you to deny connections from these IP addresses which are known to belong to this security company. I want you to deny connections from this user agent which is known to be a crawler from this other company. And by doing this, they can try to hide a little bit. They can try to evade this detection, where if a company is going and looking for these websites, if they're using the infrastructure that this .htaccess file is designed to block, they wouldn't see the site. It would be kind of hidden from view.
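For an analyst, a first pass over a recovered .htaccess file could simply pull out what it blocks. The sketch below assumes the file uses the older Apache `deny from` directive and `RewriteCond %{HTTP_USER_AGENT}` conditions; the `summarize_htaccess` helper and the sample file (documentation-range IP addresses, made-up crawler names) are hypothetical.

```python
import re

# "deny from <host> [<host> ...]" (Apache 2.2-style access control)
DENY_FROM = re.compile(r"^\s*deny\s+from\s+(.+)$", re.IGNORECASE)
# "RewriteCond %{HTTP_USER_AGENT} <pattern> [flags]"
UA_COND = re.compile(r"^\s*RewriteCond\s+%\{HTTP_USER_AGENT\}\s+(\S+)", re.IGNORECASE)


def summarize_htaccess(text):
    """Collect denied IPs/ranges and user-agent patterns from .htaccess text."""
    denied, agents = [], []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = DENY_FROM.match(line)
        if match:
            denied.extend(match.group(1).split())
            continue
        match = UA_COND.match(line)
        if match:
            agents.append(match.group(1))
    return {"denied_sources": denied, "user_agent_patterns": agents}


# Hypothetical file contents for illustration only.
sample = """# block scanners
order allow,deny
allow from all
deny from 192.0.2.0/24
deny from 198.51.100.7 203.0.113.9
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} examplebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^crawler [NC]
RewriteRule .* - [F]
"""

print(summarize_htaccess(sample))
```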
Jordan Wright: [00:13:04] This was really common. This was really prevailing in all the kits that we found, and we found over a hundred and eighty-five different, unique .htaccess files. So this shows that there's definitely a level of information sharing between attackers. They'll kind of piecemeal different, you know, one piece of IP addresses from this file that I found, some user agents from this one, and they'll kind of mix and match, but they're all doing the same thing. And this is the same technique and the same idea that they'll use in a different way.
Jordan Wright: [00:13:38] So another detection - or another evading technique that they'll use is by creating PHP scripts which do the basic same thing that the .htaccess files do. And they're designed to block connections based on any number of HTTP request attributes. Again, the user agent, IP address, you name it. But this is where things kind of got interesting. As we were looking through these PHP scripts to try to see what it is that they're trying to block or allow, we came across something really interesting, which is that we found multiple PHP scripts that had a hidden backdoor. This backdoor allows anyone - if you know what parameter to put on the end of the URL - to execute whatever system commands you want.
Jordan Wright: [00:14:23] So this kind of falls back on to phishing being an economy. In addition to attackers standing up their own campaigns, there's an entire economy around sharing, selling, trading phishing kits between one another. So one attacker may create a phishing kit and then trade or hand it off to any number of other attackers for use in their own campaigns.
Jordan Wright: [00:14:46] But it seems like some enterprising attackers - maybe people who wanted to get a little bit of access without really putting in the work - decided to put these hidden backdoors into these files as a way to kind of maintain that persistence, as a way to maintain that control, and still have access to servers, to hosts, that they didn't take any part in compromising in the first place.
Jordan Wright: [00:15:09] These backdoors, we - we kind of expected to see a couple of them, from previous work that had looked at similar situations in the past, but what really surprised us was the scale of the backdoors that we came across. The particular backdoor that you'll find in the report - that unique string was seen over two hundred times, indicating this is surprisingly common. You know, these kits are being traded and used very frequently, but many of them are backdoored, letting anyone, including other attackers or security researchers or really anyone who would like to access these hosts, do so through these backdoors very, very easily.
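One way a defender might triage kit files for this kind of backdoor is a simple pattern scan for PHP command-execution functions that take request parameters directly. This is only a heuristic sketch: the specific backdoor string from the report is not reproduced here, the `find_suspicious_lines` helper and the sample script are hypothetical, and obfuscated code would slip past a regex like this.

```python
import re

# PHP functions that run commands or code, called with a request parameter
# somewhere in the same statement.
SUSPICIOUS = re.compile(
    r"\b(system|exec|shell_exec|passthru|popen|proc_open|eval|assert)\s*\("
    r"[^;]*\$_(GET|POST|REQUEST|COOKIE)\b",
    re.IGNORECASE,
)


def find_suspicious_lines(php_source):
    """Return (line number, line) pairs that match the heuristic."""
    hits = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        if SUSPICIOUS.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical "blocker" script with an extra line that runs a URL parameter.
sample = """<?php
$ua = $_SERVER['HTTP_USER_AGENT'];
if (strpos($ua, 'examplebot') !== false) { die('Not found'); }
if (isset($_GET['cmd'])) { system($_GET['cmd']); }
?>"""

for lineno, line in find_suspicious_lines(sample):
    print(lineno, line)
```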
Dave Bittner: [00:15:55] Do you think with that many backdoors being out there, that it's a matter of - I don't know, almost a cost of doing business for the folks who are putting these out there that they're, you know, the stuff still works for them, but in exchange, these backdoors allow other people to take advantage of the work they've done?
Jordan Wright: [00:16:11] That's a really good insight. That's absolutely possible. You know, we have to remember that this is all about quantity not quality. You know, they're - attackers know that their phishing sites will be shut down relatively quickly. There's a lot of people looking for these and they're doing a very good job of finding and shutting down these phishing sites. And so it may be the case that attackers realize the tradeoff of analyzing every file in their kit for any kind of backdoor. Like you said, it may just not be worth it. It may be a cost of doing business. It may just say, I'm here to get my credentials as quickly as possible and then I'm out. I'm going to go somewhere else.
Dave Bittner: [00:16:51] You also discovered a lot of reuse with these kits.
Jordan Wright: [00:16:54] Yes. After we analyzed the contents of the kits themselves, we wanted to figure out, what does the landscape look like for these phishing kits? Where are they being used? Can we identify two sets of problems? The first is - can we identify unique phishing kits that are used in more than one place? Because this would indicate the same attacker running multiple campaigns and compromising multiple hosts. And we did. We found that, in our month span, most of the phishing kits that we came across were seen once. But 27 percent of the phishing kits that we found - about nine hundred of them - were seen in more than one place.
Jordan Wright: [00:17:32] In fact, a couple of the phishing kits we've seen were found on more than thirty unique hosts, indicating that attackers had compromised thirty different web servers and ran thirty different campaigns, all in the course of a month. Which is pretty active. You know, these are very active attackers, you know, constantly running new campaigns. And so being able to track this reuse gives really valuable insight to security researchers, because they can start tracking actors in different places using very simple techniques that we show in the paper.
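Tracking reuse comes down to grouping sightings by kit. A minimal sketch, assuming each sighting has already been reduced to a (kit identifier, host) pair; the identifiers and host names below are made up:

```python
from collections import defaultdict


def hosts_per_kit(observations):
    """Group the hosts on which each kit was seen."""
    hosts = defaultdict(set)
    for kit_id, host in observations:
        hosts[kit_id].add(host)
    return hosts


def reused_kits(observations):
    """Return only the kits seen on more than one host."""
    return {
        kit_id: sorted(found_on)
        for kit_id, found_on in hosts_per_kit(observations).items()
        if len(found_on) > 1
    }


# Hypothetical sightings: (kit identifier, host where it was downloaded).
observations = [
    ("kit-aaa", "site-a.example"),
    ("kit-aaa", "site-b.example"),
    ("kit-bbb", "site-c.example"),
    ("kit-aaa", "site-a.example"),  # duplicate sighting, counted once
]

print(reused_kits(observations))
```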
Jordan Wright: [00:18:07] The second problem, the second area that we wanted to map - and this is another area where it gets really interesting - is can we track attackers across different phishing kits? So, the way that we decided this phishing kit is unique, is that we took a hash of it, which means we shortened it down to a set of characters which guarantee that it's a unique identifier across our data set. So this means all of the content in that kit, including the email address of the attacker where credentials are being sent, is bundled into that hash. So if that email address or any other content changes, that hash will be different.
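The effect of hashing a whole kit can be shown with a small sketch. The interview does not say which hash function was used, so SHA-256 here is an assumption, and `build_kit` is a hypothetical helper that builds a tiny one-file archive in memory purely for the demonstration.

```python
import hashlib
import io
import zipfile


def build_kit(recipient):
    """Build a tiny stand-in 'kit' archive whose script names a recipient."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Fixed timestamp so identical content yields identical bytes.
        info = zipfile.ZipInfo("post.php", date_time=(2017, 1, 1, 0, 0, 0))
        zf.writestr(info, f"<?php $to = '{recipient}'; ?>")
    return buf.getvalue()


def kit_hash(kit_bytes):
    """Identify a kit by the SHA-256 digest of its archive bytes."""
    return hashlib.sha256(kit_bytes).hexdigest()


same_a = kit_hash(build_kit("user-a@example.com"))
same_b = kit_hash(build_kit("user-a@example.com"))
other = kit_hash(build_kit("user-b@example.com"))

print(same_a == same_b)  # identical content, identical hash
print(same_a == other)   # only the email address changed, different hash
```

This is why the same kit configured for two different attackers shows up as two distinct hashes, which motivates the email-based mapping described next.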
Jordan Wright: [00:18:51] And so we took all these hashes, and then we also took it another step further and we extracted every email address we saw in the kits, and then we mapped all those out. Which email addresses are found in which hashes, which unique phishing kits. And that's the map that people will find on - I think it's page twelve - where we talk about tracking actors across kits.
Jordan Wright: [00:19:15] And here's kind of - it gets even more interesting. We talk about having an email address for where the credentials are being sent to, but there's another kind of interesting part about it which is, whenever attackers create these phishing kits, they want to leave kind of a signing card. They want to leave a note that says this person created this phishing kit, almost to get credit for it. A typical place that they'll put their email address is as the "from" address. So whenever you send an email, it has to have a "from" address. Well the email containing the stolen credentials is generally going to be sent from an email address that's the signing card for the person who made the kit.
Jordan Wright: [00:19:58] So what this means is that if I create a phishing kit, and I put my email address as that signing card, I give it to another attacker. They go and they run multiple campaigns. Both my email address and that attacker's email address will be associated through that phishing kit. So we can take all these email addresses - both sender and recipient - and all these kits, and we can map them out. And then we result in an incredible landscape, where we can see, here's probably who created all these kits, here's all the kits that they're associated with, and then here are the people using those kits, and then here are the URLs those are being used at. So you can, at a glance, see the entire ecosystem and the landscape of what phishing attacks are being launched and who's behind them.
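The mapping from email addresses to kits can be sketched as a simple index. This is an illustrative approximation: the regular expression is a loose email matcher, the `emails_to_kits` helper is hypothetical, and the sample kit contents and addresses are invented. It does not try to tell sender from recipient.

```python
import re
from collections import defaultdict

# Loose pattern; good enough for pulling addresses out of script source.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_to_kits(kits):
    """Map each email address to the set of kits whose files mention it."""
    mapping = defaultdict(set)
    for kit_id, sources in kits.items():
        for text in sources:
            for address in EMAIL.findall(text):
                mapping[address.lower()].add(kit_id)
    return mapping


# Hypothetical kits: each maps to the text of the files inside it.
kits = {
    "kit-1": ["$to = 'user-a@example.com'; $headers = 'From: author@example.org';"],
    "kit-2": ["$to = 'user-b@example.com'; $headers = 'From: author@example.org';"],
}

for address, kit_ids in sorted(emails_to_kits(kits).items()):
    print(address, sorted(kit_ids))
```

An address that appears across many otherwise distinct kits, like the shared "From" address here, is the kind of link that ties a kit author to the people using the kits.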
Dave Bittner: [00:20:51] And so, being able to have that view of this ecosystem, what kind of information were you able to gather from that?
Jordan Wright: [00:20:58] One interesting finding that we came across was a single email address was found in more than 115 unique phishing kits. Now, this email address was used, like we talked about, as that signing card, as that "from" address. Which indicates that this actor who created this kit distributed it to any number of people, or they got their hands on it somehow and started using it. But seeing this wide of a scale in such a short time frame shows that the kits created by this alias are very common. And the kits that we found weren't just - it wasn't just one kit for one service. We found kits with this actor's email address for, you know, almost every service provider - Gmail, Office 365, you name it. So it's not just one single attack vector. The people creating these kits are making them for any number of different services before they distribute them.
Dave Bittner: [00:21:55] And what else can you learn about the overall ecosystem? Is this a situation where you have a, you know, a handful of kingpins who are then distributing the software to workers below them who are doing the dirty work, or is it more distributed than that? Is there any sense for that sort of thing?
Jordan Wright: [00:22:12] I'd say there's a healthy mix of both distributors - people who make either full kits themselves or just components of them. Maybe they just make the credential-stealing script and then they distribute that, and say you're going to have to clone your own pages but you can use this script to send out the emails.
Jordan Wright: [00:22:30] And then there's also the side of people who are more DIY in terms of creating their own phishing kits. The barrier to entry in this type of attack is very, very low. That's why it's so common. Because it's easy and cheap to get into, and it still yields incredible results in terms of the effectiveness of phishing in general. So this landscape is still pretty distributed, but it does have that healthy mix where we see both sides of the story.
Dave Bittner: [00:22:59] For those who are trying to defend against these sorts of things, these phishing attacks, what advice do you have for them?
Jordan Wright: [00:23:05] Absolutely. So this is an area that we're really excited about, because we took all the code that we used to run this experiment and we're open-sourcing it. We're making it freely available on GitHub for anyone to download it and try to replicate our results for their own organization. They can put in phishing URLs that they're seeing against their own user base to try to track down the phishing kits behind them. And this gives admins a really good look at what information is being captured, as well as who's behind the attack, where these credentials are being sent.
Jordan Wright: [00:23:37] There's also the opportunity to partner up with different mail providers, to where we can say we've come across a phishing kit that's sending credentials to this email address, you may wish to shut this down as an attacker's account. So by having this information we can start to have a much more full and rich incident response process that lets us take active measures on these phishing attacks as they occur.
Dave Bittner: [00:24:03] Is there the possibility of automating the response to these sorts of things?
Jordan Wright: [00:24:08] That would be kind of taking this research to the next step, which is, now that we have the ability to download this data in bulk and almost in a streaming fashion, could we somehow develop automated measures to respond to the emails that we find, to the phishing URLs that we find?
Jordan Wright: [00:24:26] There's a pretty good amount of automation being built in to respond to phishing URLs that are found, so these would be threat feeds that hook into popular products. A really good example is Google Safe Browsing, which is built into the Google Chrome web browser, where as soon as they know about a confirmed phishing site, they'll add that to a global block-list where, whenever you try to navigate to that website, Chrome will tell you this is a known phishing site, you may be in a phishing attempt, you may wish to go somewhere else at this point. Which is a really effective way to get widespread protection for consumers.
Jordan Wright: [00:25:04] And so, the type of automation around phishing URLs is pretty strong. But there's a level of automation that we could introduce around, what do we do now that we know the attackers behind these campaigns? Can we, kind of like we mentioned earlier, can we work directly with mail providers to send them the stream of email addresses if they don't already have them, indicating that they were known to be found in fraudulent phishing campaigns, where they could shut down those accounts even easier? And once the account is shut down, any phishing credentials sent to that email address wouldn't be collected and couldn't be used for further fraud.
Dave Bittner: [00:25:40] So, at Duo Security you have some tools that help people test their ability to stand up to these phishing attacks, and through that you all get some interesting statistics. What can you share with us about that?
Jordan Wright: [00:25:52] Sure. So we do have a free tool called Duo Insight, and what it does, it allows organizations to test their own exposure to phishing, completely free. So they can set up a campaign with popular phishing pretext and see how likely it is that their users would open, click, or even submit credentials to fake phishing sites. And so we're always collecting anonymous statistics about how effective our phishing campaigns are.
Jordan Wright: [00:26:20] And recent statistics show that, over the course of testing about 150,000 recipients, we find that 45 percent of recipients open the email and 24 percent of recipients click the link. And at this point, it's important to take a step back and say this could already be game over, in some aspects. Because we hear about browser plugin vulnerabilities like Flash or Java. If those are out-of-date, it's easy for attackers to stand up malicious websites which then compromise those plugins and install malware on the system. And so even clicking the link can be pretty disastrous if we're not keeping our software up to date.
Jordan Wright: [00:27:02] And taking that a step further, we found that 13 percent of recipients actually go to the next step and enter their credentials into the fake phishing site. To kind of take that from a different angle, we found that 63 percent of campaigns were successful in capturing at least one credential.
Jordan Wright: [00:27:18] So it shows that, you know, we talk about phishing being cheap, and it's getting even more effective with the use of phishing kits, but it's also very effective as a practice. It's very effective as a measure to gain access to sensitive data, or gain access to accounts or systems, if more than half of your phishing campaigns are going to receive a credential. That's a really good return on investment, and it shows why it's so important to really study and try to protect against the phishing landscape.
Dave Bittner: [00:27:46] What is your sense as to where we are in terms of facing this threat? Are we gaining? Are the phishing people doing better with us, or are we doing a better job of shutting them down?
Jordan Wright: [00:27:57] I've seen, especially in recent years, there's been multiple companies that have done incredible work at taking on phishing from a wider scale. I mentioned Google Safe Browsing, and that's a perfect example of Google realizing that they can help protect a large user base of anyone who uses Chrome against phishing sites very, very quickly.
Jordan Wright: [00:28:20] So we're making really good strides in terms of trying to protect against the increased number of phishing sites that we see, but it's still safe to say that we have room to grow. We have room to continue doing better, to continue studying these attacks and trying to figure out what protections can we put in place to try to thwart them. But as a whole, you know, security companies and browser makers are doing a good job of trying to combat a threat and take it head on, which is always encouraging to see.
Dave Bittner: [00:28:52] Our thanks to Jordan Wright from Duo Security for joining us. You can find the complete report, "Phish in a Barrel," in the blog section of the Duo Security website.
Dave Bittner: [00:29:01] And thanks again to our sponsor, Cybrary, for making this edition of Research Saturday possible. Visit www.cybrary.it/teams, and see what they can do for your organization.
Dave Bittner: [00:29:13] Don't forget to check out our CyberWire Daily News Brief and podcast, along with interviews, our glossary, and more on our website, thecyberwire.com.
Dave Bittner: [00:29:22] The CyberWire Research Saturday is produced by Pratt Street Media. Our coordinating producer is Jennifer Eiben. Editor is John Petrik. Technical editor is Chris Russell. Executive editor is Peter Kilpe. And I'm Dave Bittner. Thanks for listening.
Copyright © 2019 CyberWire, Inc. All rights reserved. Transcripts are created by the CyberWire Editorial staff. Accuracy may vary. Transcripts can be updated or revised in the future. The authoritative record of this program is the audio record.