Dave Bittner: [00:00:02] Hello everyone, and welcome to the CyberWire's Research Saturday presented by Juniper Networks. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities, and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Dave Bittner: [00:00:25] And now a quick word about our sponsor, Juniper Networks. They're empowering you to automate your security, see your networks, and protect your clouds. Juniper Networks has you covered, so your security teams can finally get back to fortifying your security posture. Learn more at juniper.net/security, or connect with Juniper on Twitter or Facebook. That's juniper.net/security. And we thank Juniper for making it possible to bring you Research Saturday.
Dave Bittner: [00:00:54] And thanks also to our sponsor, Enveil, whose revolutionary ZeroReveal solution closes the last gap in data security: protecting data in use. It's the industry's first and only scalable commercial solution enabling data to remain encrypted throughout the entire processing lifecycle. Imagine being able to analyze, search, and perform calculations on sensitive data - all without ever decrypting anything. All without the risks of theft or inadvertent exposure. What was once only theoretical is now possible with Enveil. Learn more at enveil.com.
Jordan Wright: [00:01:38] So, back in August, we presented a white paper at Black Hat on hunting down Twitter bots at a large scale.
Dave Bittner: [00:01:46] That's Jordan Wright. He's a Principal R&D Engineer at Duo Security. Along with his colleague, Olabode Anise, he's co-author of a research paper titled, "Anatomy of Twitter Bots: Amplification Bots."
Jordan Wright: [00:01:58] So, this research really culminated in our findings on what bots look like, and how we can build a large dataset and identify bots within that dataset accurately and quickly. We also presented a case study showing a cryptocurrency scam botnet measuring in the tens of thousands of bots. So, this really set the tone for the research that we wanted to continue, with this first pass being on more content-generating types of bots.
Jordan Wright: [00:02:24] But then we took it a step further. Back in October, we posted a blog post covering what we call "fake followers." These are Twitter accounts that exist only to artificially inflate another account's following numbers. It makes them appear more popular than they actually are. So we were able to show how we could use very straightforward techniques and heuristics to accurately identify a large network of fake followers.
Jordan Wright: [00:02:50] And then the third type of bot that we wanted to explore - and what our research just this week covers - is what we call "amplification bots." These are bots that we consider even more damaging than fake followers. So, amplification bots work by actively retweeting or liking tweets. Their goal is to make information appear more credible than it is, as well as distribute it to a wide audience of unsuspecting users. So, we say this is more damaging because it's not just about popularity, in this case. It's actively spreading and disseminating information, and making that information appear credible.
Jordan Wright: [00:03:27] So that's where we are today, and that's what our study focuses on: can we accurately identify amplification bots, and, taking it a step further, can we build a crawler that can enumerate amplification bots very, very quickly?
Dave Bittner: [00:03:41] All right, well, you all started out here, with the research, kind of trying to establish what was normal. Can you take us through - what was the process there?
Jordan Wright: [00:03:50] Sure. That's the process - anytime we go into identifying bots on Twitter, we first have to ask ourselves, what's the normal behavior? And then we can look for things that we would deem weird. You know, things that we can accurately say are weird enough to constitute automation.
Jordan Wright: [00:04:06] So, in this case, we took the same data set that we built during our first round of research, which consisted of 576 million tweets, and we asked ourselves, what does the ratio of retweets to likes look like? So, in this research, we only wanted to focus on amplification bots that retweet tweets. We didn't focus on those that liked tweets, because there weren't API endpoints in place where we could gather that information very easily. So, looking at these ratios, it intuitively makes sense. If you're on Twitter, you're scrolling through, you'll notice that, generally, tweets will have more likes than retweets.
Dave Bittner: [00:04:46] Yeah.
Jordan Wright: [00:04:46] This makes sense, because it's kind of a lower-impact action. I'm not spreading it to my own followers, I'm just acknowledging that tweet. We found the same thing within our dataset. In fact, we found that eighty percent of the tweets in our dataset had more likes than retweets. So, we can say that it's fairly weird for a tweet to have more retweets than likes, and, again, this intuitively would make sense if you're just scrolling through Twitter.
Dave Bittner: [00:05:11] Right.
Jordan Wright: [00:05:10] So this was one metric that we used to identify if a tweet was being actively amplified. And we were really conservative. You know, our estimation for whether or not it was amplified was, does it have five times as many retweets as it has likes? Because, looking through our dataset, we found that's very, very weird and doesn't happen much at all.
Jordan Wright: [00:05:32] And so that was one heuristic that we used, and we used two others as well. The first is how chronological a user's timeline is. Most users, as they're interacting with Twitter, will have a fairly chronological timeline. I'm going to author tweets and those are going to be in order. The things that I retweet are generally going to be in order. There's going to be the rare scenario where perhaps I stumble across an older tweet from a while back that just caught my attention. But in general, things are going to trend in one direction.
Dave Bittner: [00:06:01] Now, let's just pause there for a second to make sure I understand what you're saying. So, what you're saying is that both the tweets that I generate and the tweets that I'm interacting with will have basically a chronological sequence to them? There won't be a lot of jumping around in time. Is that how you describe it?
Jordan Wright: [00:06:18] That's exactly right. And we call those jumping around in time "inversions." The idea being, if the next tweet in my timeline is - let's say the tweet before was older, and then my tweet is newer in that case. But then if I go back in time, that's an inversion. I went from older to newer to older. And that could happen occasionally for normal timelines, but in the case of bots, we found that happens much, much more frequently. This is because, whenever they get their orders to go and retweet certain numbers of tweets, they could be older tweets, they could be brand new tweets, but it's really all over the map. These inversions - we see a significantly higher number of them for what we would consider bot-like behavior.
Dave Bittner: [00:07:05] So, when you're looking at one of these bots, do you see that, oh, this particular bot has been assigned to start retweeting from this legitimate account? Is it that straightforward?
Jordan Wright: [00:07:16] We see a mixture of accounts that they're retweeting. We of course can see, you know, who they're retweeting...
Dave Bittner: [00:07:21] Right.
Jordan Wright: [00:07:20] ...And that gives us an indication of who they're designed to go and amplify - whose content they're designed to go and amplify. And it may be the case that it's one person, or it may be many people from a really diverse set of backgrounds and interests, which is another indication that perhaps this is automated, bot-like behavior, because there is no clear trend in the type of content that they're amplifying.
Dave Bittner: [00:07:45] Hmm.
Jordan Wright: [00:07:44] And on that note, that kind of leads into the third heuristic that we used during this research, which is how many tweets in their timeline were retweets. So, the average user - their timeline consists of right around thirty-seven percent retweets. Usually, they're going to have more than that being original content that they author themselves.
Jordan Wright: [00:08:09] So, whenever we want to consider if an account is bot-like, one of those heuristics that we apply is whether or not their timeline consists of ninety percent or more retweets. This is because the bots that we encounter - that's really all they do. They just retweet content. So, using these three different heuristics - how many tweets in their timeline are retweets, whether those tweets are amplified with that five-to-one ratio of retweets versus likes, and how many inversions they have - allows us to very accurately identify amplification bot-like behavior.
Dave Bittner: [00:08:45] Now, help me understand - there's a part of this that is a little puzzling to me. So, I can understand someone going out there and paying for followers, so that it appears as though they have a larger following than they have. But with these amplification bots - who's following them that these retweets would matter?
Jordan Wright: [00:09:05] There are really two goals to having tweets amplified. The first is a wider reach, and to that point, you're right. It's really hit-and-miss, depending on how many legitimate users are following these bots and would, as a result, see this content. So that's going to be kind of a mixed bag, in terms of how effective it is.
Jordan Wright: [00:09:24] But the other goal is to make information seem more credible or popular than it actually is. You know, if I'm a content creator, maybe I want to appear more as an influencer. I want the content that I put out to appear to have a wider reach than it actually does, for people coming and looking at my profile after the fact, and trying to get that sense of whether or not they should follow me.
Jordan Wright: [00:09:50] And likewise, it also gives a higher incentive or a higher likelihood that a user is going to engage with a tweet. So, bringing this back to the security space, you could imagine that if a tweet has a malicious link in it, and I want this to be broadcast to unsuspecting users, the more retweets that this has, the more legitimate it may seem and the more likely we could assume users would be to engage with that tweet, and to click on the link to that malicious content.
Dave Bittner: [00:10:21] Hmm. Yeah, it's almost like decoy ducks, right? In a pond, to attract real ducks for a hunter.
Jordan Wright: [00:10:28] Exactly. And really, we saw a great example of this in our initial research. Whenever we studied the cryptocurrency scam botnet, we actively saw accounts being created with the sole purpose of liking these scam-related tweets. So, we see this in action in the security space, and we see this actively being used to promote malicious content.
Dave Bittner: [00:10:49] So, in the research, you have some examples, some bots, that you've highlighted here. Can you take us through, describe - what does a typical bot look like?
Jordan Wright: [00:10:57] Sure. So, it really comes down to those three heuristics that we mentioned earlier. We can use one example that we give in the post to really highlight this. The first is that, for this account that we're looking at, the first tweet on its timeline has roughly 970 retweets and only 164 likes. This is a ratio of almost six to one, which is incredibly rare. Looking across our data set of five hundred and seventy some-odd million tweets, we only saw this occur 0.2 percent of the time. So, if we see the same ratio throughout their entire timeline, that's very, very odd, and highly indicative of foul play, to some extent.
Jordan Wright: [00:11:38] And the other key indicator that we found with this particular bot is that, while we were looking for 90 percent or more of their timeline being retweets, this bot in particular had nothing except retweets. One hundred percent of their content was retweets, most of which were highly, highly amplified.
Jordan Wright: [00:11:59] So, these two heuristics alone are a good indicator that this is a bot. And then we would look at that third, which is, how many inversions do we see? You know, how often are they jumping around in time? And this account is a perfect example, jumping around very frequently, which is not normal for an average user.
Dave Bittner: [00:12:16] Now, one of the fascinating things you highlighted in your research here is kind of the web of these bots - how they're connected to each other, and how you tracked and traced the various bots that seem to have been teamed up to do this sort of thing. How did you go about that?
Jordan Wright: [00:12:34] You're absolutely right. These bots are connected to some extent, and the reason that this is the case is that they operate in groups. So, one of the things that my partner Olabode and I did was try to determine, how can we map out this entire story, this entire network of bots operating together? So, if I'm a user and I want my content to be amplified, I typically won't try and seek out an individual retweet. I'm going to try and seek out fifty retweets, or maybe a hundred retweets. So this gives an incentive for the bot owners to likely flock a set of bots to all retweet the same tweet. This means that we can study this system as a group, because that's how they operate.
Jordan Wright: [00:13:19] So in our case, using the heuristics that we talked about earlier, we were able to build out a crawler that starts with just a single account that's known to be an amplification bot. It looks at the tweets that that amplification bot has retweeted, and then looks through who else has retweeted those tweets, using those heuristics to identify other amplification bots that are all part of that same network, all amplifying the same content as a group - this coordinated action.
Jordan Wright: [00:13:48] This was really accurate and really effective. Within twenty-four hours, we were able to track down over seven thousand very likely amplification bots, and we just consider this kind of the tip of the iceberg. We are highly confident that, had we let this continue running, we would have continued, over time, mapping out a much larger network of bots.
Dave Bittner: [00:14:10] Now, when you're looking at this, when a tweet gets amplified, when someone engages with someone to do this for them, is that amplification process more or less instantaneous, or do they build in some kind of time delay to try to make things perhaps seem a little more organic?
Jordan Wright: [00:14:26] So, for this research, we didn't really take a hard look at the economy around how these bots operate, but what we can say is, just from doing some spot-checking throughout the research, there's a wide variety in the tactics and methods employed by bot owners to try and evade detection. The time aspect is one of those, but it really all comes down to this - with the goal being a high number of retweets and a large reach of information, the retweets have to come at some point. And these heuristics are difficult to evade, because, like I mentioned, it's hard to get a high retweet count without having a suspicious number of retweets, right? So, applying these heuristics, the time-based aspect wouldn't come into play as much in trying to evade the crawler that we built.
Dave Bittner: [00:15:17] So, did you notice anything in terms of turnover of these amplification accounts? Do they seem to be running without any risk of them shutting down? I guess I'm asking - do they stay around for a while, or do they seem to be shut down?
Jordan Wright: [00:15:33] That's a great question. One of the things that we were really encouraged to see - as we were building out our crawler, and as we were mapping out these networks of bots - was that Twitter was very proactive in shutting them down. It would be, you know, almost the case that we would come across a bot and very shortly after, that bot would have already been suspended. Which, you know, from a research perspective, is exactly what we're hoping to see when we start engaging and trying to find these bots.
Jordan Wright: [00:15:58] Now, it definitely differs. We found bots that were much older, and we found bots that were stood up, began use right away, and were quickly suspended. So it's kind of all over the map, but as a general trend, we're really encouraged with how proactive Twitter was in shutting them down.
Dave Bittner: [00:16:16] And so, the conclusions that you've drawn here - how do they inform folks who are going about their day-to-day work trying to protect their organizations? What are some of the take-homes for them?
Jordan Wright: [00:16:27] So, it's all about remaining vigilant on Twitter, and just trying to keep a close eye on, especially, whenever it comes to malicious content - kind of looking back at the cryptocurrency scam where we saw this being used quite a bit - if something appears too good to be true, it likely is.
Jordan Wright: [00:16:27] But the other benefit of our research is really focusing on enabling other researchers to take our work, build on it, and improve it in really interesting ways. And we're already seeing that happen, which is really exciting. We're seeing third-party researchers come to us saying, we were able to take your tools and your methodologies and apply it to this particular discipline, or apply it to this particular area, with these really fantastic results. Which is just the very best thing that we can hope for, from a lab perspective.
Jordan Wright: [00:16:43] This is why, with this research, like the "Fake Followers" and like our original "Don't @ Me" study, we're releasing the code that we wrote to detect these amplification bots using a crawler. We're open-sourcing all of it so that other researchers can take, build on, and improve it.
Dave Bittner: [00:17:32] Our thanks to Jordan Wright from Duo Security for joining us. Along with his colleague, Olabode Anise, he's co-author of the report, "Anatomy of Twitter Bots: Amplification Bots." That's on the Duo website. We'll have a link in the show notes.
Dave Bittner: [00:17:47] Thanks to Juniper Networks for sponsoring our show. You can learn more at juniper.net/security, or connect with them on Twitter or Facebook.
Dave Bittner: [00:17:56] And thanks to Enveil for their sponsorship. You can find out how they're closing the last gap in data security at enveil.com.
Dave Bittner: [00:18:05] The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technology. The coordinating producer is Jennifer Eiben. Editor is John Petrik. Technical editor is Chris Russell. Executive editor is Peter Kilpe. And I'm Dave Bittner. Thanks for listening.