Detecting dating profile fraud
Dave Bittner: [00:00:03] Hello everyone, and welcome to the CyberWire's Research Saturday, presented by Juniper Networks. I'm Dave Bittner, and this is our weekly conversation with researchers and analysts tracking down threats and vulnerabilities and solving some of the hard problems of protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Dave Bittner: [00:00:26] And now a word about our sponsor, Juniper Networks. Organizations are constantly evolving and increasingly turning to multicloud to transform IT. Juniper's connected security gives organizations the ability to safeguard users, applications, and infrastructure by extending security to all points of connection across the network. Helping defend you against advanced threats, Juniper's connected security is also open, so you can build on the security solutions and infrastructure you already have, secure your entire business, from your endpoints to your edge, and every cloud in between with Juniper's connected security. Connect with Juniper on Twitter or Facebook. And we thank Juniper for making it possible to bring you Research Saturday.
Dave Bittner: [00:01:13] And thanks also to our sponsor, Enveil, whose revolutionary ZeroReveal solution closes the last gap in data security: protecting data in use. It's the industry's first and only scalable commercial solution, enabling data to remain encrypted throughout the entire processing lifecycle. Imagine being able to analyze, search, and perform calculations on sensitive data, all without ever decrypting anything - all without the risks of theft or inadvertent exposure. What was once only theoretical is now possible with Enveil. Learn more at enveil.com.
Awais Rashid: [00:01:53] I have a longstanding interest in protecting citizens online.
Dave Bittner: [00:01:58] That's Professor Awais Rashid from the University of Bristol. The research we're discussing today is titled, "Automatically Dismantling Online Dating Fraud."
Awais Rashid: [00:02:07] We have worked for many years - well over twelve to fifteen years - on protecting people online, such as children, as well as developing technologies to help law enforcement tackling those issues. This particular project came about because there was a colleague in psychology who was then at the University of Warwick, now based in Melbourne, and she was studying romance scams from a psychological perspective. And given our expertise in looking at more computational approaches to automatically detect attempts to victimize online, there was a common ground. So we started to look at this as a problem and then also collaborated with researchers who were then at University College London.
Awais Rashid: [00:02:49] The general premise here was that romance scam is one of the perhaps more underreported problems, in some cases. So if you look at some of the recent data from the FBI, it notes that there is a total loss of eighty-five million dollars through online romance scams in the US. The numbers could be higher, because there is often quite a lot of stigma around people admitting that they have been taken in by this kind of scam. And, you know, of course, they might be too traumatized to even be willing to share it.
Dave Bittner: [00:03:20] So, how do online sites - up to this point, how do they combat this sort of thing?
Awais Rashid: [00:03:27] So online dating platforms, in fairness, do quite a bit of work to protect the users that are on there, but largely the techniques are manual. So, you know, there are banks of human moderators whose job it is to identify when a scam profile might be created and attempted on the dating site. Of course, if people report a particular activity to the dating site, then they act on it.
Awais Rashid: [00:03:52] But the key thing here is that the first step in an online romance scammer's task is to create a fake dating profile and get it onto a dating site. So in order to protect people, the best thing to do is to actually try and stop it at that point in time, and that's what dating sites do. But there is a huge volume of profiles that are created, and scammers are getting increasingly sophisticated. And while some of the indicators, like using stolen credit cards and so on, are very easy to spot, in some cases, scam profiles are harder to detect, also for humans. From our perspective, it is meant as an aid to the work that human moderators do, perhaps moving their attention to other activities so that they can really look at the more sophisticated profiles that might require a lot more human intervention, and some of the sort of stuff that can easily be caught through automation can be caught fairly quickly.
Dave Bittner: [00:04:46] What was the goal here of your research? What were you hoping to find here?
Awais Rashid: [00:04:51] So, our goal was to see if we can develop AI and machine learning-based techniques that can automatically detect romance scammer profiles with a very high degree of accuracy. And the idea being that as people are creating profiles, the system works so that it can automatically flag to a human moderator that something is potentially suspect, or, you know, depending on how the system is tuned, automatically even reject a profile. Of course, there is a balance here, because while the tools are highly accurate, they are not a hundred percent accurate, as is the nature of machine learning and AI techniques at the moment. So there has to be some kind of human intervention because, of course, you know, we don't want to deny regular users who want to use online dating sites from participating in these platforms.
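The flag-or-reject tuning described here can be pictured as two thresholds on a classifier's scam score. The following is a minimal Python sketch, not the researchers' implementation: the function name `triage`, the outcome labels, and both threshold values are hypothetical and chosen only for illustration.

```python
def triage(scam_probability, review_threshold=0.5, reject_threshold=0.99):
    """Map a classifier's scam probability to a moderation action.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    if not 0.0 <= scam_probability <= 1.0:
        raise ValueError("scam_probability must be between 0 and 1")
    if scam_probability >= reject_threshold:
        return "reject"              # confident enough to block automatically
    if scam_probability >= review_threshold:
        return "flag_for_moderator"  # suspicious, so a human takes a look
    return "accept"


if __name__ == "__main__":
    for score in (0.10, 0.70, 0.995):
        print(score, triage(score))
```

Raising `reject_threshold` sends more borderline profiles to a human moderator instead of rejecting them outright, which is one way to express the balance against turning away legitimate users.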
Dave Bittner: [00:05:37] So, let's walk through together. There are a number of, I guess, common attributes that are a part of a dating profile. Let's go through those together and talk about how you broke those out and how you analyze them.
Awais Rashid: [00:05:53] So, we look at a number of indicators within the profiles. So we look at what are the demographics information that a dating profile would have. We also look at the description that the profiles provide about themselves. There is always a bit of textual fields to say something about oneself on the profile. And then there is of course imagery of where people will share images of themselves to show the kind of person that they are. And it's quite interesting that if we look at some of these various indicators, then we can see the system can detect quite a few patterns which may not immediately be obvious if we look at it just with the naked eye, so to speak.
Awais Rashid: [00:06:32] So if we, for example, compare the frequency with which - so, we had a dataset from a dating site that has shared manually verified scam profiles online. So we have that dataset and we can compare it with real dating profiles. And it's quite interesting to see that, in real profiles, we don't see the same frequency with which scammers claim certain professions. So scammers, for example, very often will claim to be in the military profession, engineers, or in business, for instance.
Awais Rashid: [00:07:05] And all these - if you then now think about it logically - all these are used to create a story. So, military, you know, the good old sort of, you know, American G.I. abroad scam, you know, either the luggage is, you know, stuck somewhere and they need money to then move it across customs, or they have been injured, or things like that. Again, engineer, business, you know, it sort of shows a successful profile, but also there is always a reason why the scammer can't meet or why they're always travelling, and so on. And that's for male profiles.
Awais Rashid: [00:07:35] But if you look at female profiles, it's a very different setup in the sense that often you will see professions such as student, carer, military also does make an appearance, sales. But all of these are there to then create a story to follow on. So as a student, you know, potentially run out of money. You know, as a carer, you may need money to help care for someone you're caring about. And that is not to say that we don't see these in real profiles. These professions also exist in real profiles, and of course, you know, people have all sorts of lives. But the frequency with which they appear in scam profiles is much higher than they appear in the real profiles - male or female.
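The comparison described above, how often an occupation shows up in scam profiles versus real ones, can be sketched in a few lines of Python. This is an illustrative sketch only: the profile dictionaries, the `occupation` key, the smoothing constant, and the toy data are all invented here and do not come from the study's dataset.

```python
from collections import Counter


def occupation_rates(profiles):
    """Return the share of profiles claiming each occupation."""
    counts = Counter(p["occupation"] for p in profiles)
    total = sum(counts.values())
    return {occupation: n / total for occupation, n in counts.items()}


def overrepresentation(scam_profiles, real_profiles, smoothing=1e-6):
    """Ratio of scam-profile rate to real-profile rate per occupation.

    A ratio well above 1 means the occupation is claimed more often by
    scammers. `smoothing` (an assumption) avoids division by zero for
    occupations that never appear in the real profiles.
    """
    scam = occupation_rates(scam_profiles)
    real = occupation_rates(real_profiles)
    return {occ: rate / (real.get(occ, 0.0) + smoothing)
            for occ, rate in scam.items()}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    scam = [{"occupation": o} for o in
            ["military", "military", "engineer", "student"]]
    real = [{"occupation": o} for o in
            ["teacher", "engineer", "student", "nurse",
             "military", "teacher", "student", "engineer"]]
    for occ, ratio in sorted(overrepresentation(scam, real).items()):
        print(f"{occ}: {ratio:.2f}x")
```

A ratio like this would only be one demographic feature among many; on its own it cannot separate a real soldier from a fake one.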
Awais Rashid: [00:08:14] And there are also other interesting patterns that we see that, you know, actually scammers overshare, in terms of images. So, if you are not a scammer, you will share fewer images in the dataset on average compared to scammers, where they will share almost twice as many images. And it is perhaps a way for them to sort of, you know, try and portray that they have a sort of a very interesting life and lots and lots of interests and to attract potential victims.
Dave Bittner: [00:08:43] Now, you looked into the images themselves and found some pretty interesting patterns there as well.
Awais Rashid: [00:08:50] Yes, so, we can - so, our classifiers can automatically classify the images based on what is the content of the images. But also, if you look at some of the sort of fake vs. real images in the scientific paper, there are some interesting examples. So you can see that often scammers would take real people's images from online sites and then they change that. They will alter them, so they will superimpose someone else's head on.
Awais Rashid: [00:09:17] And there were two that are really interesting. In one, you know, in the original there is a man on a hospital bed with an injured leg. But in the fake image, which obviously at some point a scammer has shared with the victim to say, look, I'm injured and I'm in hospital and I need money, it's just the same image, but it has a woman's head superimposed on it. And if you look at it cursorily, you know, on, perhaps, a smaller screen, it's quite easy to miss. But if you sort of, you know, blow it up or if you can automatically analyse it, then you can see that actually it has been doctored.
Awais Rashid: [00:09:48] And talking about doctors, the other one, which is quite interesting, is that one of the scam images shows a scammer, you know, with his friends as a doctor, with sort of nurses and doctors in a hospital. But actually, the image is taken from a TV show. So they've basically just replaced the head of the lead actor with someone else's head - may or may not be the scammer's own imagery, unlikely to be their own imagery. But, you know, and again, if you don't know - and it's a show based in the UK, I don't know how popular it is in North America - and if you have not seen it, you know, it might look perfectly legitimate to a user.
Dave Bittner: [00:10:28] Mm-hmm.
Awais Rashid: [00:10:28] But if you have seen the show, then it is quite obvious. So there are these kinds of patterns which we can see. But, you know, they look sort of, you know, legitimate to, you know, sort of at a quick glance. And when people are sort of glancing through these things, it is quite easy to miss some of these things.
Dave Bittner: [00:10:45] Now, you did work of using automation to categorize these images. Did you also do work of reverse image searches? You know, looking for, for example, like, stock images that people pretended to be profile images?
Awais Rashid: [00:11:00] No, so we didn't particularly look at that, but there is other work which looks at doing this kind of reverse image search and we can easily utilize these kinds of techniques. But our focus was specifically on whether we can classify something fairly quickly as it comes in, as to whether it is a real or a fake profile.
Dave Bittner: [00:11:22] And what did that reveal - the classifications that you assigned to things? What was the result of that work?
Awais Rashid: [00:11:27] We analyzed basically the demographic data. We analyzed the images based on the kind of content and features they're conveying. And we analyzed the profile descriptions using textual analysis and language analysis techniques. All of these use different types of machine learning technologies. So we use a combination of structured, unstructured, as well as deep learning techniques and then we combined the outputs of the different classifiers, and we tested kind of various functions to find what was the optimal one. And we can, with a ninety-seven percent accuracy, detect that, you know, an incoming profile is a scammer profile. But of course, as I said earlier on, you know, it's not a hundred percent accurate. So there are, you know, of course, false positives in that regard.
Dave Bittner: [00:12:15] But I could really imagine for the real live human beings who are assigned to sort through these things, if they're only tasked with then having to really take a close look at three percent of them, that really lightens the load for them and allows them to be more accurate there.
Awais Rashid: [00:12:32] Absolutely. And I think the key here is to help the human moderators by reducing the workload on them. Again, you know, the scammers will keep trying. So, you know, if a moderator would sort of reject the profile, it doesn't really take them very long to create another one, and another one, and another one. And so, because ultimately, you know, some of these things are automated in terms of trying to sign up and things like that. So there's lots of different techniques that scammers also use to get themselves onto the dating site. So the more we can help the human moderators, the better it is.
Awais Rashid: [00:13:06] There are also potentially other applications, in the sense that, you know, one can envisage this being a browser plugin that users use on their own, and they can use it to sort of see if the profiles that they are viewing are potentially scammer profiles. But there are, of course, sort of other issues there because one needs to be very careful because, you know, romance is a very, very personal interaction, and one needs to be careful that, you know, people might start to sort of completely believe an AI technique and what we'd not want to do is to accuse, you know, perfectly legitimate users of being scammers.
Dave Bittner: [00:13:43] Right, yeah. What if I actually am a military handsome man overseas who is looking for a relationship? (Laughs)
Awais Rashid: [00:13:52] Absolutely, and unfortunately, you know, with the high profile of that particular type of scam, it's a hard job at the moment....
Dave Bittner: [00:13:59] Right.
Awais Rashid: [00:14:00] ...In that regard, to convince people that you are for real. But there are often telltale signs, you know, and these kinds of tools are one thing in our set of tools to try and detect and prevent online romance scams. But, you know, of course, with the tools not being a hundred percent accurate, you know, some of the profiles will get through. Human moderators are, ultimately, you know, human, and may miss profiles. But there are telltale signs that users can use to protect themselves. You know, there are typical tactics. Scammers will try to take people off the dating site quickly. Now, again, dating sites are very unique in that sense, that they are designed for strangers to talk to strangers. OK.
Dave Bittner: [00:14:42] Hmm.
Awais Rashid: [00:14:41] So you are effectively, by signing up to a dating site, you are saying you are actually happy to receive unsolicited messages, because that's how people get in touch with you. And then once, if you've found someone you are interested in, then people do not really want to communicate through the dating sites, you know, initially they might do a little bit of communication. But, you know, ultimately they would want to sort of talk to each other directly. And scammers utilize that as a tactic to get people quickly off the dating site, because now there is no trail of what's going on in any shape or form.
Awais Rashid: [00:15:13] But there are other telltale signs. If they are always unable to meet, if they are starting to ask for money. And some of the profiles that we looked at, there are very specific kinds of language. So, they often appeal to people's idealized sense of romance. So, the work that our colleague in the project, Monica Whitty, she did from a psychological perspective, shows that people who fall victim to these kind of scams have often an idealized notion of what romance is, that there is one person for everyone. And if you look at some of the fake profiles, they play on that.
Awais Rashid: [00:15:48] So, one example that we have is someone saying, you know, I'm actually a widower. Many a times scammers would pretend to be widowers as well. And they say, ever since my wife passed away, I've been celibate, you know, there was one person for me. And now I have found this kind of new person, and I want to grow old with you, and so on and so forth. And things like that. So they appeal to that kind of idealized notion of romance. And there is those kind of signs as well.
Awais Rashid: [00:16:15] There is also, you know, often they're very kind of forceful - I don't want to say forceful - I guess, sort of very exaggerating in the way they want to sort of engage with people. You know, so the female profile talking about having a great friendship, you know, and they will sort of encourage people to get in touch with personal email or personal messaging apps so that they can take them off the website.
Dave Bittner: [00:16:39] Now, one of the things you looked into that I found fascinating was you use natural language processing to actually analyze the number of words and the types of words that these scammers were using, and you found some real differences between the types of things that real people say and the things that the scammers say.
Awais Rashid: [00:16:59] Yes, and that's actually a good question, because similar to people oversharing in terms of images, scammers also write more. So the average number of words that scammers would use in their profiles is well over one hundred, while users would normally write a more brief and pithy description of themselves, you know, around about fifty words less than scammers in general.
Awais Rashid: [00:17:25] Also, scammers refer to emotions considerably more often, both positive and negative. So, they try to evoke this emotive response from the users. They will appeal to, often, sense of family, friendship, and provide sort of a certain sense of certainty. And if you contrast that with real users, the real users tend to often, for example, focus more on their motives and drivers. You know, they would talk about work, leisure, and those kind of things.
Awais Rashid: [00:17:53] And also, the other interesting thing is that scammers use more formal language forms, while genuine users will display more informal language such as, you know, netspeak or short messaging speak. Scammers tend to often be more formalized, and it may often be that they're perhaps based outside the Western countries. Most of the scammers would target North America, but also Europe, and so on. So perhaps they're not used to the vernacular of the particular country that they're targeting.
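The description-level signals mentioned here (length, emotional wording, informal netspeak) could be turned into numeric features along the following lines. This is a rough Python sketch under stated assumptions: the two tiny word lists `NETSPEAK` and `EMOTION` and the function name `description_features` are invented for illustration and are not the lexicons or code used in the research.

```python
import re

# Tiny illustrative lexicons (assumptions, not the study's word categories).
NETSPEAK = {"lol", "u", "ur", "gr8", "thx", "btw", "omg"}
EMOTION = {"love", "happy", "lonely", "sad", "heart", "trust"}


def description_features(text):
    """Compute simple length and vocabulary features for a profile description."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    n = len(words)

    def share(lexicon):
        # Fraction of words found in the lexicon; 0.0 for empty text.
        return sum(w in lexicon for w in words) / n if n else 0.0

    return {
        "word_count": n,
        "netspeak_share": share(NETSPEAK),
        "emotion_share": share(EMOTION),
    }


if __name__ == "__main__":
    print(description_features("lol u seem gr8"))
    print(description_features("I trust that my heart will find love again"))
```

Features like these would feed a classifier alongside other signals; no single one is a verdict on a profile.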
Dave Bittner: [00:18:23] Can you give us an overview of what was the tech that you were using under the hood to combine this data that you gathered and then come up with your conclusions and end up with this high degree of accuracy?
Awais Rashid: [00:18:37] If you think at a very high level, how the process works is that we take in a broad description of a profile. This comes from a dating site and their associated published scammer profiles that they have done. From there, we look at three different elements. We look at demographics, which is effectively occupation, gender, age, any other sort of data that is in the form fields. We look at images themselves. And then we look at the descriptions in terms of the profiles, which is raw text.
Awais Rashid: [00:19:06] Each of these is then cleaned up. So, we would normalize the data. We take, for example, if someone has said their occupation is a housewife, then we would normalize it to "home" and "wife." And in images, we can classify things like, you know, there is a woman in a dress, man playing rugby, and so on and so forth. And we were earlier talking about sort of, you know, profile descriptions, and there we would analyze all the textual features by using natural language processing techniques.
Awais Rashid: [00:19:34] And then we extract the various features from these. So there are grammatical features, there are particular categories that people utilize. And for natural language processing, we found that n-grams - so, portions of words based on three or four letters - were good indicators. We then feed them into a range of different classifiers and then combine the outputs of the classifiers using a function, for which we then test different weightings as to what works.
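The three- and four-letter n-grams mentioned here can be extracted with a sliding window over the text. The Python sketch below is illustrative only: the function name `char_ngrams` and the choice to lowercase, collapse whitespace, and let windows span word boundaries are assumptions, not details confirmed in the interview.

```python
from collections import Counter


def char_ngrams(text, n_values=(3, 4)):
    """Count overlapping character n-grams of the given lengths.

    Assumption: text is lowercased and runs of whitespace are collapsed to a
    single space, so an n-gram may span the gap between two words.
    """
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for n in n_values:
        for start in range(len(normalized) - n + 1):
            counts[normalized[start:start + n]] += 1
    return counts


if __name__ == "__main__":
    for gram, count in char_ngrams("My one true love").most_common(5):
        print(repr(gram), count)
```

The resulting counts form a sparse feature vector that a text classifier can consume.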
Dave Bittner: [00:20:04] How much adjustment and tweaking did you do along the way to come to the degree of accuracy that you have here?
Awais Rashid: [00:20:12] So that's the challenge with machine learning and AI techniques, that you have to sort of see which techniques work well. So, we tried, of course, a range of different classifiers to see which one would perform better on particular types of data. We had some ideas, but we, of course, have to try and see which ones perform better for different types of data and then evaluate them on different types of test sets.
Awais Rashid: [00:20:40] And once we have done that for each of the three different subcategories - so, the profile description, the image, and the demographics - then once we knew which ones were the classifiers we wanted to utilize there, then we train an ensemble classifier, and then we test that as to how accurate that ensemble is going to be. So there is quite a lot of testing that goes on to reach a degree of confidence.
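One simple way to picture combining the three sub-classifiers is a weighted average of their scam probabilities, with the weights picked by trying a small grid on held-out examples. The Python sketch below assumes exactly that; the interview does not specify the combining function, so the weighted average, the grid of weights, the 0.5 decision threshold, and the toy validation data are all illustrative assumptions.

```python
from itertools import product


def combine(scores, weights):
    """Weighted average of per-classifier scam probabilities."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def accuracy(examples, weights, threshold=0.5):
    """Share of examples where the combined score matches the true label."""
    correct = sum((combine(scores, weights) >= threshold) == is_scam
                  for scores, is_scam in examples)
    return correct / len(examples)


def best_weights(examples, grid=(1, 2, 3)):
    """Try every weight triple from `grid` and keep the most accurate one."""
    return max(product(grid, repeat=3), key=lambda w: accuracy(examples, w))


if __name__ == "__main__":
    # Each example: ((demographics, image, description) scores, is_scam).
    # Toy values invented for illustration.
    validation = [
        ((0.9, 0.2, 0.8), True),
        ((0.2, 0.6, 0.1), False),
        ((0.7, 0.3, 0.9), True),
        ((0.1, 0.9, 0.3), False),
    ]
    weights = best_weights(validation)
    print(weights, accuracy(validation, weights))
```

In practice the weights would be chosen on one held-out set and the final accuracy reported on another, so the ensemble is not scored on the same data used to tune it.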
Awais Rashid: [00:21:07] Now, I have to give the caveat that while the profiles we use are fairly representative of the profiles that are used in platforms in industry, the profiles we use are from one particular platform that is open. And the reason we use them is that they have also made available a big dataset - a reasonably sized dataset, I should say - of scam profiles. So we have a direct comparison. We know which profiles are fake, and we also know which profiles are real on the assumption that the human moderators miss nothing.
Awais Rashid: [00:21:40] So, generalizing the results beyond the dataset at this point in time would not be scientifically valid. Naturally, larger trials are needed on a number of other dating platforms and the kind of profiles that they see, to test if the classifiers we have trained work as effectively, what are the limitations of the classifiers, would more fine-tuning work be required before we can have confidence that this would work on a very, very large scale? This is effectively fundamental early stage research that needs much more validation.
Dave Bittner: [00:22:16] So where would you like to see it go next? Would you like other folks to build off of the research you've done here?
Awais Rashid: [00:22:22] Absolutely. So, our tools are publicly available. So the source code is publicly available. We haven't shared the data, for the simple ethical reason that we also use - while publicly available - real users' profiles. We don't store any of that data, on the basis that if users withdrew their profile from the platform, then it shouldn't live on in our data set. So the only thing that we share is as to how to collect the data, the scripts that we used so that can be validated. And yes, of course. The idea here is really that others are welcome to take these on and build more. We are ourselves, of course, interested in working with other platforms to do larger scale trials.
Awais Rashid: [00:23:07] As I mentioned earlier, we are also discussing as to the feasibility of perhaps a browser plugin that users can use themselves. But the key is how do we communicate that, the outcomes, to the user so that they don't attach - they are aware that the tools are not a hundred percent accurate and hence don't attach complete confidence to them, but equally do not completely ignore them.
Awais Rashid: [00:23:30] And then there is this fundamental question: should we just take a better-safe-than-sorry approach and just say, even if there is a small false positive range, we should just accept that it's better to be safe than sorry. And those are the more sort of fundamental research questions that really need to be investigated before we can sort of, you know, have more insights into how effective these techniques are on a large scale.
Dave Bittner: [00:23:59] Our thanks to Professor Awais Rashid from University of Bristol. The research is titled, "Automatically Dismantling Online Dating Fraud." We'll have a link in the show notes.
Dave Bittner: [00:24:09] Thanks to Juniper Networks for sponsoring our show. You can learn more at juniper.net/security, or connect with them on Twitter or Facebook.
Dave Bittner: [00:24:19] And thanks to Enveil for their sponsorship. You can find out how they're closing the last gap in data security at enveil.com.
Dave Bittner: [00:24:27] The CyberWire Research Saturday is proudly produced in Maryland out of the startup studios of DataTribe, where they're co-building the next generation of cybersecurity teams and technology. The coordinating producer is Jennifer Eiben. Our amazing CyberWire team is Stefan Vaziri, Tamika Smith, Kelsea Bond, Tim Nodar, Joe Carrigan, Carole Theriault, Nick Veliky, Bennett Moe, Chris Russell, John Petrik, Peter Kilpe, and I'm Dave Bittner. Thanks for listening.