At a glance.
- Alibaba data scraped.
- CVS Health customer data leaked.
- Digital ID technology and privacy.
Data scraper hits Alibaba Group.
The Wall Street Journal reports that Alibaba Group’s Taobao retail website has experienced a massive data scraping incident. A court verdict released this month in China’s Henan province explains that a software developer was covertly scraping data from the site for eight months, gathering more than 1.1 billion pieces of user information before Alibaba detected the intrusion. The scraped data include user IDs, customer comments, and mobile phone numbers, which are legally deemed personal data in China and could be used by a threat actor to access more sensitive accounts. The developer was delivering the data to his employer, who used the information to target clients for a promotions company. Each was sentenced to more than three years in prison.
We received a number of comments from industry experts on the incident. Jorge Orchilles, CTO of SCYTHE, noted the difficulty of detecting automated data scraping:
"Automated scraping of website data is difficult to detect when it is done slowly over a long period of time, as appears to be the case here. Most high-volume sites can limit extremely fast (non-human speed) requests by monitoring the application layer traffic with a variety of tools, including web application firewalls and CDN providers. It appears some of the data was not publicly accessible, so the attack may have leveraged valid credentials or SQL injection."
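To illustrate why slow scraping slips past per-second rate limits, here is a minimal sketch of a complementary check: counting how many distinct records each client touches over a long window. The thresholds, the event tuple layout, and the function name are assumptions made for illustration; nothing here reflects how Taobao or any specific product actually detects scraping.

```python
from collections import defaultdict

# Hypothetical thresholds; real values depend on a site's normal traffic.
WINDOW_SECONDS = 24 * 60 * 60   # one day
MAX_DISTINCT_PROFILES = 500     # distinct records one client may view per window


def find_slow_scrapers(events, window=WINDOW_SECONDS, limit=MAX_DISTINCT_PROFILES):
    """Flag clients that view an unusual number of distinct records per window.

    `events` is an iterable of (timestamp_seconds, client_id, profile_id)
    tuples, an assumed log format. A client sending one request per minute
    never trips a per-second rate limit, but it still accumulates many
    distinct records over a day, which is what this check counts.
    """
    seen = defaultdict(set)  # (client_id, window bucket) -> profile ids viewed
    flagged = set()
    for ts, client_id, profile_id in events:
        key = (client_id, int(ts // window))
        seen[key].add(profile_id)
        if len(seen[key]) > limit:
            flagged.add(client_id)
    return flagged


# One request per minute from "a" (600 distinct profiles), light use from "b".
sample = [(i * 60, "a", i) for i in range(600)]
sample += [(i * 3600, "b", i) for i in range(10)]
print(find_slow_scrapers(sample))
```

A check like this trades precision for coverage: legitimate heavy users can also cross the threshold, so in practice it would more likely feed a review queue than an automatic block.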
David Stewart, CEO, Approov, thinks the evidence suggests that an API may have been the way the scraper found its way in:
"It's hard to say exactly how the scraping was done but it seems likely that the API was the route into the data and most probably a BOLA (Broken Object Level Authorization) vulnerability was exploited to access it. Recent security research into mHealth apps and APIs disclosed similar issues. The key lesson is understanding the importance of ensuring that the user getting the data is really authorized to do so. Vulnerabilities like this are hard to track down, and while enterprises are doing so it is good practice to shield APIs so that scripts intent on data scraping - or worse - are blocked."
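For readers unfamiliar with BOLA, the following is a minimal sketch of the object-level check whose absence defines the vulnerability. The `Order` record, the in-memory `ORDERS` store, and `get_order` are hypothetical names invented for this example; it is not known whether the Taobao incident involved such a flaw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    owner_id: int
    phone: str


# Hypothetical in-memory store standing in for a real database.
ORDERS = {
    1: Order(order_id=1, owner_id=100, phone="555-0100"),
    2: Order(order_id=2, owner_id=200, phone="555-0200"),
}


class Forbidden(Exception):
    """Raised when the caller may not see the requested object."""


def get_order(requesting_user_id: int, order_id: int) -> Order:
    """Return an order only if it belongs to the authenticated caller.

    A BOLA-vulnerable handler would return ORDERS[order_id] directly, so any
    logged-in user could walk through order ids and collect other people's
    data. The ownership comparison below is the object-level check.
    """
    order = ORDERS.get(order_id)
    # Treat "missing" and "not yours" the same so ids cannot be enumerated.
    if order is None or order.owner_id != requesting_user_id:
        raise Forbidden(f"user {requesting_user_id} may not read order {order_id}")
    return order
```

Calling `get_order(100, 1)` returns the caller's own record, while `get_order(100, 2)` raises `Forbidden` even though user 100 is authenticated.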
Saryu Nayyar, CEO of Gurucul, is concerned by the sheer size of the attack and the length of time it went on:
“Two things about this breach are concerning. First, 1.1 billion users is an ENORMOUS number! So many Chinese mobile phone numbers are now at risk of being used to commit vishing and texting schemes, as well as potential identity theft when paired with the user's real name identification. Second, the attacker had been collecting data for eight months before Alibaba noticed. Eight months is an eternity in cyber space, and accounts for the software developer's ability to gather that many mobile phone numbers. As always, cyber defenses should be deployed that are able to discover anomalous activity in real-time and prevent attackers from compromising your data.”
Chris Clements, VP of Solutions Architecture at Cerberus Sentinel, laments the new normal of having to assume that your data are circulating freely:
“It’s unfortunate that we’ve basically come to the point where you more or less have to assume that all information you share online will either be leaked, stolen, or purposefully sold to third parties without your knowledge. Privacy regulations like GDPR can have some effect in preventing organizations from misusing your information but they are largely toothless in preventing information from being stolen by cybercriminals. Even breach notification laws rely on the premise that an organization will know when they have suffered an attack that discloses personal data. Most organizations only find this out if and when they are contacted by a third party, usually security researchers or law enforcement that has noticed data that appears to belong to them for sale on the dark web.
"To protect themselves and their customers, organizations must adopt a true culture of security that prioritizes the safeguarding of systems and data with a similar seriousness as the approach to personnel safety. This includes critical components like security education, secure software development lifecycles along with system and application hardening, regular penetration testing to identify potential risks and finally continuous monitoring for suspicious activity coupled with proactive threat hunting.”
James McQuiggan, Security Awareness Advocate at KnowBe4, is also struck by the duration of the incident:
"When organizations discover they have been breached, it is usually determined that the cyber criminals were accessing data for a significant length of time.
"Organizations should focus on protections if the cyber criminals are already in the network instead of reacting after the breach; especially as this relates to technology and processes in place to secure and protect sensitive information like names, email addresses and phone numbers.
"A software developer may have already had access to the website or via a third-party site, which is a common attack vector for cyber criminals to leverage the supply chain for the website to gain access."
Potential privacy risks of digital ID technology.
Throughout the pandemic, digital vaccine passports have been the subject of much debate. Now, CyberScoop reports, with the end of the pandemic within reach, some tech companies are exploring ways to expand the technology to encompass other identifying data. For instance, IBM is working to expand New York state’s Excelsior Pass to include age and driver’s licenses, and Apple recently launched technology that will allow users to present their IDs digitally to the Transportation Security Administration. Proponents say digital IDs are a safer means of verifying identity without exposing more sensitive information like Social Security numbers. But some privacy experts argue the pandemic has allowed these technologies to evolve without the necessary security regulations, and details about how the technology functions are often hidden behind proprietary patents. According to Alexis Hancock, director of engineering at the Electronic Frontier Foundation, “It doesn’t matter how many promises the company puts out, or how often they may claim that they’re doing the safest thing. With people’s data, there’s no federal accountability.”
CVS leaks customer data.
American pharmacy CVS accidentally exposed customer information in an unsecured database, Forbes reports. Discovered by researcher Jeremiah Fowler of Website Planet, the database contained more than 1 billion data points including searches conducted by users on CVS.com for medications and COVID-19 vaccines. Though names were not included in the records, the researchers were able to connect some individuals to their data using email addresses accidentally entered into the search bar. As Fowler explains, “Organizations collect this valuable data and use this information for analytics, customer management, or marketing needs. At the same time consumers want privacy and to have more control over their data and how companies or social media providers use that data.” Though CVS acted swiftly to secure the data, the incident illustrates how web tracking of even non-personal data can present a privacy risk.
PJ Norris, senior systems engineer at cybersecurity company Tripwire, commented on the lamentably high rate of misconfigurations we're seeing:
"Misconfigurations like these are becoming all too common. Exposing sensitive data doesn’t require a sophisticated vulnerability, and the rapid growth of cloud-based data storage has exposed weaknesses in processes that leave data available to anyone. A misconfigured database on an internal network might not be noticed, and if noticed, might not go public, but the stakes are higher when your data storage is directly connected to the Internet. Organizations should identify processes for securely configuring all systems, including cloud-based storage, like Elasticsearch and Amazon S3. Once a process is in place, the systems must be monitored for changes to their configurations. These are solvable problems, and tools exist today to help.”
Ray Canzanese, threat research director at Netskope, recommends that organizations scan their cloud environments to discover exposed resources:
“This appears to have been an unprotected Elasticsearch server that was exposed to the internet. Improperly configured security groups, NACLs, and firewall rules are a common type of exposure in IaaS providers like AWS, Azure, and GCP. We have recently performed a study of public exposure of compute infrastructure in IaaS environments across the three major IaaS providers that indicated >35% of compute instances expose at least one service to the Internet.
"Things you can do to avoid such exposures include scanning your own cloud environments automatically to discover and lock down exposed resources. ZTNA products also provide a means to give employees secure access to cloud resources, whether they are hosted on-prem or in the cloud, without exposing them to the internet.”
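The kind of automated scan described above can be sketched as a simple rule audit. The rule dictionaries below are an assumed, simplified shape invented for this example; a real scanner would first pull security groups or firewall rules from the cloud provider's own API and normalize them. The port list is likewise illustrative rather than exhaustive.

```python
# Default ports of services that usually should not face the internet.
SENSITIVE_PORTS = {
    9200: "Elasticsearch",
    5432: "PostgreSQL",
    3306: "MySQL",
    27017: "MongoDB",
}

# CIDR blocks that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_exposed_rules(rules):
    """Return (resource, port, service) for each world-open sensitive port.

    Each rule is a dict with the hypothetical keys "resource", "cidr",
    "from_port" and "to_port", describing one inbound allow rule.
    """
    findings = []
    for rule in rules:
        if rule["cidr"] not in OPEN_CIDRS:
            continue  # restricted to specific networks, so not world-open
        for port, service in SENSITIVE_PORTS.items():
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append((rule["resource"], port, service))
    return findings


sample_rules = [
    {"resource": "sg-search", "cidr": "0.0.0.0/0", "from_port": 9200, "to_port": 9200},
    {"resource": "sg-web", "cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},
    {"resource": "sg-db", "cidr": "10.0.0.0/8", "from_port": 5432, "to_port": 5432},
]
for finding in find_exposed_rules(sample_rules):
    print(finding)
```

Only the first sample rule is reported: the web rule is open to the internet but on a port meant to be public, and the database rule is limited to a private address range.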
David Pickett, senior cybersecurity analyst at Zix | AppRiver, draws a lesson about data protection:
"The exposure of over a billion records belonging to CVS Health highlights the importance of protecting sensitive customer information as well as ensuring your organization and any third-party vendors who have been brought on to help with security and cloud migration have proper security measures in place. Companies that house personal information for millions of customers need to reflect on their current password practices and ensure they are building the safest habits to protect their company and customers from cybercriminals. In this case, the database was not protected by a password and had no authentication requirements. Implementing two-factor authentication (2FA) or a multi-factor authentication (MFA) protection approach provides an extra layer of security by making users confirm their identity, most often via a unique code sent to the user's phone, email address or through an authenticator app, after entering their username and password. It’s getting easier for cybercriminals to breach even the most complex password, which is why implementing 2FA is critical. Another component to be mindful of when working with third-party vendors that have access to company data is reviewing and understanding what the vendor agreement encompasses for security practices. These solutions will help to prevent companies from becoming another statistic in a long list of companies who have had data exposed online.”
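As a rough illustration of the authenticator-app codes mentioned above, here is a minimal sketch of a time-based one-time password (TOTP) in the style of RFC 6238, using only the Python standard library. The function names and the shared secret are assumptions for the example; a production system would normally rely on a vetted library and would also handle clock drift, rate limiting, and secure secret storage.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute an HMAC-SHA1 time-based one-time code from a shared secret."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time // step)  # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F      # dynamic truncation offset
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, at_time=None, step: int = 30, digits: int = 6) -> bool:
    """Check a submitted code against the current time step only."""
    expected = totp(secret, at_time=at_time, step=step, digits=digits)
    return hmac.compare_digest(expected, submitted)


# Illustrative shared secret; real secrets are random and provisioned per user.
demo_secret = b"12345678901234567890"
print(totp(demo_secret, at_time=59, digits=8))
```

Because the server and the user's device derive the code independently from the shared secret and the clock, a stolen password alone is not enough to log in.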
And Daniel Markuson, digital privacy expert at NordVPN, sees this sort of incident as unusually scary during a time of pandemic:
“The coronavirus pandemic has created the ideal environment for bad actors to prey on people’s fears and vulnerabilities during this period of uncertainty. With vaccine data in their hands fraudsters will take hold of every channel available, including email, automated voice messages, texts, ad banners, social media, direct mail, robo calls, to get in touch and try to capitalize on it. So people need to take every security step they can to protect their health information, including using a VPN for all health communications.”