Private information accidentally exposed at Microsoft.
By Tim Nodar, CyberWire senior staff writer
Sep 19, 2023

A bucket of AI training data was inadvertently published to a public GitHub repository.


Researchers at Wiz found that Microsoft’s AI research team accidentally exposed 38 terabytes of private data, including “secrets, private keys, passwords, and over 30,000 internal Microsoft Teams messages.”

AI training data accidentally published to GitHub.

The exposure occurred when a Microsoft employee published a bucket of open-source training data to a public GitHub repository. Users could download the training data via an Azure Storage URL; however, the shared access signature (SAS) token embedded in that URL granted permissions to the entire storage account, which included backups of two Microsoft employees’ personal computers.
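
A SAS token’s scope is visible in the query string of the URL it is attached to: fields such as `srt` (resource types), `sp` (permissions), and `se` (expiry) show whether it covers one blob or a whole account. As a rough sketch only (the URL below is hypothetical, not the one from the incident), here is how those parameters could be inspected in Python to flag an over-broad token:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical Azure Storage URL carrying an account-level SAS token.
# Standard SAS query fields: ss = services, srt = resource types,
# sp = permissions, se = expiry, sig = signature.
url = (
    "https://examplestorage.blob.core.windows.net/models/data.zip"
    "?sv=2021-08-06&ss=b&srt=sco&sp=rwdlac&se=2051-10-05T00:00:00Z&sig=..."
)

params = parse_qs(urlparse(url).query)
resource_types = params.get("srt", [""])[0]
permissions = params.get("sp", [""])[0]
expiry = params.get("se", [""])[0]

# srt=sco covers the service, its containers, and all objects -- the whole
# storage account rather than the single blob being shared; w/d/a/c in sp
# add write, delete, add, and create rights on top of read access.
if "c" in resource_types or any(p in permissions for p in "wdac"):
    print(f"Over-broad SAS token: srt={resource_types}, "
          f"sp={permissions}, se={expiry}")
```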

Microsoft says the incident is contained; customers needn’t worry.

Microsoft has fixed the issue, stating, “No customer data was exposed, and no other internal services were put at risk because of this issue. No customer action is required in response to this issue.”

The company explained, “SAS tokens provide a mechanism to restrict access and allow certain clients to connect to specified Azure Storage resources. In this case, a researcher at Microsoft inadvertently included this SAS token in a blob store URL while contributing to open-source AI learning models and provided the URL in a public GitHub repository. There was no security issue or vulnerability within Azure Storage or the SAS token feature. Like other secrets, SAS tokens should be created and managed properly. Additionally, we are making ongoing improvements to further harden the SAS token feature and continue to evaluate the service to bolster our secure-by-default posture.”
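
Managing SAS tokens “properly,” in practice, means scoping them narrowly: a single blob, read-only access, and a short expiry. A minimal sketch of what that looks like with the Azure Python SDK (`azure-storage-blob`), using placeholder account, container, and blob names rather than anything from the incident:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Generate a SAS token limited to one blob, read-only, valid for 24 hours.
sas_token = generate_blob_sas(
    account_name="examplestorage",
    container_name="training-data",
    blob_name="dataset.zip",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),                 # read-only
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),  # short-lived
)

# The token is appended to the blob URL as a query string before sharing.
url = (
    "https://examplestorage.blob.core.windows.net/"
    f"training-data/dataset.zip?{sas_token}"
)
print(url)
```

A token created this way expires quickly and unlocks only the file being shared, in contrast to an account-level token with broad permissions and a distant expiry date.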

AI training data and the inherent risk of over-sharing.

Roger Grimes, Data-Driven Defense Evangelist at KnowBe4, sees training data as carrying an inherent risk. "This is one of the top risks of AI, one of your employees accidentally sharing confidential organization information,” he wrote in emailed comments. The Microsoft incident isn’t, he says, an isolated case. “It's happening far more than is being reported. In order to mitigate the risk of it occurring, organizations need to create and publish policies preventing the sharing of organization confidential data with AI and other external sources, and educate users about the risks and how to avoid it. Organizations can also use traditional data leak prevention tools and strategies to look for and prevent accidental AI leaks."