Microsoft's 38TB Data Leak - A Deep Dive into Unsecured Azure Storage

Ferrovial secures contract for Microsoft Data Center in Madrid / REUTERS/Rami Amichay

In a rather unfortunate turn of events, Microsoft's AI research division found itself with a data mishap on its hands. It all began in July 2020, when the team was diligently contributing open-source AI models to a public GitHub repository. What they didn't intend to do, however, was leak sensitive data along the way, to the tune of dozens of terabytes.

Fast forward nearly three years, and enter the cybersecurity sleuths over at Wiz, a cloud security firm. It was their sharp eyes that uncovered the leak: a Microsoft employee had, quite unintentionally, shared the URL of a misconfigured Azure Blob Storage container that held all this sensitive information.

Data Leak Challenges for Microsoft CEO Satya Nadella / JASON ALDEN/BLOOMBERG VIA GETTY IMAGES

Now, what caused this whole mess? It was traced back to something called a "Shared Access Signature" (SAS) token. In simpler terms, it's like a magic key that opens all doors. Unfortunately, in this case, it was a little too magical: it gave whoever held it full control not just over the files that were meant to be shared, but over the entire storage account.
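To make that concrete, here is a minimal sketch, using Azure's Python SDK, of how an overly generous account-level SAS token can be minted: broad read/write/delete/list permissions across the whole account, with an expiry set years into the future. The account name, key, and expiry window below are made-up placeholders for illustration, not details from the actual incident.

```python
from datetime import datetime, timedelta

from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# Hypothetical account name and key, for illustration only.
ACCOUNT_NAME = "exampleresearchstore"
ACCOUNT_KEY = "<storage-account-key>"

# The risky pattern: an account-level SAS granting broad permissions
# over every container and blob, valid for roughly ten years.
overly_broad_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.utcnow() + timedelta(days=3650),  # effectively "forever"
)

# Anyone who gets a URL with this token appended can browse and
# modify everything in the account.
print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/?{overly_broad_sas}")
```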

SAS tokens, when used correctly, are meant to be the guardians of your data kingdom. They let you decide who gets in, what they can do, and for how long. But here's the catch: managing these tokens can be a real headache, because the Azure portal doesn't give administrators an easy way to keep track of the tokens that have been issued.

And if that weren't enough, these tokens can be set to last indefinitely, with no expiration date in sight. So, are you using Account SAS tokens for sharing outside your secure bubble? Not a good idea, according to Wiz.
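By way of contrast, here is a hedged sketch of the tighter pattern that guidance points toward: a service SAS scoped to a single blob, read-only, and expiring within hours. The storage account, container, and blob names are placeholders, not the actual resources involved in the leak.

```python
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Placeholder names; substitute your own storage account, container, and blob.
ACCOUNT_NAME = "exampleresearchstore"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER = "public-models"
BLOB = "model-weights.ckpt"

# The safer pattern: a service SAS limited to one blob, read-only,
# and valid for a short, explicit window.
scoped_sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=4),
)

print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB}?{scoped_sas}")
```

A user delegation SAS, signed with an Azure AD credential rather than the account key, narrows the blast radius further, since it can be revoked without touching the account's keys.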

Now, here's where it gets even juicier. Wiz's research team stumbled upon something extra: besides the open-source models, the storage account behind that URL also exposed a whopping 38TB of additional private data.

This treasure trove included workstation backups from Microsoft employees, complete with passwords, secret keys, and a stash of over 30,000 internal Microsoft Teams messages from 359 employees.

Before you panic, Microsoft stepped in with an advisory, assuring everyone that no customer data was caught in the crossfire and that no other internal services were put at risk.

Wiz reported the data spillage to Microsoft's Security Response Center (MSRC) on June 22, 2023. MSRC responded swiftly, revoking the SAS token and slamming the door shut on any external access to the Azure storage account; the issue was resolved by June 24, 2023.
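It's worth noting that an account SAS can't be switched off one token at a time: because it is signed with the storage account key, the general way to invalidate it is to rotate that key (whatever steps Microsoft took internally, this is the usual mechanism). Below is a rough sketch of a key rotation with the Azure management SDK, using placeholder subscription, resource group, and account names.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholder identifiers, for illustration only.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "example-rg"
ACCOUNT_NAME = "exampleresearchstore"

client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Regenerating the signing key invalidates every SAS token that was
# signed with it, including any that were shared too broadly.
client.storage_accounts.regenerate_key(
    RESOURCE_GROUP,
    ACCOUNT_NAME,
    {"key_name": "key1"},
)
```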

Microsoft AI Research Team's Data Upload Unintentionally Exposes 38TB of Personal Data / ts2.space

Now, let's talk big picture. AI is all the rage in the tech world, and it's brimming with potential. But as it grows, so does the amount of data it munches on, and that calls for more robust security measures, as Wiz's CTO and co-founder, Ami Luttwak, pointed out. Managing, sharing, and collaborating on colossal amounts of data is no small task.

In a twist, BleepingComputer had reported a similar incident roughly a year earlier. In September 2022, SOCRadar, a threat intelligence firm, uncovered another misconfigured Azure Blob Storage container belonging to Microsoft. It held sensitive data dating back to 2017 and running all the way to August 2022, involving over 65,000 entities from 111 countries.


To help folks figure out whether their secrets were exposed, SOCRadar introduced a handy tool called BlueBleed. Microsoft, however, had a different take on the scale of the problem, saying that SOCRadar had "greatly exaggerated the scope of this issue" as well as "the numbers."