Another week, another massive corporate data leak. Last week, Decathlon disclosed that 123 million records were exposed to the public as a result of a misconfigured data warehouse, discovered by Noam Rotem and Ran Locar of VPNMentor. The personally identifiable information (PII) of both customers and employees was unencrypted and available for anyone to access online. Exposed information included employee usernames, clear-text passwords, social security numbers, full names, addresses, mobile phone numbers and birth dates, as well as customer email and login details.
Decathlon is far from the first company to suffer such a misconfiguration, and it certainly won't be the last. In fact, Satori's research team has discovered that the ingredients for another leak of this scale exist in nearly ten thousand other companies worldwide that use similar internet-accessible databases.
Considering the public relations and regulatory fallout from these kinds of data leaks, many find themselves asking what, if anything, is being done to prevent them. Many are also wondering why they're still happening at all. As someone who has lived and breathed data security for 15 years, I have a few thoughts. First, let's do away with a few assumptions floating around.
Are companies simply disregarding the importance of keeping their big data repositories secure? I don’t think that’s the case.
Are companies failing to understand that their cloud configuration is an important part of their defense? Surely we’re past that point.
Do companies struggle to appreciate that cloud development models impose their own unique requirements and constraints on security? On this point, I've seen the security industry increasingly adopt this ethos, and many vendors have begun to roll out incredible solutions tailored to the specific needs of cloud security.
So what’s missing? It’s become clear to us at Satori that today’s model for data security is completely inadequate for the cloud.
For years, data has been couched in layers of security, from network security to application security, end-point security to anomaly detection. This approach ensured that gaps were more or less covered and significantly limited the real threat of a data leak. Unfortunately, this layered security approach has failed to be implemented as companies migrate to the cloud—and nothing else has taken its place.
There's a saying in aviation: always "fly two mistakes high". It means you should never put yourself in a position where a single mistake can take you down. This is exactly what layered approaches help security teams achieve, and precisely why today's approach to data lake and data warehouse security on the cloud is doomed to fail. Relying on cloud configuration management alone cannot keep companies safe from data leaks and is many steps short of keeping big data stores safe. It takes only one employee replicating a VM that houses sensitive data into an environment not configured to hold it to bring the whole plane down.
The challenge with cloud configuration management is that it directly ties to the parameters of your cloud deployment—the services you run, the instances you deploy and your environment. These are all very dynamic and perfectly representative of how engineering teams constantly evolve software and services. While a company should naturally aim to be on top of that, having such a dynamic and volatile last line of defense exposes them to an unacceptable degree of both unintentional and unpredicted risks.
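To see why a configuration-based defense is so fragile, consider the point-in-time audit it ultimately depends on. The sketch below is purely illustrative (the resource names and fields are my own invention, not any real cloud provider's API): it flags storage that is both public and holds sensitive data, but it only helps if it runs after every one of those constant environment changes.

```python
# Hypothetical snapshot of a cloud deployment's storage resources.
# In a real audit this data would come from a cloud provider's API;
# it is hard-coded here to keep the sketch self-contained.
resources = [
    {"name": "analytics-warehouse", "public": False, "contains_pii": True},
    {"name": "marketing-assets",    "public": True,  "contains_pii": False},
    {"name": "replicated-vm-dump",  "public": True,  "contains_pii": True},
]

def audit(resources):
    """Flag resources that expose sensitive data to the public internet."""
    return [r["name"] for r in resources if r["public"] and r["contains_pii"]]

print(audit(resources))  # -> ['replicated-vm-dump']
```

The third entry is exactly the "replicated VM" mistake described above: the moment sensitive data lands in an environment that wasn't configured for it, the audit result changes, and any gap between change and re-audit is exposure.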
This raises the question: what should the last line of defense for data security look like in the cloud?
- First, it must be isolated from environment changes—if it isn’t, you can’t be reactive to new changes and are dangerously exposed to the risk of mistakes and slow response.
- Second, it must be simple to configure and enforce—otherwise, you end up with the same configuration challenges you have for your cloud configuration and will find yourself back at square one.
- Third, it must be transparent in the environment—without transparency, friction will push people to bypass it in order to get their job done.
- Finally, it must be universal, running in any environment, which ensures it can be deployed across different cloud providers and environments.
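Taken together, these requirements describe a policy layer keyed to the data and the requester rather than to the environment. A minimal sketch of the idea, with illustrative roles and data classifications of my own invention (not any vendor's actual policy model):

```python
# Environment-independent access policy: data classification -> roles
# allowed to read it. Roles and classifications are hypothetical.
POLICY = {
    "pii":    {"dpo", "support"},
    "public": {"dpo", "support", "analyst", "marketing"},
}

def authorize(role: str, classification: str) -> bool:
    """Decide access based on who is asking and what the data is,
    regardless of which environment happens to serve the query."""
    return role in POLICY.get(classification, set())

print(authorize("analyst", "pii"))     # -> False
print(authorize("analyst", "public"))  # -> True
```

Because the decision depends only on the role and the data's classification, it gives the same answer whether the data sits in its original warehouse or in a replicated VM, which is what makes such a layer a viable last line of defense.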
Cloud configuration management, while necessary and impactful, meets none of these requirements, making it an ineffective last line of defense.
If we want to see fewer stories about data breaches making headlines and filling our news feeds, we have to move beyond hardening cloud configuration and policy and toward a more holistic approach to data governance. What's more, we must confront and remove the incentives to work around data access restrictions, the biggest contributor to duplicating data into uncontrolled environments.
Satori helps organizations control data access and achieve a high level of data protection and governance by providing a policy engine that authorizes data consumers' access to data. Satori is universal in the sense that it is not tied to a specific environment and is therefore unaffected by changes that take place in it. This means it works both for new data stores and for data being moved between environments. Satori's integration has no impact on data consumers, is transparent, and doesn't require architecture or data pipeline changes. Any data access through Satori is authorized based on company policy regardless of environment configuration, meaning sensitive data cannot leak—even if there are configuration errors.