As the world undergoes a data revolution, the ways we generate, store, access, use and analyze data are transforming before our very eyes. In just a few years, we’ve added sensors that transmit data in high resolution, developed powerful tools and algorithms to help us make sense of it, and placed data-driven strategies firmly at the center of our enterprises.
This has all been for the better. Data transformation is dramatically changing how we live and has greatly broadened the toolkits that organizations employ to make better decisions and operate more effectively. As a result, we’re enjoying an era of unprecedented innovation and productivity, with implications so vast that they have improved our quality of life.
However, this progress, and the push for data-driven innovation that inspired it, have not yet touched every part of our relationship with data. Most of us have been slow to appreciate the changing risks that stem from our new datascapes, and the majority of digital transformers continue to adhere to standards and methods of data governance and protection that no longer apply. This is especially true of the data access controls we employ today, and the consequences of outmoded data access management have already made global headlines.
Scoping out the problem
Let’s have a quick Big Data recap and explore how ‘big’ Big Data really is and the different ways it’s scaled.
Big Data is characterized by four ‘V’s:
- Volume: There’s a lot of data (as in, we measure our stores in petabytes)
- Velocity: The data is added and updated frequently
- Variety: The data comprises many different types, much of it unstructured
- Veracity: The data is not always “good” (meaning ready for use) and often varies in quality
These V’s are more or less accommodated with the use of “data lakes”, which can store large amounts of varying types of data with relatively little effort. Very few changes need to be made to data stored in data lakes, and they’re convenient repositories for analysts to work with. However, while data lakes have expanded the potential of data and the ways it can be leveraged, they also pose significant challenges to the way we protect access to that data. As a result, access control has become one of the biggest obstacles to securing data lakes today.
Let’s dive into the adventures of the ACME Corporation to understand why that is:
Dr. Anna Leetics, a mathematics PhD and exceptional data scientist, has just landed a job at ACME Corporation. Hurray! She joins one of the data science teams and the company is excited to get her started on crunching data and driving innovation for the company. She’s waiting for ACME to grant her data access permissions so that she can work her magic.
ACME, a global e-commerce organization, employs teams of data scientists dedicated to studying the vast amounts of data collected from product telemetry, sales data, and other sources, to increase the efficiency of the business. This data comes in from their different systems (such as ERP, marketing tools, product telemetry and more), as well as from vendors and financial services. Some of it is structured in tables, some of it is only semi-structured, and some isn’t structured at all. Their data lake is spread across data centers around the world as well as across native public cloud solutions.
A portion of the data they process is considered to be “sensitive data”. Data can be classified as sensitive for many reasons, often because it contains Personally Identifiable Information (PII) or because it contains data about customers from a certain region. Due to the variety and veracity of ACME’s data, it’s very hard for ACME to discern exactly which parts are sensitive and in what way they should be classified as sensitive (do they contain PII? Payment card information? Healthcare information?). In many cases, their data is also mixed, meaning that sensitive data sits in the same place as non-sensitive data.
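To make the classification problem a little more concrete, here is a minimal Python sketch of how a single mixed record might be tagged field by field. Everything in it is our own illustrative assumption rather than a description of ACME's (or anyone's) actual tooling: the label names, the two regular expressions, and the helper names `classify_value` and `classify_record` are hypothetical, and real-world classification needs far more than pattern matching.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately simple patterns (assumptions for illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify_value(value: str) -> set[str]:
    """Return the (hypothetical) sensitivity labels detected in one value."""
    labels = set()
    if EMAIL_RE.search(value):
        labels.add("PII")
    for match in CARD_RE.finditer(value):
        if luhn_valid(match.group()):
            labels.add("PAYMENT_CARD")
    return labels


def classify_record(record: dict[str, object]) -> dict[str, set[str]]:
    """Map each field that looks sensitive to the labels found in it."""
    result = {}
    for field_name, value in record.items():
        labels = classify_value(str(value))
        if labels:
            result[field_name] = labels
    return result
```

Calling `classify_record` on a record that holds an email address in one field and a card number in another would tag only those two fields, which is exactly the "mixed data" situation described above: sensitive and non-sensitive values sitting side by side in the same place.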
This is a good time to take a pause and really appreciate how complicated Dr. Leetics’ onboarding is. Consider the multitude of technologies and the sensitive information across different locations that ACME must take into account in order to create effective and compliant data access controls for their new employee.
Some additional food for thought:
- Dr. Leetics is based in Boston and is contending with regulations that prohibit her from viewing the PII of customers from certain countries.
- Information security would not be happy with her being exposed to certain types of information, such as hashes of passwords and password reminder questions.
- ACME would like to be notified of any suspicious behavior, such as Dr. Leetics suddenly withdrawing large amounts of data outside of the scope of her team.
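The three requirements above can be sketched as simple rules evaluated against each access request. This is only a rough illustration under our own assumptions: the class names, the label names, the `"US"`/`"DE"` pairing and the row threshold are all hypothetical placeholders, not actual regulations or ACME settings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRequest:
    user: str
    user_location: str        # where the analyst sits
    column_labels: frozenset  # sensitivity labels on the requested column
    customer_country: str     # country the requested rows belong to
    rows_requested: int
    in_team_scope: bool       # is the dataset within the user's team scope?


@dataclass(frozen=True)
class Policy:
    # Hypothetical mapping: analyst location -> customer countries whose PII is off limits.
    restricted_pii: dict = field(default_factory=dict)
    # Hypothetical labels information security never wants exposed.
    blocked_labels: frozenset = frozenset({"PASSWORD_HASH", "SECURITY_QUESTION"})
    # Hypothetical threshold for "large" out-of-scope withdrawals.
    bulk_row_threshold: int = 100_000


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reasons: tuple
    alert: bool


def evaluate(request: AccessRequest, policy: Policy) -> Decision:
    """Apply the three illustrative rules to a single request."""
    reasons = []

    # Rule: some data types are blocked for everyone.
    if request.column_labels & policy.blocked_labels:
        reasons.append("blocked data type")

    # Rule: PII of customers from certain countries is off limits by location.
    restricted = policy.restricted_pii.get(request.user_location, frozenset())
    if "PII" in request.column_labels and request.customer_country in restricted:
        reasons.append("regional PII restriction")

    # Rule: large pulls outside the team's scope raise an alert but do not block.
    alert = (not request.in_team_scope
             and request.rows_requested > policy.bulk_row_threshold)

    return Decision(allowed=not reasons, reasons=tuple(reasons), alert=alert)
```

Note the design choice in the last rule: suspicious volume only raises an alert rather than denying the request, since ACME asked to be notified of such behavior, not necessarily to stop it.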
The company would like to install all of these data access controls (and more) but finds the current process too complicated. First, their security teams must scan through their entire data lake and data warehouses, meaning petabytes of different data, and manually select what she can and can’t have access to. Then, Dr. Leetics must open a support ticket whenever she lacks access to important data. It’s an incredibly inefficient and highly ineffective process.
As if this weren’t difficult enough, ACME isn’t clear on where the company should even place its data access controls. In many cases, available data access control tools aren’t granular enough (e.g. when contending with mixed data in public cloud data lakes). Moreover, managing the data access controls of all of ACME’s different systems would slow down the business immensely.
Finally, just to add fuel to the fire, a cool new data processing technology has just been released and ACME is in the process of implementing it. Unfortunately, this is causing utter mayhem among security teams by forcing the company to reconsider the data access controls of every single ACME employee at the same time.
Let’s just say that we don’t exactly envy the security engineers over at ACME right now.
How do we fix this mess?
Businesses across all verticals around the world confront ACME’s data access challenges every day! At Satori Cyber, we believe that a comprehensive solution is long overdue. It’s about time enterprises are supplied with the tools and capabilities they need to help them overcome these obstacles and reach their data strategy’s full potential!
First, let’s outline the real problem: the unproductive practices that currently dominate enterprise data protection and governance. Today, the ‘best practice’ for data access control is to create a well-defined list of accessible data and mete out permissions on that basis. Unfortunately, this simply isn’t feasible once we account for the enterprise data challenges outlined above.
Think about what would happen at a major airport hub like LAX if it screened luggage using this approach. If security decided to catalog every permissible item and put everything else on hold pending analysis, productivity at the airport would take an unacceptable hit; only a fraction of flights would be able to take off each day! Instead, airport security uses heuristics for items that are definitely bad and employs more advanced algorithms to find additional suspicious items.
We realize that airport security doesn’t inspire a lot of warm feelings among the majority of readers, but in this case, we can actually take a page out of their book. We must appreciate, as they do, that tackling our data access security challenges can’t come at the expense of business output. A good solution must be able to keep us secure without slowing us down. When it comes to data access, this means that a solution blocking “lawful” employee access to data several times every week cannot stand.
To fix this, organizations need to be able to understand where their sensitive data is, especially when it is mixed with other data in data lakes and data warehouses, and track how its consumers use it. They also need to be able to set limitations on users’ access to sensitive and regulated data. This is the only way an organization can reliably detect when suspicious or outright malicious data access behavior is taking place.
Bottom line: enterprises will never be able to maximize their potential if their data scientists continue to be bogged down by outmoded processes. In order to really fix this, organizations need security controls capable of enforcing dynamic and granular data access controls that protect sensitive data, are agnostic to the way in which data is analyzed, and will surface behavior anomalies. This is our guiding ethos at Satori Cyber, and we’ve developed a data governance and protection solution to help companies do just that.
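As a rough illustration of what “dynamic and granular” could mean in practice, here is a minimal sketch that masks sensitive columns in query results on the fly, based on what the requesting user is allowed to see. It is an assumption-laden toy, not a description of our product: the function name `mask_rows`, the label names and the mask string are all hypothetical. Because it works on result rows rather than on any particular query engine, the same check applies no matter how the data was analyzed.

```python
def mask_rows(rows, column_labels, allowed_labels, mask="***"):
    """Yield copies of `rows` with disallowed sensitive columns masked.

    rows           -- iterable of dicts, one per result row
    column_labels  -- dict mapping column name -> frozenset of sensitivity labels
    allowed_labels -- frozenset of labels the requesting user may see
    """
    for row in rows:
        yield {
            # A column is shown only if every label on it is allowed;
            # unlabeled columns have an empty label set and always pass.
            column: (
                value
                if column_labels.get(column, frozenset()) <= allowed_labels
                else mask
            )
            for column, value in row.items()
        }
```

With this approach, a user without the hypothetical `"PII"` label still gets the order totals they need, with only the email column masked, instead of being blocked from the whole table and having to open a support ticket.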
Stay tuned for our upcoming blog post outlining exactly how we plan to do it. In the meantime, we’d love to discuss it with you and invite you to get in touch to learn more! Contact me on Twitter or by email, and feel free to read our whitepaper and subscribe to our blog.