Data-driven strategies have transformed the way organizations operate and enabled incredible growth for those that know how to leverage them properly. However, this growth inevitably raises concerns about the unmanaged risk that huge data repositories and ungoverned data and access present. Over the last few years, I’ve noticed how enterprise leadership, especially in regulated industries like FinTech and HealthTech, has begun to pay more and more attention to growing security and compliance challenges in data lakes and data warehouses.
In this blog, I want to highlight the most common challenges cyber leaders have shared with me in their efforts to tackle data protection and data governance within their vast data repositories. Yoav (Satori’s CTO and Co-Founder) and I have spoken with dozens of security executives to date, across multiple industry verticals, to understand how Satori Cyber can fit their needs. Here’s what they had to say:
Data lakes and data warehouses can host unimaginable volumes of data. Because their pricing, in many cases, is based on access, not storage, they’ve become virtual dumping grounds of sorts for any and all business-generated data. What this means is that data protection and data governance in these platforms must be carried out at enormous scales. Everything, from discovery and classification, to access controls and security, to policies and auditing, needs to be re-invented to contend with the sheer volume of data stored and the velocity of its generation and usage.
Lack of Visibility
Today, many organizations try to stay on top of their data protection and data governance challenges through various ad-hoc means of tracking data access and usage. However, while access logs do exist, actually making sense of them is very challenging, as differing data technologies have each created their own log systems with varying levels of information. In addition, the crucial pieces of context required to make sense of these logs, such as information about the user accessing the data, the nature of the data being accessed, and what is considered normal and in line with the organization’s policy, are usually spread across different tools that aren’t correlated with the logs.
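To make the correlation gap concrete, here is a minimal sketch of what "enriching" a raw access log entry might look like. All names here (the log fields, the identity directory, the data catalog) are hypothetical illustrations, not a real product API; the point is that each piece of context typically lives in a separate, uncorrelated system.

```python
# Raw access logs: the warehouse knows which account touched which object, and when.
access_logs = [
    {"service_user": "bi_tool", "object": "analytics.customers", "ts": "2021-03-01T10:02:00Z"},
]

# Context usually lives elsewhere, in systems that aren't joined with the logs:
identity_directory = {"bi_tool": {"team": "analytics"}}            # who is behind the account?
data_catalog = {"analytics.customers": {"classification": "PII"}}  # what kind of data is it?

def enrich(entry):
    """Join a raw log entry with user context and data context."""
    return {
        **entry,
        "user_context": identity_directory.get(entry["service_user"], {}),
        "data_context": data_catalog.get(entry["object"], {}),
    }

enriched = [enrich(e) for e in access_logs]
```

Only after this join can a reviewer ask the meaningful question ("did the analytics team touch PII?") rather than the literal one ("did account `bi_tool` read table X?").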
In many organizations, data and analytics teams are granted very broad permissions to access data. This makes sense, as it allows organizations to make the most of their data. However, while broad access is indispensable to enterprise innovation and success, the lack of compensating controls makes it very hard to reduce the risk of a data privacy compromise. The opposite approach, restrictive data access, all but defeats the purpose of data-driven strategies and cannot be considered an appropriate alternative.
Mapping Business Requirements to Technical Implementation
There’s a significant gap between what enterprises need to move forward and how those requirements are implemented. While a business might define a policy in terms of groups of users, permitted data types and allowed usage, the actual implementation of that policy is defined in terms of service users, tables, objects, rows and columns. Breaking such business requirements down into these technical terms, where it is possible at all, is an enormous endeavor to carry out at scale. Aspects such as where data is stored, which tools can be used to access it, where controls can be implemented and how those controls can be monitored and audited can take months to define and implement.
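The translation step described above can be sketched in a few lines. This is a toy illustration under assumed names (the policy vocabulary, the schema tags, the SQL-like grant strings are all invented for the example): one business-level rule fans out into many column-level grants, and stays correct only as long as the schema tags do.

```python
# A business-level policy, written in the vocabulary the business actually uses.
business_policy = {
    "group": "analysts",
    "allowed_data_types": ["contact", "transaction"],
    "denied_data_types": ["PII"],
}

# The technical layer speaks tables and columns; here each column is tagged
# with a data type (in practice this mapping is itself hard to maintain).
schema = {
    "customers": {"name": "PII", "email": "PII", "segment": "contact"},
    "orders": {"amount": "transaction", "customer_id": "PII"},
}

def expand_to_grants(policy, schema):
    """Break one business policy down into per-column grant statements."""
    grants = []
    for table, columns in schema.items():
        for column, dtype in columns.items():
            if dtype in policy["allowed_data_types"] and dtype not in policy["denied_data_types"]:
                grants.append(f"GRANT SELECT ({column}) ON {table} TO ROLE {policy['group']}")
    return grants

grants = expand_to_grants(business_policy, schema)
```

Even in this tiny example, a change to team composition, column tagging or storage location forces the expansion to be recomputed, which is exactly why doing this by hand takes months at enterprise scale.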
Data Access Attribution
Data access and usage are driven by different tools, including home-grown applications, BI tools, command-line interfaces and scripts. For many of them, the user connecting to the data store is the user configured for the tool itself and not the actual employee behind the tool. In most cases, a dedicated user, sometimes referred to as a service user, must be created in order to grant and manage data access for these tools. While convenient from a user management perspective, this means that data access cannot be attributed to the real user driving it.
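One common workaround for the attribution gap, sketched below with invented names, is to have the tool carry the real end user's identity along with each query it issues under the shared service account, for example as a query annotation that later shows up in the access logs. This is an illustrative pattern, not a description of how any particular tool behaves.

```python
def annotate_query(sql: str, end_user: str) -> str:
    """Tag a query issued by a shared service user with the real user behind it,
    so that log entries can later be attributed to a person, not just a tool."""
    return f"/* end_user={end_user} */ {sql}"

q = annotate_query("SELECT * FROM orders", "jane.doe@example.com")
```

Without some mechanism like this, every log line simply reads "the BI service user ran a query," and attribution stops at the tool.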
Understanding the issues at hand is a necessary first step toward determining appropriate solutions. With that said, it was important to us that the executives we consulted had a say in how their data concerns would be addressed. Throughout our discussions, we asked these experts to share what attributes would make up a great solution. We uncovered a set of design principles that are essential for any solution addressing the data protection and data governance challenges outlined above.
Any solution designed to tackle data protection and governance must appreciate, at its core, that data drives innovation for data-driven organizations. The goal, above all else, must be to optimize data operations with easy, efficient and broad data access. By this logic, a good data protection and data governance solution must enable, rather than disrupt or degrade, normal data access. A great solution will make data access easier, more efficient and even broader than before!
User, Data and Intent Context
Data repositories are universes unto themselves. Without a dedicated platform, enterprises can’t possibly hope to keep track of what’s happening within them. Organizations want to answer basic questions such as who is accessing their data, what kind of data is being accessed and what that data is being used for. The best, and only, way to achieve this is by contextualizing data access with information about user identity, data type and access intent.
Context alone cannot help enterprises make sense of their data repositories. In addition to issues of scale and access, organizations simply cannot track where their data is. Data is a moving target, and protecting and governing it at scale is only possible with continuous visibility and insights that can be used to enhance controls and reduce risk over time.
Business Level Policies
While broad data access is key to business innovation, there are some types of data where there truly is no need to offer cross-organizational access. A common use case we’ve come across many times is a requirement to restrict access to PII data for specific teams. An implementation of such a policy is expected to adjust to changes in team composition, the definition of PII data and its actual storage location.
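The PII use case above can be expressed as a small, declarative policy check. The sketch below uses hypothetical names throughout; the key property it illustrates is that the policy is written against a team roster and a discovered map of PII locations, both of which can change without the policy itself being rewritten.

```python
# Inputs that change over time, maintained outside the policy:
teams = {"support": {"alice", "bob"}, "marketing": {"carol"}}      # current membership
pii_locations = {"crm.customers", "billing.invoices"}              # discovered PII datasets

def is_allowed(user: str, dataset: str) -> bool:
    """Business-level rule: only the support team may read PII;
    everything else stays broadly accessible."""
    if dataset not in pii_locations:
        return True
    return user in teams["support"]
```

When someone joins the support team or a new PII dataset is discovered, only the `teams` or `pii_locations` inputs change; the policy definition stays exactly as the business stated it.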
In addition to improved security, privacy and compliance, stakeholders want to expand data governance across their data operations. Some of the most frequently requested capabilities to this effect were tools to help identify stale data, optimize access patterns for performance and gain insights into data usage.
We’ll continue to carry out these discussions with top-level security executives and always work to improve Satori’s approach. We’ll be sure to share more insights from these discussions down the line. In the meantime, we invite you to read more about how Satori’s Secure Data Access Cloud already provides complete data-flow mapping with transparent, secure and compliant data access across all cloud and hybrid data stores.