How is the Snowflake Data Warehouse Affected By GDPR?
Snowflake is a cloud-based data warehouse. Because data warehouses are often used to store personally identifiable information (PII), their use is directly impacted by privacy regulations like the EU General Data Protection Regulation (GDPR).
While many of the activities that drive an organization’s GDPR compliance are your organization’s responsibility, software vendors like Snowflake can help you meet those responsibilities. According to its makers, Snowflake was designed to reduce the GDPR compliance burden, especially for customer-centric organizations.
Snowflake stores large volumes of structured and semi-structured data and exposes that data through standard SQL. The accessibility and simplicity of SQL make it easy to perform the updates, changes, and deletions required by the GDPR. Because Snowflake supports semi-structured data, you can also adapt easily when new fields or record types appear. Finally, Snowflake’s advanced security capabilities can help meet GDPR provisions for protecting customer data.
In this article, you will learn:
- How Does Snowflake Help Secure Personal Data?
- Time Travel
- Other Security Features
- Best Practices for GDPR Compliance With Snowflake
- Build a Data Model that Segregates PII Data
- Conduct Batch Deletions, Match Time Travel to the Deletion Window
- Implement Tracking
- Use an Access Control Service
- Snowflake GDPR Compliance with Satori
Which Snowflake Features Can Help Comply with GDPR?
Let’s review several data security features provided by Snowflake, and how they can help your organization comply with GDPR requirements.
Data Masking
The GDPR requires organizations to manage access to sensitive data. Personally identifiable information (PII) must be protected from unauthorized modification and disclosure. Snowflake provides a data masking feature that lets you assign role-based access control (RBAC) dynamically, enabling granular control over which employees can view sensitive data. Snowflake’s dynamic data masking allows designated administrators to create and apply masking policies at the column level. These policies can restrict access to data in the column of a table or view. Approved roles view the column values as-is, while other roles see masked values. You can also define specific users who should have access to the data, or who should see only obfuscated data.
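As a rough illustration of a column-level masking policy, the Python sketch below only renders Snowflake SQL as text; it does not connect to Snowflake. The policy, table, column, and role names (email_mask, customers, email, PII_ANALYST) are hypothetical, and the statements follow the general CREATE MASKING POLICY and ALTER TABLE ... SET MASKING POLICY forms, so check them against your Snowflake edition before use.

```python
def masking_policy_sql(policy, table, column, allowed_roles, mask="***MASKED***"):
    """Render SQL for a string-column masking policy (all names are illustrative)."""
    # Roles listed here see the real value; every other role sees the mask.
    roles = ", ".join(f"'{role.upper()}'" for role in allowed_roles)
    create = (
        f"CREATE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val\n"
        f"       ELSE '{mask}' END;"
    )
    # Attaching the policy to a column is a separate statement.
    attach = f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};"
    return [create, attach]


statements = masking_policy_sql("email_mask", "customers", "email", ["pii_analyst"])
for stmt in statements:
    print(stmt)
```

Keeping the approved roles in one list makes it easier to review who can see unmasked values.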
Access Control
Beyond data masking, role-based access is necessary to demonstrate protection of datasets containing PII. Snowflake’s access control feature lets you define roles that can access specific objects, and define permissions specifying what actions those roles can perform. By default, Snowflake provides four roles: ACCOUNTADMIN, which has the highest level of privilege; SECURITYADMIN, with permission to define users, roles, and privileges; SYSADMIN, with permission to access entire databases, schemas, and tables; and PUBLIC, which is automatically granted to every user. To address GDPR requirements, you should create custom roles for different business functions. Use the principle of least privilege to ensure each user has access to only the objects and privileges essential to their day-to-day role.
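A custom least-privilege role might be set up along the following lines. This is a minimal sketch that only builds the GRANT statements as strings; the role, database, schema, table, and user names are all hypothetical, and a real setup would likely need further grants (for example, on a warehouse).

```python
def least_privilege_grants(role, database, schema, tables, users):
    """Render statements for a read-only role limited to the listed tables."""
    stmts = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        # USAGE lets the role reach the schema without reading anything in it.
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
    ]
    # SELECT is granted table by table, never on the whole schema.
    stmts += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO ROLE {role};"
        for table in tables
    ]
    stmts += [f"GRANT ROLE {role} TO USER {user};" for user in users]
    return stmts


grants = least_privilege_grants(
    "support_pii_reader", "crm", "pii", ["customers"], ["alice"]
)
for stmt in grants:
    print(stmt)
```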
Other Security Features
Snowflake provides the following additional security features:
- Encryption—when using PUT and COPY commands, data in transit and at rest is encrypted by default
- Multi-Factor Authentication (MFA)—user access is performed via MFA by default, with single sign-on (SSO) capabilities
- Secure data architecture—Snowflake makes it easy to isolate and protect sensitive data, anonymize or mask data, manage the data lifecycle and delete it at the end of the data retention period
Snowflake Continuous Data Protection: Complying with GDPR Article 17
GDPR Article 17 specifies that an individual can request deletion of their personal information, and the organization then has a deadline of 30 to 90 days to delete it. Snowflake provides two features that, while useful, can conflict with this requirement:
- Time Travel – this feature makes it possible to restore data in the data warehouse to any point within the last 24 hours. It can roll back individual tables or the entire database. Snowflake Enterprise Edition extends Time Travel to up to 90 days.
- Fail-Safe – if you accidentally delete a table or database and the Time Travel period is over, Snowflake provides a secondary protective measure called Fail-Safe, which lets you retrieve your data for up to 7 days after the Time Travel period ends, by contacting the Snowflake support team.
These features raise a question: when an individual requests removal of their data, how can you make sure the data is completely removed from Snowflake and cannot be recovered? The following best practices can help you deal with this problem and ensure that Snowflake does not continue to store data that you are legally required to delete.
Build a Data Model that Segregates PII Data
Build a data model that separates PII data into a distinct table or data set. Create an inventory to identify and account for all available PII data types. This is a key practice for privacy regulations because it precisely identifies data that requires protection or deletion.
The separation strategy can resolve several challenges of GDPR data management:
- Risk of peripheral data loss – interspersing PII data in a large table poses the risk of deleting other important analytics or business data
- Deleting the row that holds a single individual’s PII also deletes any non-GDPR-related columns stored in that row
- Deleting information from a larger table may incur costly updates
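The separation idea can be sketched with a small example. It uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the customer_pii and orders tables are hypothetical. PII lives in its own table, keyed by an ID, so erasing one individual leaves the analytics rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- PII is isolated in its own table, keyed by customer_id.
    CREATE TABLE customer_pii (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    -- Business data carries only the key, no PII columns.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer_pii VALUES (1, 'Ada', 'ada@example.com'), (2, 'Bob', 'bob@example.com');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")


def erase_customer(conn, customer_id):
    """Delete one individual's PII row; the orders table is not touched."""
    with conn:
        conn.execute("DELETE FROM customer_pii WHERE customer_id = ?", (customer_id,))


erase_customer(conn, 1)
pii_rows = conn.execute("SELECT COUNT(*) FROM customer_pii").fetchone()[0]
order_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(pii_rows, order_total)
```

Whether the remaining customer_id key itself counts as personal data depends on your own data inventory; the sketch assumes it does not.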
Conduct Batch Deletions, Match Time Travel to the Deletion Window
The Health Insurance Portability and Accountability Act (HIPAA) sets out best practices for data deletion that are adaptable to data management processes. For example, rather than deleting PII data request by request, you can add a GDPR flag and request date to each affected record, then run a monthly batch process that ensures deletion within the 30-day GDPR window.
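The flag-and-date approach could look roughly like this. The sketch runs against an in-memory SQLite database purely for illustration, and the column names gdpr_delete_flag and gdpr_request_date are assumptions, not Snowflake conventions.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_pii (
        customer_id INTEGER PRIMARY KEY,
        email TEXT,
        gdpr_delete_flag INTEGER DEFAULT 0,  -- 1 once erasure was requested
        gdpr_request_date TEXT               -- ISO date of the request
    );
    INSERT INTO customer_pii (customer_id, email) VALUES
        (1, 'ada@example.com'), (2, 'bob@example.com'), (3, 'cy@example.com');
""")


def flag_for_erasure(conn, customer_id, requested_on):
    """Record an erasure request without deleting anything yet."""
    with conn:
        conn.execute(
            "UPDATE customer_pii SET gdpr_delete_flag = 1, gdpr_request_date = ? "
            "WHERE customer_id = ?",
            (requested_on.isoformat(), customer_id),
        )


def run_batch_deletion(conn):
    """Delete every flagged row in one pass and return how many were removed."""
    with conn:
        cur = conn.execute("DELETE FROM customer_pii WHERE gdpr_delete_flag = 1")
    return cur.rowcount


flag_for_erasure(conn, 1, date(2024, 3, 4))
flag_for_erasure(conn, 3, date(2024, 3, 19))
deleted = run_batch_deletion(conn)
remaining = [row[0] for row in conn.execute("SELECT customer_id FROM customer_pii")]
print(deleted, remaining)
```

Storing the request date alongside the flag lets you check how long each request waited before the batch ran.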
Snowflake’s Time Travel can represent a compliance risk in this context. GDPR allows a 30-day window (up to 90 in special cases) to delete PII data. With Snowflake Enterprise Edition, you can set Time Travel to less than 30 days for PII-specific tables, to prevent inadvertent restoration of deleted data.
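One way to keep the Time Travel setting below the deletion window is to generate the retention statement through a small guard. This is a sketch only: the table name is hypothetical, the 30-day default comes from the window discussed above, and the rendered statement uses Snowflake's DATA_RETENTION_TIME_IN_DAYS table parameter.

```python
def retention_sql(table, days, deletion_window_days=30):
    """Render an ALTER TABLE statement, refusing retention >= the deletion window."""
    if not 0 <= days < deletion_window_days:
        raise ValueError(
            f"retention of {days} days is not below the {deletion_window_days}-day window"
        )
    return f"ALTER TABLE {table} SET DATA_RETENTION_TIME_IN_DAYS = {days};"


stmt = retention_sql("crm.pii.customers", 7)
print(stmt)
```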
If data is removed for the wrong person, a point-in-time restore of those specific records is possible, as long as you are still within the Time Travel window. This enables error recovery within the GDPR framework.
Track PII erasure requests and deletions in a separate table. This can prevent rollback issues, an important concern with Time Travel: if you restore a table to a point before a batch deletion ran, you can delete the PII data again by querying the deletions table.
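The replay step can be sketched as follows. SQLite stands in for the warehouse, a saved copy of the rows stands in for a Time Travel restore, and the erasure_log table name is an assumption. The important point is that the log sits in its own table, so restoring the PII table does not undo it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_pii (customer_id INTEGER PRIMARY KEY, email TEXT);
    -- Kept separate so a restore of customer_pii does not roll it back.
    CREATE TABLE erasure_log (
        customer_id INTEGER PRIMARY KEY, requested_on TEXT, deleted_on TEXT
    );
    INSERT INTO customer_pii VALUES (1, 'ada@example.com'), (2, 'bob@example.com');
""")


def erase_and_log(conn, customer_id, requested_on, deleted_on):
    """Delete the PII row and record the erasure in the same transaction."""
    with conn:
        conn.execute("DELETE FROM customer_pii WHERE customer_id = ?", (customer_id,))
        conn.execute(
            "INSERT OR REPLACE INTO erasure_log VALUES (?, ?, ?)",
            (customer_id, requested_on, deleted_on),
        )


def reapply_erasures(conn):
    """After a restore, delete again every individual listed in the log."""
    with conn:
        cur = conn.execute(
            "DELETE FROM customer_pii "
            "WHERE customer_id IN (SELECT customer_id FROM erasure_log)"
        )
    return cur.rowcount


# Snapshot taken before the deletion; it plays the role of the restored state.
snapshot = conn.execute("SELECT customer_id, email FROM customer_pii").fetchall()
erase_and_log(conn, 1, "2024-03-04", "2024-03-31")

# Simulate restoring only the PII table to the pre-deletion snapshot.
with conn:
    conn.execute("DELETE FROM customer_pii")
    conn.executemany("INSERT INTO customer_pii VALUES (?, ?)", snapshot)

redeleted = reapply_erasures(conn)
remaining_ids = [
    row[0] for row in conn.execute("SELECT customer_id FROM customer_pii ORDER BY customer_id")
]
print(redeleted, remaining_ids)
```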
Fail-Safe also means that lost or deleted data can still be restored. As above, if data is restored through Fail-Safe, you can use the deletions table to remove individual PII data again. Fail-Safe operates within a seven-day period, placing you within the GDPR’s 90-day window, so long as you perform monthly batch deletions.
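The timing argument can be checked with simple arithmetic. The sketch below assumes the worst case is purely additive: a request waits up to one full batch interval, then remains recoverable through Time Travel, then through the seven-day Fail-Safe period. The function names and the additive model are illustrative assumptions, not a statement of what the regulation requires.

```python
FAIL_SAFE_DAYS = 7  # Fail-Safe period described above


def worst_case_days_recoverable(batch_interval_days, time_travel_days,
                                fail_safe_days=FAIL_SAFE_DAYS):
    """Days from request until the data can no longer be recovered (additive model)."""
    return batch_interval_days + time_travel_days + fail_safe_days


def fits_window(batch_interval_days, time_travel_days, window_days=90):
    """True if the worst case stays within the outer deletion window."""
    return worst_case_days_recoverable(batch_interval_days, time_travel_days) <= window_days


total = worst_case_days_recoverable(batch_interval_days=30, time_travel_days=7)
print(total, fits_window(30, 7), fits_window(30, 90))
```

Under this model, monthly batches with a 7-day Time Travel setting total 44 days, while leaving Time Travel at the 90-day maximum would exceed the 90-day window.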
Snowflake GDPR Compliance with Satori
Satori is a data access solution for data warehouses that can help organizations comply with GDPR in several ways:
- Automatically classifying PII data, building a data inventory of sensitive data.
- Providing an interface for exporting data access audit reports for Snowflake.
- Letting you set data access controls and anonymization policies on regulated data, so unauthorized people can’t access datasets that include PII.