Pseudonymisation: 9 Ways to Protect Your PII

What Is Pseudonymisation?

Pseudonymisation is a way of masking data so that personal data cannot be attributed to a specific person without additional information, which is kept separately and protected by security measures. It features throughout the EU General Data Protection Regulation (GDPR), which has several recitals describing how and when data should be pseudonymized.

The term personal data applies to information related to an individual known as a data subject. Data subjects are identifiable based on attributes such as a person’s name, ID number, or location, or specific identity factors such as the physical, genetic, physiological, mental, cultural, economic, or social characteristics of the individual.

In this article:

Which GDPR Recitals Mention Pseudonymization?
5 Pseudonymization Techniques
4 Pseudonymization Policies
Pseudonymisation with Satori

Which GDPR Recitals Mention Pseudonymization?

The GDPR promotes pseudonymisation as a measure for protecting personally identifiable information (PII). Its data protection principles apply to any identified or identifiable individual.

The GDPR treats personal data that is attributable to a data subject as PII, even if it has been pseudonymized and can only be re-identified with additional information. Organizations must consider all reasonable means of re-identification when determining whether pseudonymized data can be traced back to data subjects using additional information.

Note that data protection regulations don’t apply to anonymous data, because it cannot be traced to a data subject.

Here are several GDPR recitals related to pseudonymization:

GDPR recital 28

This recital recommends using pseudonymization to reduce the risks to data subjects and to help data controllers and data processors meet their data protection duties and achieve compliance. However, pseudonymization is not a substitute for other data security and protection measures.

GDPR recital 29

This recital offers incentives to encourage data controllers to apply pseudonymization. Additionally, GDPR recital 75 lists unauthorized reversal of pseudonymization among the risks to an individual's rights and freedoms.

GDPR recital 78

This recital recognizes pseudonymization of PII as a means of demonstrating GDPR compliance. This is similar to demonstrating compliance by adhering to data protection standards and other codes of conduct that may impact assessments.

GDPR recital 85

This recital refers to unauthorized reversal of pseudonymization as a possible consequence of a personal data breach, which can trigger the controller's notification duty. Additionally, GDPR recital 156 names pseudonymization as a safeguard for further processing of personal data for archiving purposes and for historical, statistical, or scientific research purposes. Before such processing, the controller assesses whether those purposes can be met with data that does not permit the identification of data subjects.

5 Pseudonymization Techniques

Here are several technical methods you can use to pseudonymize sensitive data.

Data Scrambling

This technique involves rearranging the letters of a value to obfuscate it. For example, the name Jonathan can be scrambled into ‘Tojnahna’.
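The scrambling idea can be sketched in a few lines of Python. This is only an illustration: the function name scramble is hypothetical, and a plain shuffle like this is weak protection because the original letters are all still present.

```python
import random
from typing import Optional


def scramble(text: str, rng: Optional[random.Random] = None) -> str:
    """Return the characters of `text` in a shuffled order.

    Illustrative only: the output is an anagram of the input, so short or
    distinctive values may still be guessable.
    """
    rng = rng or random.Random()
    chars = list(text)
    rng.shuffle(chars)  # in-place shuffle of the character list
    return "".join(chars)


if __name__ == "__main__":
    # Output varies per run; by chance it can even equal the input.
    print(scramble("Jonathan"))
```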

Data Masking

Data masking involves hiding important or unique parts of the information by replacing them with random characters or other data. The masked value remains recognizable as a particular kind of data, so it can still be used without exposing the actual identity behind it. For example, the credit card number “5600-0000-0000-0003” can be stored as “XXXX-XXXX-XXXX-0003”.
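As a minimal sketch of the credit card example above, the following Python function masks every digit except the last four and leaves separators in place. The function name and its parameters are assumptions made for illustration.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "X") -> str:
    """Mask all digits except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible, 0)

    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)  # hide this digit
            digits_to_mask -= 1
        else:
            masked.append(ch)  # keep separators and the trailing digits
    return "".join(masked)


if __name__ == "__main__":
    print(mask_card_number("5600-0000-0000-0003"))  # XXXX-XXXX-XXXX-0003
```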

Read more about data masking in our dedicated data masking guide.

Data Encryption

Encryption involves rendering original data into an unintelligible form. Ideally, this process cannot be reversed without using the correct decryption key. The GDPR requires keeping additional information, including the decryption key, separately from pseudonymized data.
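To illustrate the principle of keeping the key apart from the data, here is a toy sketch that uses only the Python standard library. It implements a one-time pad (XOR with a random key as long as the message), which is an assumption chosen to keep the example self-contained. It is not a recommendation: a production system would normally use a vetted cryptography library and an authenticated cipher.

```python
import secrets
from typing import Tuple


def encrypt(plaintext: str) -> Tuple[bytes, bytes]:
    """Encrypt with a one-time pad and return (ciphertext, key).

    The key is as long as the message and must never be reused.
    """
    data = plaintext.encode("utf-8")
    key = secrets.token_bytes(len(data))
    ciphertext = bytes(b ^ k for b, k in zip(data, key))
    return ciphertext, key


def decrypt(ciphertext: bytes, key: bytes) -> str:
    """Reverse the XOR; only works with the matching key."""
    if len(key) != len(ciphertext):
        raise ValueError("key length must match ciphertext length")
    return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")


if __name__ == "__main__":
    ciphertext, key = encrypt("Jonathan")
    # Store `ciphertext` with the dataset and `key` in a separate,
    # access-controlled location.
    print(decrypt(ciphertext, key))  # Jonathan
```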

Learn more in our detailed guide to data encryption.

Data Tokenization

Tokenization replaces sensitive information with a random token value, which is then used to look up the original information. Tokens have no connection to the original information and can be issued for one-time use to increase their level of security. Tokens also enable organizations to minimize their access to sensitive information and any related liability.
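The following sketch shows the basic shape of a token vault in Python. The TokenVault class, its method names, and the single_use flag are hypothetical and exist only for illustration. A real vault would be a separately secured service, not an in-memory dictionary.

```python
import secrets
from typing import Dict


class TokenVault:
    """In-memory token vault (illustrative, not production-ready)."""

    def __init__(self) -> None:
        self._vault: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Store `value` and return a random token unrelated to it."""
        token = secrets.token_hex(16)
        self._vault[token] = value
        return token

    def detokenize(self, token: str, single_use: bool = False) -> str:
        """Look up the original value; optionally invalidate the token."""
        if single_use:
            return self._vault.pop(token)
        return self._vault[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("5600-0000-0000-0003")
    print(token)                    # random 32-character hex string
    print(vault.detokenize(token))  # 5600-0000-0000-0003
```

Because the token is generated randomly instead of being derived from the value, it cannot be reversed without access to the vault.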

Learn more in our detailed guide to data tokenization.

Data Blurring

This technique replaces exact values with approximations to obscure the original meaning of the data. It can also make it impossible to identify the individuals the data describes. A blurred face in an image is one example.
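For tabular data, blurring usually means replacing an exact value with a coarser one. The sketch below shows two hypothetical helpers, one that reduces an age to a range and one that rounds a coordinate. The bucket size and precision are assumptions that would need tuning for a real dataset.

```python
def blur_age(age: int, bucket: int = 10) -> str:
    """Replace an exact age with the range (bucket) it falls into."""
    lower = (age // bucket) * bucket
    return f"{lower}-{lower + bucket - 1}"


def blur_coordinate(value: float, decimals: int = 1) -> float:
    """Round a latitude or longitude to reduce location precision."""
    return round(value, decimals)


if __name__ == "__main__":
    print(blur_age(37))               # 30-39
    print(blur_coordinate(52.52437))  # 52.5
```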

4 Pseudonymization Policies

Pseudonymization policies are different approaches to substituting real data with other data. Each policy may have implications on the ease of implementation and the rigorousness of data protection.

Deterministic Pseudonymization

This policy replaces the original information with the same substitute wherever it appears, across all databases. This ensures the substitution is consistent within a database and between multiple databases. When implementing this policy, first extract the list of unique identifiers from the database. Next, map each identifier in the list to a substitute. Finally, replace the original information in the database with its substitute.
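The three steps can be sketched as follows. The function names, the sample identifiers, and the "user-" prefix are all hypothetical. The mapping table is the additional information that would need to be stored separately from the pseudonymized data.

```python
import secrets
from typing import Dict, Iterable, List


def build_mapping(identifiers: Iterable[str]) -> Dict[str, str]:
    """Steps 1 and 2: extract unique identifiers and map each to a pseudonym."""
    mapping: Dict[str, str] = {}
    used = set()
    for identifier in sorted(set(identifiers)):
        pseudonym = f"user-{secrets.token_hex(8)}"
        while pseudonym in used:  # extremely unlikely, but keep pseudonyms unique
            pseudonym = f"user-{secrets.token_hex(8)}"
        used.add(pseudonym)
        mapping[identifier] = pseudonym
    return mapping


def pseudonymize(records: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Step 3: substitute every occurrence using the shared mapping."""
    return [mapping[record] for record in records]


if __name__ == "__main__":
    database_a = ["alice", "bob", "alice"]
    database_b = ["bob", "carol"]
    mapping = build_mapping(database_a + database_b)
    # "alice" gets the same pseudonym both times, and "bob" matches across databases.
    print(pseudonymize(database_a, mapping))
    print(pseudonymize(database_b, mapping))
```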

Randomized Pseudonymization

This policy replaces every occurrence of the original information within the database with a fully randomized substitute. The policy can be seen as an extension of document-randomized pseudonymization. The two policies behave identically when applied once to a single document, but if fully randomized pseudonymization is applied multiple times to the same document, it produces a different output each time.

Document-randomized pseudonymization, on the other hand, produces the same output each time it is applied to the same document. This means that document-randomized pseudonymization applies selective randomness, while fully randomized pseudonymization applies randomness globally to any record.
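A minimal sketch of the fully randomized policy, assuming a hypothetical function name and sample data: every occurrence receives a fresh random pseudonym, so repeated runs over the same document do not match. The list of (pseudonym, original) pairs stands in for the additional information that would be stored separately.

```python
import secrets
from typing import Iterable, List, Tuple


def pseudonymize_fully_randomized(
    records: Iterable[str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Give every occurrence its own random pseudonym.

    Returns the pseudonymized records and the (pseudonym, original) pairs
    needed to reverse the substitution.
    """
    pseudonyms: List[str] = []
    reversal_table: List[Tuple[str, str]] = []
    for identifier in records:
        pseudonym = secrets.token_hex(8)  # fresh randomness per occurrence
        pseudonyms.append(pseudonym)
        reversal_table.append((pseudonym, identifier))
    return pseudonyms, reversal_table


if __name__ == "__main__":
    document = ["alice", "bob", "alice"]
    first_run, _ = pseudonymize_fully_randomized(document)
    second_run, _ = pseudonymize_fully_randomized(document)
    print(first_run)
    print(second_run)  # differs from first_run
```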

Document-Randomized Pseudonymization

This policy replaces the original information with a different value every time it appears in the database. However, the original information is always mapped to the same set of substitutions in the dataset.

In this case, the substitution is consistent only between different databases. The mapping table is created using all identifiers stored in the database, and each occurrence of an identifier is treated independently.
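One possible reading of this policy can be sketched with a keyed hash (HMAC) over the identifier and its occurrence number. This is an interpretation, not a definitive implementation: the function name, the key handling, and the choice of HMAC-SHA256 are all assumptions. Each occurrence of an identifier receives a different pseudonym, yet applying the function again to the same document with the same key reproduces the same output.

```python
import hashlib
import hmac
from collections import Counter
from typing import Iterable, List


def pseudonymize_document_randomized(records: Iterable[str], key: bytes) -> List[str]:
    """Derive a pseudonym from (occurrence number, identifier) under a secret key."""
    seen: Counter = Counter()
    pseudonyms: List[str] = []
    for identifier in records:
        occurrence = seen[identifier]  # 0 for the first appearance, then 1, 2, ...
        seen[identifier] += 1
        message = f"{occurrence}:{identifier}".encode("utf-8")
        digest = hmac.new(key, message, hashlib.sha256).hexdigest()
        pseudonyms.append(digest[:16])  # truncated for readability
    return pseudonyms


if __name__ == "__main__":
    key = b"demo-key-keep-this-secret-and-separate"  # hypothetical key
    document = ["alice", "bob", "alice"]
    print(pseudonymize_document_randomized(document, key))
    # Running it again with the same key and document gives the same list.
```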

Establishing Your Pseudonymization Techniques and Policies

Here are several different parameters that can help you choose a pseudonymisation technique and policy:

The data protection level: random number generators (RNG), encryption, and message authentication codes are generally considered the stronger techniques. The chosen policy also affects protection. Fully randomized pseudonymisation policies offer the highest protection level, but they prevent comparisons between databases.
The utility of the pseudonymized dataset: utility requirements might call for a combination of several approaches, or for variations of a chosen approach. Document-randomized and deterministic functions preserve more utility, but they allow records to be linked.

Pseudonymisation with Satori

To learn more about how Satori can help you protect access to sensitive data, book a demo with one of our experts.