What is the Meaning of Tokenization?
Tokenization is a form of data masking that replaces sensitive data with a different value, called a token. The token has no exploitable value of its own, and there should be no way to trace back from the token to the original data. When data is tokenized, the original sensitive data is still stored at a centralized location, which must itself be secured.
Tokenization approaches vary in the level of security applied to the original data values, the processes and algorithms used to create tokens, and the way the mapping between tokens and original values is established and maintained.
Why is Tokenization Important?
A tokenization platform helps remove sensitive data, such as payment or personal information, from a business system. Each sensitive data element is replaced with an undecipherable token. The original data is then stored in a secure cloud environment—separated from the business systems.
When applied in banking, tokenization helps protect cardholder data. When a business processes a payment using a token, only the tokenization system can swap the token for the corresponding primary account number (PAN), which it then sends to the payment processor for authorization. This ensures that business systems never store, transmit, or record the PAN—only the generated token.
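To make that flow concrete, here is a minimal sketch of what the token-for-PAN swap might look like inside the tokenization system. All names here (the vault dictionary, `authorize_payment`, `send_to_payment_processor`) are hypothetical; a real tokenization system exposes its own API and uses hardened storage rather than an in-memory mapping.

```python
# Sketch of a payment flow using tokens (hypothetical names throughout).
# The business system only ever handles the token; the tokenization system
# swaps it for the PAN just before the request goes to the payment processor.

TOKEN_VAULT = {"tok_8f3a1c9d": "4111111111111111"}  # token -> PAN, held only by the tokenization system


def send_to_payment_processor(pan: str, amount_cents: int) -> bool:
    # Placeholder for the call to the acquirer/processor; the PAN never
    # leaves the tokenization system's boundary in the other direction.
    return True


def authorize_payment(token: str, amount_cents: int) -> bool:
    """Runs inside the tokenization system, never inside the business system."""
    pan = TOKEN_VAULT.get(token)
    if pan is None:
        return False  # unknown token, reject
    return send_to_payment_processor(pan, amount_cents)


print(authorize_payment("tok_8f3a1c9d", 1999))  # True
```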
Cloud tokenization platforms can help prevent the exposure of sensitive data. This can prevent attackers from capturing usable information. However, tokenization is not intended to stop threat actors from penetrating networks and information systems. Rather, a tokenization system serves as a security layer designed especially to protect sensitive data.
How Data Tokenization Works
Tokenization processes replace sensitive data with a token. The token itself has no exploitable value and is not connected to any particular account or individual.
In payment processing, tokenization replaces the customer's 16-digit PAN with a randomly generated alphanumeric ID. The process then removes any connection between the transaction and the sensitive data, limiting exposure in the event of a breach—which is why tokenization is so widely used in credit card processing.
Tokenization processes can safeguard credit card numbers as well as bank account numbers in a virtual vault. Organizations can then safely transmit data across wireless networks. However, to ensure the tokenization process is effective, organizations need to use a payment gateway for secure storage of sensitive data. A payment gateway securely stores credit card numbers and generates random tokens.
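As an illustration, the sketch below shows the basic tokenize/de-tokenize cycle against an in-memory mapping. The function names and the `_vault` dictionary are hypothetical stand-ins; in practice the mapping lives inside the payment gateway's secure token vault.

```python
import secrets

# Hypothetical in-memory "vault"; a real deployment would use a hardened,
# access-controlled store (e.g. the payment gateway's token vault).
_vault: dict[str, str] = {}


def tokenize(pan: str) -> str:
    """Replace a 16-digit PAN with a random alphanumeric token and record the mapping."""
    token = "tok_" + secrets.token_hex(8)  # random value, not derived from the PAN
    _vault[token] = pan
    return token


def detokenize(token: str) -> str | None:
    """Reverse lookup, only possible for code with access to the vault."""
    return _vault.get(token)


card_token = tokenize("4111111111111111")
print(card_token)              # e.g. tok_3f9a0c1b2d4e5f67
print(detokenize(card_token))  # 4111111111111111
```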
Tokenization and PCI DSS
Payment Card Industry (PCI) standards restrict merchants from storing credit card numbers on POS terminals or in their databases after a transaction. PCI compliance requires merchants that handle cardholder data themselves to implement encryption.
Alternatively, merchants can outsource their payment processing to a service provider offering tokenization. In this case, the service provider issues tokens and is held responsible for keeping cardholder data secure.
Tokenization vs Encryption
Tokenization and encryption are data obfuscation techniques that help secure information in transit and at rest. Both measures can help organizations satisfy their data security policies as well as regulatory requirements, including PCI DSS, GLBA, HIPAA-HITECH, GDPR, and ITAR.
While tokenization and encryption share similarities, these are two different technologies. In some cases, a business may be required to apply only encryption, while other cases might require the implementation of both technologies.
Here are key characteristics of encryption:
- Mathematically transforms plain text—encryption processes use math to transform plain text into cipher text. This is achieved by using an encryption algorithm and key.
- Scales to large data volumes—a single small encryption key can protect, and later decrypt, data of any size, so encryption scales well to large data volumes.
- Structured fields and unstructured data—you can apply encryption to both structured and unstructured data, including entire files.
- Original data leaves the organization—encryption enables organizations to transmit data outside the organization in an encrypted form.
Encryption is ideal for exchanging sensitive information with parties who hold the decryption key. However, format-preserving encryption schemes trade some cryptographic strength for keeping the original data format.
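For illustration, here is a minimal symmetric-encryption example using the Fernet recipe from the Python `cryptography` package (an assumption; any standard cipher would make the same point): a single small key protects an arbitrary payload and can later recover it.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# A small symmetric key is enough to encrypt (and later decrypt) arbitrary data,
# which is why encryption scales well to large or unstructured payloads.
key = Fernet.generate_key()
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"4111111111111111")  # safe to transmit outside the organization
plaintext = cipher.decrypt(ciphertext)            # anyone holding the key can recover the data
print(plaintext.decode())  # 4111111111111111
```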
Here are key characteristics of tokenization:
- Randomly generates a token value—tokenization systems generate random token values, which then replace plain text. The mapping is stored in a database.
- Difficult to scale securely—when databases increase in size, it becomes difficult to securely scale and maintain performance.
- Structured data fields—tokenization applies to structured data, such as Social Security numbers or payment card information.
- Original data does not leave the organization—tokenization helps satisfy compliance requirements that prohibit sensitive data from leaving the organization.
Tokenization enables organizations to maintain formats without diminishing the security strength. However, exchanging data can be difficult because it requires direct access to a token vault that maps token values.
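The format-preservation point is easy to see in code. The sketch below is a hypothetical example that keeps a 16-digit layout and leaves the last four digits visible (a common but not universal convention), while the rest of the token is random.

```python
import secrets


def format_preserving_token(pan: str) -> str:
    """Return a random 16-digit token that keeps the PAN's length and last four digits."""
    # The randomized portion reveals nothing about the original number;
    # recovering the full PAN still requires a lookup in the token vault.
    random_digits = "".join(secrets.choice("0123456789") for _ in range(12))
    return random_digits + pan[-4:]


print(format_preserving_token("4111111111111111"))  # e.g. 8302719465831111
```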
Related content: Read our guide to data encryption
4 Data Tokenization Best Practices
Here are some practices to help you make the most of tokenization.
Secure Your Token Server
To ensure your tokenization system is compliant with PCI standards, it is crucial to secure your token server by maintaining network segregation. If the server is not adequately protected, the entire system could be rendered ineffective. The token server is responsible for reversing the tokenization process, so it must be safeguarded with robust encryption.
Combine Tokenization with Encryption
Encryption service providers are increasingly turning to tokenization as a means of complementing their encryption capabilities. While some experts favor the use of either tokenization or end-to-end encryption, the best approach may be to combine the two. This is particularly relevant for payment card processing.
Each technology offers different functions, which achieve different purposes. Tokenization works well with database infrastructure and provides irreversible data masking. End-to-end encryption offers greater protection for payment card data in transit.
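Below is a rough sketch of how the two can work together, reusing the same hypothetical names as the earlier examples: the card number is encrypted end to end for the trip from the point of sale, then tokenized on arrival so downstream business systems only ever see the token.

```python
import secrets
from cryptography.fernet import Fernet

# Hypothetical combined flow: encryption protects the PAN in transit,
# tokenization protects it at rest in business systems.
transport_key = Fernet.generate_key()  # shared between the POS device and the tokenization service
transport = Fernet(transport_key)
vault: dict[str, str] = {}

# 1. At the point of sale: encrypt the PAN before it leaves the device.
in_transit = transport.encrypt(b"4111111111111111")

# 2. At the tokenization service: decrypt, tokenize, and hand back only the token.
pan = transport.decrypt(in_transit).decode()
token = "tok_" + secrets.token_hex(8)
vault[token] = pan

# 3. Business systems store and process the token; the PAN never reaches them.
print(token)
```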
Generate Tokens Randomly
To maintain the irreversibility of token values, they must be generated randomly. If token values are derived from the input by a mathematical function, the output can potentially be reversed, or the limited input space brute-forced, to reveal the original data. Effective tokens can only be used to uncover PAN data through a reverse lookup in the token server's database.
Generating random tokens is straightforward, because the data type and size constraints are minor. PAN data should never be retrievable from tokens, so randomization should be applied by default.
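The difference between a derived value and a truly random token is worth spelling out. In the hypothetical comparison below, the hash-based "token" can be recomputed by anyone able to enumerate candidate PANs, while the random token can only be resolved through the token server's own lookup table.

```python
import hashlib
import secrets

pan = "4111111111111111"

# Derived "token": a mathematical function of the input. Anyone who can enumerate
# candidate PANs (a small space once the BIN is known) can recompute and match it.
derived = hashlib.sha256(pan.encode()).hexdigest()

# Random token: independent of the input, so the only path back to the PAN is a
# reverse lookup in the token server's database.
random_token = "tok_" + secrets.token_hex(8)

print(derived)
print(random_token)
```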
Don’t Use a Homegrown System
Tokenization may appear to be simple in theory, but it is still possible to make mistakes. Tokens must be generated and managed properly, with the token server being secured in a PCI-compliant way. All this can be complicated to manage entirely in-house.
Homegrown tokenization deployments carry a greater risk and often fail to meet compliance standards. Tokens may be easily deciphered if they are reversible, or if the overall tokenization system isn’t properly secured.
Data Tokenization with Satori
With Satori, you can de-tokenize data without the de-tokenized data passing through your data store. For example, if you have sensitive tokenized data in Snowflake, you can de-tokenize it using Satori without the clear-text sensitive data passing through your Snowflake account. To learn more, contact us.