What is Data Classification?
The term data classification refers to processes and tools designed to organize data into categories. The purpose is to make data easier to store, manage, and secure.
Data classification systems support organizations in many efforts, including risk management, compliance, and legal discovery. Additionally, data classification systems can improve the usability and accessibility of data, helping organizations derive more value from their information assets.
Data classification can improve all three fundamental aspects of information security:
- Confidentiality: enabling the application of stronger security measures for sensitive data.
- Integrity: enabling adequate storage provisioning and access controls to prevent data loss, unauthorized modification, or corruption.
- Availability: providing controls that make data easily accessible to authorized users.
In this article:
- Why Is Data Classification Important?
- What Are the Four Data Classification Levels?
- What Are the Different Types of Classification of Data?
- Challenges of Data Classification
- How Do Compliance Standards Impact Data Classification?
- Data Classification Levels
- Establishing a Data Classification Policy
- 4 Data Classification Best Practices
Why Is Data Classification Important?
Data classification provides an interface for organizations to implement controls and procedures across data formats, structures, and storage technologies. Classified data allows an organization to define and implement a single policy for handling sensitive data across multiple systems and data objects. Defining a separate policy for each type of data object is not realistic in today’s data-abundant environments.
There are several reasons why data classification is important:
- Context: data classification adds business context to applications and processes. For example, based on data classification, an organization can identify applications that handle sensitive data and define stricter security requirements for those applications.
- Compliance: data classification makes it easier to comply with, and to demonstrate compliance with, regulatory frameworks such as GDPR, CCPA, HIPAA, and PCI DSS.
- Security: data classification makes the business aware of how sensitive its data is, both across the organization and each time new data is introduced, so that it can apply the right level of security control.
- Governance: data classification makes it easier to map, track, and control data.
What Are the Four Data Classification Levels?
There are typically four data classification levels in information security:
- Public: data that is in, or can be in, the public domain and can be openly shared with anyone outside of the organization. For example: a data sheet about the company’s products and services.
- Internal: company-wide data that is kept within the organization and, while not sensitive, should not be shared externally. For example: a guide about how to get help from the IT helpdesk.
- Confidential: domain-specific data that can be shared with specific people or teams and contains sensitive company information. For example: a price list for one of the company’s products.
- Restricted: highly sensitive information that should only be available on a need-to-know basis. For example: employee agreements.
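The ordering of these four levels can be made explicit in code. The following is a minimal Python sketch, not a standard implementation: the `Level` enum and the `strictest` helper are hypothetical names chosen for illustration, and the rule that a collection inherits the most restrictive level of its contents is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels described above, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def strictest(levels):
    """Return the most restrictive level in a collection (assumed rule).

    An empty collection falls back to PUBLIC, which is also an assumption.
    """
    return max(levels, default=Level.PUBLIC)


# Hypothetical example: a shared folder holding three documents.
folder = {
    "product_data_sheet.pdf": Level.PUBLIC,
    "it_helpdesk_guide.docx": Level.INTERNAL,
    "price_list.xlsx": Level.CONFIDENTIAL,
}
folder_level = strictest(folder.values())
print(folder_level.name)
```

Using an ordered enum means levels can be compared directly, for example `Level.RESTRICTED > Level.INTERNAL`, which keeps policy checks short.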
What Are the Different Types of Classification of Data?
While data is classified based on each individual business’s needs, there are a few types of data classification that are more common:
- Data-based classification: classification that describes the nature of the data. For example: a credit card number or an email address.
- Context-based classification: classification that describes the data’s business context. For example: sensitive data or earnings data.
- Source-based classification: classification that describes the source of the data. For example: customer data collected from the webinar registration form.
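Data-based classification is often approached with pattern matching. The sketch below is illustrative only: the pattern names and the `detect_data_types` function are hypothetical, and the regular expressions are deliberately simple, so a match is a candidate for review and not a confirmed finding.

```python
import re

# Illustrative patterns only; production detectors need far stricter rules.
PATTERNS = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card_candidate": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def detect_data_types(text):
    """Return the sorted names of all patterns that match the text."""
    return sorted(
        name for name, pattern in PATTERNS.items() if pattern.search(text)
    )


sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111"
print(detect_data_types(sample))
```

Context-based and source-based labels would typically come from metadata (for example, the system or form the record came from) and not from the content itself.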
Challenges of Data Classification
While data classification is essential for carrying out various functions, information security is mainly concerned with sensitive data. In most organizations, sensitive data is classified into various sensitivity levels and then mapped to different categories of sensitive data (e.g., personal information).
The challenges organizations usually face when classifying data are:
- False positives: the same data could appear in different formats and different contexts. Classification algorithms that do not take into account the data’s format and context are more likely to generate false classifications. Because classification projects usually involve huge amounts of data, even a very low false positive rate can produce enough incorrect results to prevent an organization from classifying its data effectively.
- False negatives: under various regulatory standards, data might be considered sensitive in a specific context but not in another. For example, a name might be considered non-sensitive by itself but sensitive when alongside a medical record. Classifying data outside of the usage context can and often does result in incorrect classification.
- Big data: data lakes and data warehouses represent ever-growing, dynamic repositories of data, creating a huge challenge for non-continuous classification tools.
- Cost: for most classification tools, the cost of implementing and operating a data classification policy depends on the amount of data and the number of controls established. This cost model can hinder an organization that wants to classify large data sets with strict access requirements.
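Two of the challenges above can be illustrated with a short sketch. It assumes that a checksum test is an acceptable way to discard digit strings that merely look like card numbers (reducing false positives), and that a simple column-level rule can capture context (reducing false negatives). The function names and the `diagnosis` column are hypothetical.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumed minimum length for a card number
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def name_is_sensitive(columns):
    """Assumed context rule: a name is sensitive when stored with medical data."""
    return "name" in columns and "diagnosis" in columns


print(luhn_valid("4111 1111 1111 1111"))  # well-known test card number
print(luhn_valid("4111 1111 1111 1112"))  # same digits with a wrong check digit
print(name_is_sensitive({"name", "email"}))
print(name_is_sensitive({"name", "diagnosis"}))
```

A checksum only rules out some false matches; it does not prove that a valid number is a real card.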
How Do Compliance Standards Impact Data Classification?
Many regulations and compliance standards require organizations to perform data classification. Requirements may be different in each compliance standard, depending on the type of data each organization uses, processes, collects, transmits, and stores.
Here are several common compliance standards and their data classification requirements:
- GDPR: entities handling the personal data of European data subjects are required to classify all collected data types. GDPR designates certain data, including data revealing racial or ethnic origin or political opinions, health data, and biometric data, as “special categories”. This data requires additional protection.
- PCI DSS: Requirement 9.6.1 (as numbered in PCI DSS v3.2.1) stipulates that entities must “classify media so the sensitivity of the data can be determined.”
- SOC 2: the Trust Services Criteria require entities to demonstrate that they regularly identify and maintain confidential information in a manner that meets their unique confidentiality objectives.
- HIPAA: treats protected health information (PHI) as a high-risk asset. The HIPAA Security Rule requires covered entities and relevant business associates (BAs) to identify PHI and implement safeguards that ensure its integrity, availability, and confidentiality. The HIPAA Privacy Rule limits the uses and disclosures of PHI, which in practice requires covered entities and business associates to establish data classification procedures.
Data Classification Levels
Data sensitivity levels help determine how each type of classified data should be handled. The Center for Internet Security (CIS), for example, recommends three information classes:
- Public
- Business Confidential
- Sensitive
The US government has a more extensive classification, with seven levels of data sensitivity:
- Controlled Unclassified Information (CUI)
- Public Trust
- Confidential
- Secret
- Top Secret
- Code Word Classification
- Restricted Data/Formerly Restricted Data
Using more than three levels can introduce complexity and make data classification hard to control and maintain. Using fewer than three levels, on the other hand, is considered too simplistic and may lead to insufficient protection and privacy. This is why the majority of organizations use three levels of classification, as advised by the CIS.
Here is a generalized form of the CIS classification definitions which you can use in your data classification efforts:
- Low Sensitivity Data—public information that does not require access restrictions, such as public web pages, blog posts, and job listings.
- Medium Sensitivity Data—intended only for internal use, and can have a major impact on the organization if breached. For example, business plans, customer lists, and non-identifiable personal data.
- High Sensitivity Data—data protected by regulations or compliance standards, requiring strict access controls and protection measures. If breached, the data may cause significant harm to individuals or the organization, and may also result in compliance penalties or fines.
Establishing a Data Classification Policy
A data classification policy defines how your organization manages its information lifecycle. The goal is to ensure sensitive information is handled in a manner relevant to the level of risk it poses. A data classification policy should address access and authorization, taking into account the data structure and its day-to-day business uses.
Here are several key aspects your policy should cover:
- Objectives: state the motivation for implementing data classification and the goals to achieve, with measurable key performance indicators (KPIs).
- Workflows: clearly define how the entire classification process is organized and structured. Explain how this process will affect employees and how they should treat each level of sensitive data.
- Location: identify where the data is stored, for example on premises, in the cloud, on backup systems, in databases, or in file systems.
- Schema: determine and describe the categories chosen to classify data.
- Data owners: clearly define the roles and responsibilities of all parties involved in managing data classification. Describe how each role should classify data and grant access.
- Compliance: clearly define which information is subject to compliance regulations and what measures must be taken to ensure compliance.
4 Data Classification Best Practices
Here are a few best practices that can help you improve data classification in your organization.
Conduct a Data Risk Assessment
A data risk assessment can help you achieve a comprehensive understanding of all data requirements, including those related to company policies and compliance regulations. You should also determine contractual privacy and confidentiality requirements. Define data classification objectives in coordination with all stakeholders—including IT, security, and legal teams.
Create a Data Inventory
Before you can classify data, you need to locate it using data discovery techniques and tools. Once you have located all sensitive data, you need to identify and classify it to ensure each type of data is appropriately protected.
To make the process efficient and accurate, you can label each sensitive data asset. This can significantly improve your data classification policy enforcement process. You can label data manually or automatically.
Intelligent classification systems can automate this process. For example, a data classification system can use predefined policies to automatically identify and classify data, and then tag it with the appropriate classification label. These systems can continuously monitor data, ensuring that it is always classified properly across the entire data lifecycle.
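As a rough illustration of policy-driven labeling, the sketch below tags inventory entries using predefined patterns. Everything in it is hypothetical: the `Asset` class, the `POLICIES` table, the sample locations, and the simplified patterns. A real classification system would also sample actual storage and use far more robust detection.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical inventory entry: a location plus a content sample."""
    location: str
    sample: str
    labels: set = field(default_factory=set)


# Hypothetical predefined policies: label name -> detection pattern.
POLICIES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn_candidate": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def label_assets(assets):
    """Tag each asset in place with every policy label whose pattern matches."""
    for asset in assets:
        for label, pattern in POLICIES.items():
            if pattern.search(asset.sample):
                asset.labels.add(label)
    return assets


inventory = label_assets([
    Asset("s3://bucket/signups.csv", "bob@example.com"),
    Asset("db.hr.employees", "123-45-6789"),
    Asset("wiki/helpdesk.md", "Restart the router first."),
])
for asset in inventory:
    print(asset.location, sorted(asset.labels))
```

Keeping the policies in a single table means a new label can be added without changing the labeling loop.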
Establish Data Security Controls
Each data classification level requires a different level of security. To ensure each level is appropriately protected, you should establish standard security measures. Then, define policy-based controls for each classification label.
When defining security measures, you should take into account where each data type resides and the value this data provides to the organization. You can then assess the risks and implement the appropriate controls.
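One way to express policy-based controls is a lookup table keyed by classification label. The sketch below is an assumption-laden example: the labels, roles, and `Controls` fields are hypothetical, and real controls would cover much more than encryption and role checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    """A hypothetical control set attached to one classification label."""
    encrypt_at_rest: bool
    allowed_roles: frozenset


# Hypothetical control sets, one per classification label.
CONTROLS_BY_LABEL = {
    "public": Controls(False, frozenset({"guest", "employee", "hr"})),
    "internal": Controls(False, frozenset({"employee", "hr"})),
    "confidential": Controls(True, frozenset({"hr", "finance"})),
    "restricted": Controls(True, frozenset({"hr"})),
}


def can_access(label, role):
    """Deny by default: unknown labels and unlisted roles get no access."""
    controls = CONTROLS_BY_LABEL.get(label)
    return controls is not None and role in controls.allowed_roles


print(can_access("restricted", "hr"))
print(can_access("restricted", "employee"))
print(can_access("unlabeled", "hr"))
```

The deny-by-default choice means that data without a recognized label is inaccessible until someone classifies it, which is stricter than many organizations may want.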
Maintenance and Monitoring
Data is dynamic and requires ongoing monitoring and maintenance. It can be frequently created, copied, modified, moved, and deleted. Since data may undergo many changes throughout its lifecycle, data classification can quickly turn into a time-consuming effort.
An important way to reduce data classification efforts is to identify which data really needs to be protected, and focus efforts there. Automated classification systems are another way to reduce workloads and ensure fast detection and treatment of newly created sensitive data. Finally, ensure your data classification policies are flexible enough to deal with changes to data structure, new data types, and growing data volumes.
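One possible way to limit repeated work is to re-classify only data that is new or has changed since the last run. The sketch below assumes content hashing is an acceptable change signal; the function names and file names are hypothetical, and this is not a description of how any particular classification tool works.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used to detect content changes."""
    return hashlib.sha256(content).hexdigest()


def needs_reclassification(known, current):
    """Return the sorted names of assets that are new or have changed.

    known:   name -> fingerprint recorded at the last classification run
    current: name -> current content as bytes
    """
    return sorted(
        name
        for name, content in current.items()
        if known.get(name) != fingerprint(content)
    )


# Hypothetical state from a previous run, followed by today's contents.
known = {"a.csv": fingerprint(b"old"), "b.csv": fingerprint(b"same")}
current = {"a.csv": b"new", "b.csv": b"same", "c.csv": b"fresh"}
print(needs_reclassification(known, current))
```

Hashing full contents can be expensive for large data sets, so a real deployment might rely on modification timestamps or storage change events instead.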