What is Data Classification?
The term data classification refers to processes and tools designed to organize data into categories. The purpose is to make data easier to store, manage, and secure.
Data classification systems support organizations in many efforts, including risk management, compliance, and legal discovery. Additionally, data classification systems can improve the usability and accessibility of data, helping organizations derive more value from their information assets.
Data classification can improve all three fundamental aspects of information security:
- Confidentiality—enabling the application of stronger security measures to sensitive data.
- Integrity—enabling adequate storage provisioning and access controls to prevent data loss, unauthorized modification, or corruption.
- Availability—providing controls that make data easily accessible to authorized users.
In this article:
- How Do Compliance Standards Impact Data Classification?
- Data Classification Levels
- Establishing a Data Classification Policy
- 4 Data Classification Best Practices
How Do Compliance Standards Impact Data Classification?
Many regulations and compliance standards require organizations to perform data classification. Requirements differ between standards and depend on the types of data an organization uses, processes, collects, transmits, and stores.
Here are several common compliance standards and their data classification requirements:
- GDPR—entities handling the personal data of European data subjects are required to classify all collected data types. GDPR designates certain data, such as data revealing racial or ethnic origin or political opinions, health data, and biometric data, as “special categories” of personal data. This data requires additional protection.
- PCI DSS—Requirement 9.6.1 stipulates that entities must “classify media so the sensitivity of the data can be determined.”
- SOC 2—the Trust Services Criteria of SOC 2 require entities to demonstrate that they regularly identify and maintain confidential information in a manner that meets their own confidentiality objectives.
- HIPAA—treats protected health information (PHI) as a high-risk asset. The HIPAA Security Rule requires covered entities and their business associates (BAs) to identify PHI and implement safeguards that ensure its integrity, availability, and confidentiality. The HIPAA Privacy Rule limits the uses and disclosures of PHI, which in practice requires covered entities and business associates to establish data classification procedures.
Data Classification Levels
Data sensitivity levels help determine how each type of classified data should be handled. The Center for Internet Security (CIS), for example, recommends three information classes:
- Business Confidential
The US government has a more extensive classification, with seven levels of data sensitivity:
- Controlled Unclassified Information (CUI)
- Public Trust
- Confidential
- Secret
- Top Secret
- Code Word Classification
- Restricted Data/Formerly Restricted Data
Using more than three levels can add complexity and make data classification harder to control and maintain. Using fewer than three levels, on the other hand, is usually too simplistic and can lead to insufficient protection and privacy. This is why many organizations use three levels of classification, as the CIS advises.
Here is a generalized form of the CIS classification definitions which you can use in your data classification efforts:
- Low Sensitivity Data—public information that does not require access restrictions, such as public web pages, blog posts, and job listings.
- Medium Sensitivity Data—intended only for internal use, and can have a major impact on the organization if breached. Examples include business plans, customer lists, and non-identifiable personal data.
- High Sensitivity Data—data protected by regulations or compliance standards, requiring strict access controls and protection measures. If breached, the data may cause significant harm to individuals or the organization, and may also result in compliance penalties or fines.
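To make the three levels concrete, here is a minimal Python sketch that models them as an ordered enumeration. The decision rule in `classify_asset` (regulated data is high, public data is low, everything else is medium) is a hypothetical simplification of the definitions above, and `combined_level` assumes a common convention, not stated in the CIS definitions, that a dataset inherits the level of its most sensitive part.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    LOW = 1     # public information, no access restrictions
    MEDIUM = 2  # internal use only
    HIGH = 3    # protected by regulation; strict access controls


def classify_asset(is_public: bool, is_regulated: bool) -> Sensitivity:
    """Hypothetical rule: regulation wins, then public, otherwise internal."""
    if is_regulated:
        return Sensitivity.HIGH
    if is_public:
        return Sensitivity.LOW
    return Sensitivity.MEDIUM


def combined_level(levels: list[Sensitivity]) -> Sensitivity:
    """Assumed convention: a dataset takes the level of its most sensitive part."""
    return max(levels, default=Sensitivity.LOW)


if __name__ == "__main__":
    record = [classify_asset(True, False), classify_asset(False, True)]
    print(combined_level(record).name)  # → HIGH
```

Because `IntEnum` members compare as integers, comparisons such as `level >= Sensitivity.MEDIUM` work directly when deciding which controls apply.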
Establishing a Data Classification Policy
A data classification policy defines how your organization manages information across its lifecycle. The goal is to ensure sensitive information is handled in a manner proportionate to the level of risk it poses. The policy should address access and authorization, taking into account how the data is structured and how it is used day to day.
Here are several key aspects your policy should cover:
- Objectives—the motivation for implementing data classification and the goals to achieve, with measurable key performance indicators (KPIs).
- Workflows—clearly define how the classification process is organized and structured. Explain how the process affects employees and how they should handle data at each sensitivity level.
- Location—identify where the data is stored: on premises, in the cloud, on backup systems, and within databases, file systems, and other repositories.
- Schema—determine and describe the categories chosen to classify data.
- Data owners—clearly define the roles and responsibilities of everyone involved in managing data classification. Describe how each role should classify data and grant access.
- Compliance—clearly define which information is subject to compliance regulations and what measures must be taken to ensure compliance.
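One way to keep a policy reviewable is to capture its key aspects as structured data that can be checked for gaps. The sketch below is a hypothetical Python representation: the `ClassificationPolicy` class, its field names, and the checks in `problems()` are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationPolicy:
    objectives: dict[str, float]   # KPI name -> target value
    workflows: list[str]           # ordered steps of the classification process
    locations: list[str]           # where in-scope data is stored
    schema: list[str]              # categories, least to most sensitive
    data_owners: dict[str, str]    # category -> role responsible for it
    compliance: dict[str, list[str]] = field(default_factory=dict)  # standard -> categories

    def problems(self) -> list[str]:
        """Return gaps that would make the policy hard to enforce."""
        issues = []
        if not self.objectives:
            issues.append("no measurable objectives (KPIs) defined")
        if not self.locations:
            issues.append("no data locations identified")
        for category in self.schema:
            if category not in self.data_owners:
                issues.append(f"category '{category}' has no data owner")
        for standard, categories in self.compliance.items():
            for category in categories:
                if category not in self.schema:
                    issues.append(f"{standard} refers to unknown category '{category}'")
        return issues


policy = ClassificationPolicy(
    objectives={"share_of_assets_labeled": 0.95},
    workflows=["discover", "classify", "label", "review"],
    locations=["on-premises file shares", "cloud object storage", "backups"],
    schema=["low", "medium", "high"],
    data_owners={"low": "communications", "high": "security"},
    compliance={"GDPR": ["high"], "PCI DSS": ["restricted"]},
)
for issue in policy.problems():
    print(issue)
```

Here the check flags two gaps: the `medium` category has no owner, and the PCI DSS entry points to a `restricted` category that the schema never defines.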
4 Data Classification Best Practices
Here are a few best practices that can help you improve data classification in your organization.
Conduct a Data Risk Assessment
A data risk assessment can help you achieve a comprehensive understanding of all data requirements, including those related to company policies and compliance regulations. You should also determine contractual privacy and confidentiality requirements. Define data classification objectives in coordination with all stakeholders—including IT, security, and legal teams.
Create a Data Inventory
Before you can classify data, you need to locate it using data discovery techniques and tools. Once you have located your data, identify which of it is sensitive and classify it so that each type of data is appropriately protected.
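A first discovery pass can be as simple as walking a file share and recording what is there. The Python sketch below assumes the data lives on a locally mounted file system; `build_inventory` and `InventoryItem` are hypothetical names, and databases, cloud storage, and backups would need their own connectors.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class InventoryItem:
    path: Path
    size_bytes: int
    suffix: str  # lower-cased file extension, e.g. ".csv"


def build_inventory(root: Path, suffixes: Optional[set[str]] = None) -> list[InventoryItem]:
    """Walk `root` recursively and record every regular file found.

    If `suffixes` is given, only files with those lower-case extensions are kept.
    """
    items = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        suffix = path.suffix.lower()
        if suffixes is not None and suffix not in suffixes:
            continue
        items.append(InventoryItem(path, path.stat().st_size, suffix))
    return items
```

Each inventory entry can then be passed to a classifier, so that every located asset ends up with a sensitivity level.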
To make the process efficient and accurate, you can label each sensitive data asset with its classification. Labels make your data classification policy easier to enforce, because controls can then be applied based on the label. You can label data manually or automatically.
Intelligent classification systems can automate this process. For example, a data classification system can use predefined policies to automatically identify and classify data, and then tag it with the appropriate classification label. These systems can continuously monitor data, helping keep it properly classified throughout the data lifecycle.
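Here is a minimal sketch of policy-based automatic classification in Python, assuming each predefined policy is a regular expression tied to a sensitivity level. The rule names, patterns, and levels are hypothetical examples. The payment card rule adds a Luhn checksum test to cut down on false positives; that technique is chosen here for illustration, not prescribed by this article. A production system would need many more rules than these three.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

LEVELS = ["low", "medium", "high"]  # ordered from least to most sensitive


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    level: str
    validator: Optional[Callable[[str], bool]] = None  # extra check on each match


# Hypothetical predefined policies.
RULES = [
    Rule("internal_marker", re.compile(r"\binternal only\b", re.IGNORECASE), "medium"),
    Rule("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "high"),
    Rule("payment_card", re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), "high", luhn_ok),
]


def classify_text(text: str) -> tuple[str, list[str]]:
    """Return the highest matching level and the names of the rules that fired."""
    level, fired = "low", []
    for rule in RULES:
        for match in rule.pattern.finditer(text):
            if rule.validator is None or rule.validator(match.group()):
                fired.append(rule.name)
                if LEVELS.index(rule.level) > LEVELS.index(level):
                    level = rule.level
                break  # one confirmed match per rule is enough
    return level, fired
```

The returned level is what would be written back to the asset as its classification tag, for example in file metadata or a data catalog entry.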
Establish Data Security Controls
Each data classification level requires a different level of security. To ensure each level is appropriately protected, you should establish standard security measures. Then, define policy-based controls for each classification label.
When defining security measures, you should take into account where each data type resides and the value this data provides to the organization. You can then assess the risks and implement the appropriate controls.
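Policy-based controls can be expressed as a lookup from classification label to a baseline set of controls, which are then checked both at access time and wherever the data is stored. In this Python sketch the labels follow the low/medium/high scheme, while the specific controls, role names, and location names in `CONTROLS` are hypothetical placeholders for an organization's own standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool
    require_mfa: bool
    allowed_roles: frozenset[str]
    allowed_locations: frozenset[str]


# Hypothetical baseline: each classification label maps to one set of standard controls.
CONTROLS = {
    "low": Controls(False, False, frozenset({"anyone"}),
                    frozenset({"public_web", "corp_network", "cloud"})),
    "medium": Controls(True, False, frozenset({"employee"}),
                       frozenset({"corp_network", "cloud"})),
    "high": Controls(True, True, frozenset({"data_owner", "security"}),
                     frozenset({"corp_network"})),
}


def can_access(label: str, role: str, mfa_verified: bool) -> bool:
    """Decide access using only the controls attached to the data's label."""
    controls = CONTROLS[label]
    if controls.require_mfa and not mfa_verified:
        return False
    return "anyone" in controls.allowed_roles or role in controls.allowed_roles


def placement_violations(label: str, location: str, encrypted: bool) -> list[str]:
    """List the ways a stored copy of the data breaks its label's controls."""
    controls = CONTROLS[label]
    problems = []
    if location not in controls.allowed_locations:
        problems.append(f"{label} data not permitted in {location}")
    if controls.encrypt_at_rest and not encrypted:
        problems.append(f"{label} data must be encrypted at rest")
    return problems
```

Keeping the controls in one table means a change to the standard for a level applies everywhere that label is used, instead of being re-decided per system.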
Maintenance and Monitoring
Data is dynamic and requires ongoing monitoring and maintenance: it is frequently created, copied, modified, moved, and deleted. Because data may undergo many changes throughout its lifecycle, data classification can quickly become time-consuming.
An important way to reduce data classification efforts is to identify which data really needs to be protected, and focus efforts there. Automated classification systems are another way to reduce workloads and ensure fast detection and treatment of newly created sensitive data. Finally, ensure your data classification policies are flexible enough to deal with changes to data structure, new data types, and growing data volumes.
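One way to reduce this workload is to reclassify only data that has actually changed. The Python sketch below assumes each asset's content is available as bytes and uses a SHA-256 digest to detect changes; `refresh_labels` and the `classify` callback are hypothetical names, and a real system might instead react to file-system or database change events rather than rescanning everything.

```python
import hashlib
from typing import Callable


def refresh_labels(
    current: dict[str, bytes],             # asset id -> current content
    previous: dict[str, tuple[str, str]],  # asset id -> (content hash, label) from the last run
    classify: Callable[[bytes], str],
) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Reclassify new or modified assets; reuse labels for unchanged ones."""
    updated: dict[str, tuple[str, str]] = {}
    reclassified: list[str] = []
    for asset_id, content in current.items():
        digest = hashlib.sha256(content).hexdigest()
        old = previous.get(asset_id)
        if old is not None and old[0] == digest:
            updated[asset_id] = old  # unchanged content: keep the existing label
        else:
            updated[asset_id] = (digest, classify(content))
            reclassified.append(asset_id)
    # Assets absent from `current` were deleted and simply drop out of `updated`.
    return updated, reclassified
```

On each run only the ids in `reclassified` go through the (potentially expensive) classifier, so the cost tracks the rate of change rather than the total data volume.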