What Is Data Classification?
Data classification organizes structured and unstructured data into logical categories so that the data can be used more securely and efficiently. It makes data easier to locate and retrieve, and it supports risk management, regulatory compliance and legal discovery.
Data classification processes apply labels to personal information and sensitive data. These labels allow data to be searched and tracked effectively and accurately. Classification also helps organizations identify and eliminate duplicate data, which reduces storage and backup costs and helps minimize cyber security risks.
3 Data Classification Criteria
Data classification involves assigning metadata to pieces of information according to certain parameters. Here are three common criteria used for data classification:
- Content-based classification—assigns tags based on the contents of certain pieces of data. This scheme reviews the information stored in a database, document or other sources, and then applies labels that define the data type and a sensitivity level.
- Context-based classification—uses environmental information, like metadata, to create data classification labels. For example, this method may automatically classify all documents produced by a specific application or user as financial information. Additionally, you can use context-based classification to generate labels based on predefined rules that define data type and the sensitivity level.
- User-based classification—a knowledgeable user decides how a certain classification label should be applied to a specific piece of data. This user can be a specialized classification authority or the creator of the data. However, this method may cause scalability issues in organizations that generate large amounts of data.
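The three criteria above can be sketched as a small rule pipeline. This is a minimal illustration, not a production classifier: the patterns, the `erp-reports` application name, the label values and the `internal` fallback are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass

@dataclass
class Label:
    data_type: str
    sensitivity: str
    source: str  # which criterion produced the label

# Hypothetical content rules: (pattern, data type, sensitivity level).
CONTENT_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "pii", "restricted"),
    (re.compile(r"\b(invoice|purchase order)\b", re.I), "financial", "confidential"),
]

# Hypothetical context rules keyed on the producing application.
CONTEXT_RULES = {
    "erp-reports": ("financial", "confidential"),
}

def classify_by_content(text):
    """Content-based: inspect the data itself and return the first matching label."""
    for pattern, data_type, sensitivity in CONTENT_RULES:
        if pattern.search(text):
            return Label(data_type, sensitivity, "content")
    return None

def classify_by_context(metadata):
    """Context-based: use metadata, such as the producing application."""
    rule = CONTEXT_RULES.get(metadata.get("application"))
    if rule is None:
        return None
    return Label(rule[0], rule[1], "context")

def classify(text, metadata, user_label=None):
    """User-based labels win; otherwise try content rules, then context rules."""
    if user_label is not None:
        return user_label
    return (
        classify_by_content(text)
        or classify_by_context(metadata)
        or Label("unknown", "internal", "default")  # assumed fallback
    )
```

Letting the user-supplied label take precedence is one possible design. An organization could equally decide that automated rules override users for certain data types.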
Data Classification Levels
Here are several types of data sensitivity levels:
Data Sensitivity Levels Used by Businesses
- Restricted—highly sensitive data whose use and access are tightly restricted. This data is often handled on a “need-to-know” basis. Restricted data may include intellectual property, personally identifiable information (PII), trade secrets, health information and cardholder data. Disclosure of this data can have significant financial or legal implications.
- Confidential—this data can be used across the organization. However, it must be contained within the boundaries of business. Confidential data is usually subject to legal restrictions that regulate how the data must be handled. Confidential data may include pricing, contracts and marketing plans. Disclosure of this data can negatively affect operations and brand.
- Internal—this type of information is made available company-wide, but it is still considered internal data that requires protection, albeit limited. Internal data may include company directories, company-wide memos, and employee handbooks. Disclosure of this type of data is likely to have minimal impact on the organization.
- Public—you can share the information openly with the public. This type of data does not require any security controls when used or stored.
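Because these four levels are ordered from least to most sensitive, they map naturally onto an ordered enumeration. The sketch below assumes that ordering. The `HANDLING` table is purely illustrative and does not describe any particular policy or standard.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Business sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Illustrative handling rules per level (assumed values, not a standard).
HANDLING = {
    Sensitivity.PUBLIC: {"encrypt_at_rest": False, "audience": "anyone"},
    Sensitivity.INTERNAL: {"encrypt_at_rest": False, "audience": "employees"},
    Sensitivity.CONFIDENTIAL: {"encrypt_at_rest": True, "audience": "employees"},
    Sensitivity.RESTRICTED: {"encrypt_at_rest": True, "audience": "need-to-know"},
}

def can_share_externally(level):
    """Only public data may leave the organization."""
    return level == Sensitivity.PUBLIC

def most_restrictive(levels):
    """A container inherits the highest level of anything it holds."""
    return max(levels, default=Sensitivity.PUBLIC)
```

For example, a folder holding one internal memo and one restricted contract would be treated as restricted by `most_restrictive`.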
Data Sensitivity Levels Used in Government
- Top Secret—information that requires the highest level of access control and protection. Access is restricted to people who hold the appropriate clearance and have a “need to know.” Disclosure of top-secret data can threaten national security.
- Secret—information that requires a high level of protection. The disclosure of this information can cause serious damage to national security.
- Confidential—applies to the lowest level of classified government data. Confidential data requires less protection than top-secret or secret data. Disclosed confidential information can cause some harm to national security.
- Sensitive but unclassified (SBU)—includes all information that is not otherwise classified. However, it is still categorized as sensitive, which means it requires some protection. Disclosed SBU data may violate the privacy rights of citizens.
- Unclassified—applies to data labeled as not sensitive. This data does not require any protection.
Common Data Classification Methods
The following are several ways of addressing data classification using an organization-wide data classification policy.
Paper-Based Classification Policy
This policy outlines how employees need to treat the various kinds of data they handle, in keeping with the organization’s overall approach to data security and strategy. A well-defined policy lets users make quick, intuitive decisions about the value of a piece of information and which handling rules apply, for instance who may access the information and whether a rights management template should be used. The difficulty, without supporting technology, is making sure that everyone knows the policy and applies it correctly.
Automated Classification Policy
This technique does not involve the user. It enforces a classification policy and makes sure the policy is applied consistently across all touchpoints, without requiring major education programs or communication campaigns.
Classifications are applied by software that uses phrases or keywords from the content to analyze and classify it. This method is very effective where particular kinds of data are created without user involvement, such as reports generated by ERP systems, or where the information includes personal data that can be readily identified, such as credit card numbers.
However, automated solutions cannot interpret context and are therefore prone to inaccuracies. They can produce false positives that annoy users and hinder business processes, and false negatives that expose organizations to the loss of sensitive information.
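As an illustration of pattern-based detection and one way to reduce false positives, the sketch below looks for digit sequences that resemble card numbers and keeps only those that pass the Luhn checksum used by payment card numbers. The regular expression and function names are assumptions made for this example. A real scanner would need more checks, such as issuer prefixes and surrounding keywords.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return digit strings that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum filters out many random digit runs, such as order or tracking numbers, but it cannot tell whether a valid-looking number is a real card in a sensitive context. That remaining gap is the kind of context problem described above.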
User-Driven Classification Policy
The data classification process could be entirely automated, yet it is often more effective when the user has control.
This approach makes employees responsible for choosing the appropriate label and attaching it via a software tool at the point of creating, editing, saving or sending. The benefit of including the user is that their understanding of the context, sensitivity and business value of a piece of information lets them make an accurate, informed decision about which label to use. User-driven classification is an added layer of security that is often combined with automated classification.
Involving users in classification has other organizational advantages, including better security awareness and an enhanced capacity to monitor user behavior. This makes it easier to report issues and demonstrate compliance. What’s more, managers can use this behavioral data to identify potential insider threats. They can address any issues by offering more guidance to users where appropriate (for instance, through additional training or by fine-tuning the policy).
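One way to combine a user's choice with an automated suggestion is sketched below. The policy shown, keeping the more restrictive of the two labels and flagging cases where the user chose a lower level than the tool suggested, is an assumption for illustration. The level names follow the business levels described earlier, and `reconcile` and `Decision` are hypothetical names.

```python
from dataclasses import dataclass

# Ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]

@dataclass
class Decision:
    final_label: str
    needs_review: bool  # True when the user picked a lower level than the tool

def reconcile(user_label, automated_label):
    """Keep the more restrictive label and flag possible under-classification."""
    user_rank = LEVELS.index(user_label)
    auto_rank = LEVELS.index(automated_label)
    final = LEVELS[max(user_rank, auto_rank)]
    return Decision(final_label=final, needs_review=user_rank < auto_rank)
```

Flagged decisions could feed the kind of follow-up described above, such as additional training or policy adjustments, rather than being treated as violations by default.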
Automated Data Classification with Satori
Satori provides a different approach to data classification. With Satori, data is continuously discovered and classified rather than found through ad-hoc scans.