Dark Data

Most organizations collect significant amounts of data every day from multiple sources. Some of it is processed before being stored, and some is stored in its raw format. Companies usually keep this data so that it can later feed analytics and insights that help them optimize operations and decision-making. In other cases, data is kept for compliance reasons, because regulations require companies to keep records of sensitive information and how it is used. The capacity limits of on-premise databases also no longer restrict how much data can be acquired: modern cloud infrastructure makes it cost-efficient to store seemingly limitless amounts of data.

Examples of such data being stored:

  1. IoT or sensor-based telemetry.
  2. Web or e-commerce user activity.
  3. Business transactions.
  4. Healthcare activities.

In most cases, only a small part of the collected data is later used to create valuable insights for decision-making. The rest, which goes unused, is referred to as Dark Data. The term borrows from the concept of dark matter in physics, the unseen matter that is thought to make up most of the matter in the universe. Dark Data can therefore be defined as the portion of collected data that is left unanalyzed.

Although modern cloud storage infrastructure provides inexpensive ways to store data, dark data can still create problems within an organization. It might include restricted information or personally identifiable information (PII). Such data is both a compliance risk and a security risk.

Organizations can reduce dark data risks by establishing data retention policies. Another helpful control is monitoring user access to data and revoking access permissions that are not being used. Yet another control is granting specific users, groups, or roles visibility into data only for defined periods, which can differ for each of them.
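As a rough illustration of the first two controls, the sketch below flags datasets that have outlived a retention period and access grants that have gone unused. It is only a minimal example under stated assumptions: the `Dataset` and `AccessGrant` records, their fields, and the 365-day and 90-day thresholds are hypothetical and do not come from any particular product or regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical record types; a real inventory would come from a data catalog
# or from the access logs of the data store.
@dataclass
class Dataset:
    name: str
    created_at: datetime


@dataclass
class AccessGrant:
    user: str
    dataset: str
    last_used_at: Optional[datetime]  # None means the grant was never used


def past_retention(datasets: List[Dataset], now: datetime,
                   retention: timedelta) -> List[str]:
    """Return the names of datasets older than the retention period."""
    return [d.name for d in datasets if now - d.created_at > retention]


def unused_grants(grants: List[AccessGrant], now: datetime,
                  max_idle: timedelta) -> List[AccessGrant]:
    """Return grants never used, or idle for longer than max_idle.

    Assumption: a never-used grant is always flagged, even if it is recent.
    """
    return [
        g for g in grants
        if g.last_used_at is None or now - g.last_used_at > max_idle
    ]


# Example inventory with made-up names and dates.
now = datetime(2024, 1, 1)
datasets = [
    Dataset("sensor_raw_2019", datetime(2019, 6, 1)),
    Dataset("orders_2023", datetime(2023, 11, 1)),
]
grants = [
    AccessGrant("alice", "orders_2023", datetime(2023, 12, 20)),
    AccessGrant("bob", "sensor_raw_2019", None),
]

expired = past_retention(datasets, now, timedelta(days=365))
stale = [(g.user, g.dataset) for g in unused_grants(grants, now, timedelta(days=90))]

print(expired)  # ['sensor_raw_2019']
print(stale)    # [('bob', 'sensor_raw_2019')]
```

In practice the flagged datasets would be reviewed for deletion or archiving, and the flagged grants would be candidates for revocation rather than being removed automatically.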
