Data discovery is the term for operations that evaluate existing data, whether stored in a single repository or collected from different sources, to understand the trends, patterns, and data types it contains. Data discovery is usually driven either by business intelligence initiatives, which aim to uncover insights that analysts can draw from the data assets, or by security, governance, and privacy teams, which need to find sensitive data. It may involve using data catalogs and data dictionaries to explore data assets that are siloed in a single location or scattered across various repositories. It also involves processing, cleaning, and aggregating the data from these sources so that it can be analyzed for valuable insights.
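As a rough illustration of profiling data types across sources, the Python sketch below records which coarse types appear in each column of each source. The helper names (`infer_type`, `profile_sources`) and the sample `crm` and `billing` sources are hypothetical, and the type rules are deliberately simplistic; a real catalog or profiler would also handle dates, encodings, and much larger volumes.

```python
from collections import defaultdict


def infer_type(value):
    """Return a coarse type name for one raw value (hypothetical helper)."""
    if value is None or value == "":
        return "missing"
    text = str(value).strip()
    if text.lower() in ("true", "false"):
        return "bool"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"


def profile_sources(sources):
    """Map each (source, column) pair to the set of types observed in it."""
    profile = defaultdict(set)
    for source_name, rows in sources.items():
        for row in rows:
            for column, value in row.items():
                profile[(source_name, column)].add(infer_type(value))
    return dict(profile)


if __name__ == "__main__":
    # Hypothetical sample data from two sources.
    sources = {
        "crm": [{"id": "1", "email": "a@example.com"}, {"id": "2", "email": ""}],
        "billing": [{"customer_id": "1", "amount": "19.99"}],
    }
    for key, types in profile_sources(sources).items():
        print(key, sorted(types))
```

A column that reports more than one type, such as `{"str", "missing"}` for the `email` column here, is often a first hint that the data needs cleaning before analysis.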
Companies and organizations generate and collect vast amounts of data from sources that may include:
- Customer relationship management (CRM) systems
- Business activities
- Transactional systems
- Data warehouses
- Data lakes
This data flows into several repositories, where it is processed for further analysis and used in downstream processes. Its volume may exceed the processing capabilities of the organization collecting it, which leads to dark data: data that the organization stores but does not use in any significant way. Dark data can become a legal and security burden. Data discovery initiatives can help organizations explore their data to reduce the amount of unused data and to locate sensitive data so that its protection can be prioritized.
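Locating sensitive data is often approached by pattern-matching column values. The sketch below is a minimal, hypothetical example: the `PATTERNS` rules, the 80% match threshold, and the table names are assumptions for illustration, and regular expressions alone will miss many kinds of sensitive data while flagging some false positives.

```python
import re

# Hypothetical detection rules; production classifiers use far more robust checks.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^(?:\d[ -]?){13,16}$"),
}


def classify_column(values, threshold=0.8):
    """Return the labels whose pattern matches at least `threshold` of non-empty values."""
    non_empty = [str(v).strip() for v in values if v not in (None, "")]
    if not non_empty:
        return set()
    labels = set()
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            labels.add(label)
    return labels


def find_sensitive(tables, threshold=0.8):
    """Map (table, column) to its labels for every column that looks sensitive."""
    findings = {}
    for table, columns in tables.items():
        for column, values in columns.items():
            labels = classify_column(values, threshold)
            if labels:
                findings[(table, column)] = labels
    return findings
```

Requiring a share of matching values, rather than a single match, keeps one stray value from marking a whole column as sensitive; the findings can then be used to decide which columns to protect first.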
In recent years, artificial intelligence and machine learning have enhanced the data discovery process. These technologies enable discovery in unstructured data, recommend relationships between data assets, and accelerate the process overall. Their recommendations can then be exposed to business stakeholders through analytics and dashboards, automating data discovery and limiting the amount of dark data the company generates.
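One simple way to recommend data relationships is to compare the distinct values of columns in different tables. The sketch below uses Jaccard similarity (shared distinct values divided by all distinct values) as a plain heuristic stand-in for the machine-learning approaches mentioned above; `suggest_joins`, the `min_score` cutoff, and the sample tables are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of values."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_joins(tables, min_score=0.5):
    """Suggest likely join keys by comparing distinct values across tables."""
    columns = [
        (table, column, set(values))
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    suggestions = []
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 == t2:
            continue  # only recommend cross-table relationships
        score = jaccard(v1, v2)
        if score >= min_score:
            suggestions.append((t1, c1, t2, c2, round(score, 2)))
    # Highest-overlap candidates first.
    return sorted(suggestions, key=lambda s: s[-1], reverse=True)
```

With a hypothetical `crm.customer_id` of `[1, 2, 3, 4]` and `orders.cust` of `[1, 2, 2, 3, 5]`, the two columns share 3 of 5 distinct values, so the pair is suggested with a score of 0.6. Value overlap is only a hint: unrelated integer IDs can overlap by coincidence, so suggestions still need human review.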
The data discovery process frequently involves data normalization, handling missing values, and other preprocessing tasks needed to structure the data and search for meaningful patterns, which can later be combined and aggregated with data from other sources.
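Two of the preprocessing steps named above can be sketched in plain Python: filling missing values with the column mean, then min-max normalizing the result into the range [0, 1]. Mean imputation and min-max scaling are simply the choices assumed here; median imputation or z-score standardization may suit other data better.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = [10.0, None, 30.0, 20.0]
    filled = impute_mean(raw)               # [10.0, 20.0, 30.0, 20.0]
    print(min_max_normalize(filled))        # [0.0, 0.5, 1.0, 0.5]
```

Imputing before normalizing matters here, because `min` and `max` cannot compare `None` with numbers.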
One common use case for data discovery is data science, especially in the exploratory stage. The results of this exploration can later be used to create products or identify business opportunities based on the insights extracted from these analyses, which commonly rely on advanced analytics methodologies and visual exploration.
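In the exploratory stage, analysis often starts with descriptive statistics and simple group comparisons. The sketch below, with hypothetical helper names (`summarize`, `group_means`) and a made-up `revenue` column, computes a per-column summary and the mean per group using only Python's `statistics` module.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def summarize(rows, column):
    """Descriptive statistics for one numeric column, skipping missing values.

    Raises ValueError if the column has no values at all.
    """
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        raise ValueError(f"no values for column {column!r}")
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


def group_means(rows, group_key, value_key):
    """Mean of `value_key` for each distinct value of `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(value_key) is not None:
            groups[row[group_key]].append(row[value_key])
    return {group: mean(vals) for group, vals in groups.items()}
```

Comparing group means, for example revenue per region, is a quick way to spot a trend worth visualizing before moving on to more advanced methods.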
Continuous Data Discovery
Satori enables a simplified approach to sensitive data discovery by continuously analyzing data access to create an ongoing data inventory. To learn more, visit our product page.
Data Classification with Satori
Satori takes a different approach to data classification. With Satori, data is discovered and classified continuously rather than through ad-hoc scans.