Data inventories are the complete records of the data assets collected and stored by an organization. The term is closely related to data catalogs and data dictionaries. Organizations use data inventories to understand how much data they hold and what it contains. This information is valuable in data exploration processes that seek to uncover business opportunities and insights from the data. Data inventories are also used to comply with data protection regulations such as the GDPR (General Data Protection Regulation). They additionally support general data governance practices and security measures such as access management and encryption, applied according to the sensitivity of the data.
The need for well-maintained data inventories has grown with the rising complexity of modern data pipelines and data platforms, which store data in repositories of different kinds, such as data warehouses and data lakes. The data assets tracked in a data inventory can be databases with their associated structure, metadata, raw files, etc. These assets are increasingly scattered across various locations and resources, which creates complications when an organization needs to find and query them.
Data inventories hold information about each data asset: its name, the nature of its contents, update frequency, intended use, owner, applicable security and privacy standards, source, and any other metadata needed to work with the asset. Ideally, a data inventory should hold as much information as possible about all data assets within an organization. However, it can be beneficial to narrow the scope to data assets that meet a predefined standard of importance, and to set clear policies stating how the inventory is kept up to date as data assets change.
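As a rough illustration of the fields listed above, a single inventory record could be modeled as follows. This is a minimal sketch, not a standard schema: the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InventoryEntry:
    """One record in a data inventory (all field names are illustrative)."""
    name: str
    contents: str            # nature of the contents
    owner: str
    source: str
    intended_use: str
    update_frequency: str    # e.g. "hourly", "daily"
    sensitivity: str         # e.g. "public", "internal", "restricted"
    standards: List[str] = field(default_factory=list)  # e.g. ["GDPR"]


class DataInventory:
    """A small in-memory inventory keyed by asset name."""

    def __init__(self) -> None:
        self._entries: Dict[str, InventoryEntry] = {}

    def register(self, entry: InventoryEntry) -> None:
        # Registering the same name again overwrites the old record,
        # which is how an update to an asset would be recorded here.
        self._entries[entry.name] = entry

    def find_by_owner(self, owner: str) -> List[str]:
        return sorted(e.name for e in self._entries.values() if e.owner == owner)

    def subject_to(self, standard: str) -> List[str]:
        return sorted(e.name for e in self._entries.values() if standard in e.standards)


# Hypothetical sample assets
inventory = DataInventory()
inventory.register(InventoryEntry(
    name="crm.customers",
    contents="customer contact details",
    owner="sales-data-team",
    source="CRM export",
    intended_use="account management reporting",
    update_frequency="daily",
    sensitivity="restricted",
    standards=["GDPR"],
))
inventory.register(InventoryEntry(
    name="web.clickstream",
    contents="anonymized page view events",
    owner="analytics-team",
    source="website event collector",
    intended_use="product analytics",
    update_frequency="hourly",
    sensitivity="internal",
))

gdpr_assets = inventory.subject_to("GDPR")
```

With these sample records, `gdpr_assets` contains only `crm.customers`. A real inventory would usually live in a catalog tool or database rather than in memory, but the shape of each record stays similar.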
Data inventories are valuable tools for applying analytics to organizational data and for reducing the risk of data breaches and exposures. Data teams can use them to create checklists that validate the security compliance of different data assets and to stay accountable for the quality of the data served to the various teams within an organization. They can also be used to design efficient reporting tools, support decision-making in the planning or design of data pipelines, and optimize the overall performance of data products. Data engineers can also flag both existing data and missing data that needs to be collected or produced. Data inventories create a clear picture that organizations can use to develop roadmaps towards specific desired features or outcomes.
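The compliance checklist idea can be sketched as a set of rules evaluated against each inventory record. The example below is only an assumption about how such a checklist might look: the record fields, the rule names, and the sample assets are hypothetical, and real policies would come from the organization's own security standards.

```python
# Hypothetical inventory records; field names are illustrative only.
assets = [
    {"name": "crm.customers", "sensitivity": "restricted", "encrypted": True,
     "owner": "sales-data-team", "access_roles": ["sales-analyst"]},
    {"name": "web.clickstream", "sensitivity": "internal", "encrypted": False,
     "owner": "", "access_roles": []},
    {"name": "hr.payroll", "sensitivity": "restricted", "encrypted": False,
     "owner": "hr-data-team", "access_roles": ["hr-admin"]},
]

# Each check returns True when the asset passes.
CHECKS = {
    "has_owner": lambda a: bool(a["owner"]),
    "restricted_data_encrypted":
        lambda a: a["sensitivity"] != "restricted" or a["encrypted"],
    "restricted_data_access_limited":
        lambda a: a["sensitivity"] != "restricted" or bool(a["access_roles"]),
}


def failed_checks(asset):
    """Return the names of all checks the asset does not pass."""
    return [name for name, check in CHECKS.items() if not check(asset)]


# Map each asset name to the list of checks it fails.
report = {asset["name"]: failed_checks(asset) for asset in assets}

for name, failures in report.items():
    status = "OK" if not failures else "FAILED: " + ", ".join(failures)
    print(f"{name}: {status}")
```

Keeping the rules in one table separate from the records means new policies can be added without touching the inventory itself.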
Data Classification with Satori
Satori provides a different approach to data classification. With Satori, data is continuously discovered and classified, rather than relying on ad-hoc scans.