Data Quality

Data quality refers to the state of data assets from the perspective of their intended use. Although there are several definitions of data quality, a data set is generally considered to be of appropriate quality if it is fit for its intended uses in operations, decision making, and planning. Moreover, to be of reasonable quality, it must correctly represent the real-world construct it refers to.

Consequences of Poor Data Quality

  • Organizations overspend on marketing campaigns by sending the same material more than once to the same individuals, for example when different user names share the same email address. The underlying problem is duplicate records, both within a single data asset and across a myriad of sources.
  • Insufficient data, caused by gaps in completeness within internal databases and in how data is syndicated between data systems, eventually leads to incomplete or incorrect information being shown to users. In online sales, for instance, insufficient product data cannot support a self-service buying decision and ultimately leads to lost customers.
  • In business intelligence and reporting, you might obtain different answers to the same question due to inconsistent data.
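As a rough illustration of the duplicate problem in the first point, the sketch below groups records by email address and reports any address shared by more than one user name. The record layout, the field names `username` and `email`, and the helper `find_duplicate_emails` are hypothetical. Treating addresses as case-insensitive is also an assumption, since some mail systems distinguish case in the part before the `@`.

```python
from collections import defaultdict


def find_duplicate_emails(records):
    """Group records by normalized email; return only emails shared by 2+ user names.

    Hypothetical helper: assumes each record is a dict with "username" and "email".
    """
    groups = defaultdict(list)
    for record in records:
        # Assumption: trim whitespace and lowercase so "John.Smith@x.com " and
        # "john.smith@x.com" are treated as the same address.
        key = record["email"].strip().lower()
        groups[key].append(record["username"])
    # Keep only addresses that appear under more than one user name.
    return {email: names for email, names in groups.items() if len(names) > 1}


# Hypothetical customer records collected from two source systems.
customers = [
    {"username": "jsmith", "email": "John.Smith@example.com"},
    {"username": "john_s", "email": "john.smith@example.com "},
    {"username": "adoe", "email": "a.doe@example.com"},
]

duplicates = find_duplicate_emails(customers)
print(duplicates)  # {'john.smith@example.com': ['jsmith', 'john_s']}
```

In practice, a check like this would run when records enter the system, so that a campaign tool receives one contact per address instead of several.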

Data governance initiatives can address these data quality issues, provided they are ultimately enforced through best practices in data handling.

Data Quality Best Practices

  • Ensure management involvement to achieve a cross-departmental view of the data.
  • Enforce data quality initiatives as part of data governance policies. These policies should set standards and expectations and define the roles required to provide a business glossary and data catalogs.
  • Implement a business glossary and data catalogs as the foundation for metadata management. Organizations must leverage metadata management to achieve standard data definitions and to link data assets with their business applications.
  • Maintain data quality issue logs that record each flagged issue, its assigned data owner and data steward, its impact, and the resolution the data engineers applied. It is essential to start with an analysis that determines where the problem originated, so that the fix addresses the root cause rather than the symptom.
  • Implement technology that prevents the issues from occurring as close to the data source as possible, rather than relying on downstream data cleansing.
  • Define data quality KPIs tied to data quality dimensions such as uniqueness, completeness, or consistency.
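The KPIs in the last point can be expressed as simple ratios. The sketch below assumes records are Python dictionaries and uses hypothetical helpers `completeness`, `uniqueness`, and `consistency`. The exact definitions (uniqueness measured over non-empty values only, consistency measured as agreement between two sources on a shared key) are illustrative choices, not a standard.

```python
def completeness(rows, field):
    """Share of rows where `field` is present and non-empty (illustrative definition)."""
    if not rows:
        return 1.0  # Assumption: an empty data set is treated as fully complete.
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Distinct non-empty values divided by the number of non-empty values."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def consistency(rows_a, rows_b, key, field):
    """Share of keys present in both sources whose `field` values agree."""
    b_by_key = {r[key]: r.get(field) for r in rows_b}
    shared = [r for r in rows_a if r[key] in b_by_key]
    if not shared:
        return 1.0
    agree = sum(1 for r in shared if r.get(field) == b_by_key[r[key]])
    return agree / len(shared)


# Hypothetical extracts from a CRM and a billing system.
crm = [
    {"id": 1, "email": "a@x.com", "country": "DE"},
    {"id": 2, "email": "b@x.com", "country": "FR"},
    {"id": 3, "email": "a@x.com", "country": ""},
    {"id": 4, "email": None, "country": "US"},
]
billing = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "ES"},
    {"id": 5, "country": "IT"},
]

print(f"email completeness:  {completeness(crm, 'email'):.2f}")  # 0.75
print(f"email uniqueness:    {uniqueness(crm, 'email'):.2f}")  # 0.67
print(f"country consistency: {consistency(crm, billing, 'id', 'country'):.2f}")  # 0.50
```

Each function returns a value between 0 and 1, so results can be tracked over time against a target threshold. Empty inputs return 1.0 here, which is a deliberate but debatable choice; a team might prefer to flag them instead.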

As data sources grow in number and complexity, data quality and consistency become significant concerns in their own right, regardless of whether the data is fit for any particular external purpose. It is crucial for companies and organizations that use data in decision-making processes, or that aggregate data from different sources, to define clear policies ensuring that data is fit for its use.
