Data Quality within Lakehouses

A deep dive into data quality using bronze, silver, and gold layered architectures

Piethein Strengholt
9 min read · Mar 5, 2024

Introduction

I am often asked: how do we ensure data quality within Data Lakehouses? Or: how do I manage data quality for my data products? In this article, we first look at whether validation is needed at all. If it is, we then look at which data must be validated and how to validate it. These are questions that many enterprises face at the start of their Data Lakehouse journey.

Why data quality?

Data quality matters. Failing to validate data quality can have numerous adverse effects, both operational and strategic, for any organization.

On a strategic level, poor data quality can result in incorrect insights, leading to wrong decisions and strategies. These inaccuracies can lead to loss of revenue, customer dissatisfaction, and damage to the organization’s reputation. In highly regulated industries, extremely poor data quality can also have legal and financial consequences. For a bank, for example, it could mean losing its license to operate.

On an operational level, poor data quality can cause inefficiencies, as resources are wasted on rectifying errors or reconciling inconsistent data. It can also delay the delivery of data, potentially upsetting the stakeholders who depend on it.
