Designing a metadata-driven processing framework for Azure Synapse and Azure Purview

Piethein Strengholt
9 min read · Apr 1, 2022

In the blog post Modern Data Pipelines, you learned about data pipelines and some of their difficulties. You also read about an approach that combines Azure Synapse Analytics and Azure Purview to build a framework for intelligently processing data.

In this article I want to take the data processing approach one step further by making data pipelines metadata-driven. A metadata-driven framework scales better: it speeds up development and improves maintainability, reusability and visibility. For example, you can process thousands of tables and apply a variety of processing steps without designing every data flow by hand.
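To make the idea concrete, here is a minimal sketch of what "metadata-driven" can mean in code. It is not the framework built in this article: the `TableConfig` record, the step names and the in-memory rows are all hypothetical stand-ins, and a real setup would read the configuration from a metadata repository and run the steps in Synapse rather than in plain Python.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class TableConfig:
    """One hypothetical metadata record: where data comes from, where it goes, what to apply."""
    source: str
    target: str
    steps: List[str] = field(default_factory=list)


def drop_nulls(rows: List[Row]) -> List[Row]:
    # Keep only rows in which every column has a value.
    return [r for r in rows if all(v is not None for v in r.values())]


def uppercase_strings(rows: List[Row]) -> List[Row]:
    # Upper-case every string value; leave other types untouched.
    return [{k: v.upper() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


# The registry maps step names stored in metadata to reusable functions.
STEP_REGISTRY: Dict[str, Step] = {
    "drop_nulls": drop_nulls,
    "uppercase_strings": uppercase_strings,
}


def run_pipeline(config: TableConfig, rows: List[Row]) -> List[Row]:
    # Apply the configured steps in order; the logic lives in metadata, not in the flow.
    for step_name in config.steps:
        rows = STEP_REGISTRY[step_name](rows)
    return rows


def process_all(configs: List[TableConfig], sources: Dict[str, List[Row]]) -> Dict[str, List[Row]]:
    # One generic loop handles every table described in the metadata.
    return {c.target: run_pipeline(c, sources[c.source]) for c in configs}
```

Adding a table then means adding one more `TableConfig` record instead of designing another data flow, which is where the scalability and reusability come from.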

When you make your data pipeline processing metadata-driven, you will find that data lineage doesn’t show up out of the box, because all processing logic sits in a metadata repository. To overcome this problem, I’ll provide a solution that hooks up a data lineage registration component.
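As a rough illustration of what such a registration component has to produce, the sketch below derives a lineage record from the same kind of metadata that drives the processing. The record shape is loosely modeled on an Apache Atlas `Process` entity with `inputs` and `outputs` (Purview exposes Atlas-style APIs), but the function name, the `metadata-framework://` qualified-name scheme and the generic `DataSet` type are assumptions for illustration only. The code builds a dictionary and does not call any Purview endpoint.

```python
from typing import Dict, List


def build_lineage_record(process_name: str, sources: List[str], targets: List[str]) -> Dict[str, object]:
    """Build a lineage record (inputs -> process -> outputs) from pipeline metadata.

    Hypothetical helper: the qualified-name scheme and type names are assumptions.
    """

    def ref(qualified_name: str) -> Dict[str, object]:
        # Reference an already registered asset by its unique qualified name.
        return {"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": qualified_name}}

    return {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": f"metadata-framework://{process_name}",
            "name": process_name,
            "inputs": [ref(s) for s in sources],
            "outputs": [ref(t) for t in targets],
        },
    }
```

Because the sources and targets are already known from the metadata repository, the record can be generated for every processed table without inspecting the pipeline itself.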

Proposed high-level architecture for dynamically processing data (Credits: Piethein Strengholt)

Before you continue reading, it is important to stress that this blog post is about demonstrating concepts. It’s about showing how to get started. The end result isn’t complete: it lacks essential features such as error handling and advanced transformation techniques. This blog post will be a long read. At the end you…
