Modern Data Pipelines with Azure Synapse Analytics and Azure Purview

Piethein Strengholt
9 min readDec 30, 2021

Processing data within modern data platforms, for example data lakehouses, is a complicated task. The variety and multitude of sources quickly make data pipelines chaotic and difficult to handle. A solution for this problem is to follow the mantra of ‘code once use often’ by making your pipelines dynamic and configurable.

With this blogpost I want to demonstrate how Azure Synapse Analytics and Azure Purview can be integrated together to build a framework for intelligently processing data. Azure Synapse is positioned as the analytics platform and Azure Purview as the foundation for metadata management. This blogpost will be a long read, but at the end you will see how Spark is used for data processing, data quality validation and building up historical datasets (slowly changing dimensions). Purview in this respect will hold all metadata for dynamically driving these data pipelines.

Prerequisites

To get started, you must have an Azure Purview account and the following services configured:

Configuration — Create Pipeline using Copy Data Tool

--

--