Process events from Azure Purview’s Atlas Kafka topics via Event Hubs and NodeJS

Piethein Strengholt
3 min read · Jan 6, 2022

Azure Purview is a unified data governance solution that runs on Azure. Some people call it a data catalog, but I mostly encourage my clients to see it as a control framework for managing and controlling your data landscape. For example, you can use Azure Purview as a metastore for dynamically orchestrating your data pipelines.

When Azure Purview is deployed, a managed event hub is created as part of your Purview account. This opens up many possibilities for integrating Azure Purview with other applications. Because of the open nature of Azure Purview, you can automate and integrate different aspects. For example, you can trigger a workflow outside Azure Purview when new data is scanned, or make an API call to fetch and store additional metadata inside Purview. In this tutorial I want to show you how this works.

For this demo you need an Azure subscription with an Azure Purview account (which provisions the managed event hub) and a recent installation of Node.js and npm.

For reading and publishing messages to Purview we will use the Event Hubs endpoint, which is compatible with Apache Kafka. First we need the Event Hubs namespace associated with your Purview account. You can look up the endpoint configuration on the Properties pane of your Purview account in the Azure Portal. The connection string includes both the namespace name and an access key.
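For reference, an Event Hubs connection string roughly follows the shape below; the namespace, policy name and key are placeholders for the values from your own Purview account:

Endpoint=sb://atlas-<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy-name>;SharedAccessKey=<key>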

Next we need to create a new project. You can simply do this by installing the required NodeJS package, which allows us to communicate with the Kafka endpoint:

npm install node-rdkafka

For reading events from Azure Purview you can use the sample code below. It is important to paste your connection string into the configuration section at the top. All changes are published to the ATLAS_ENTITIES topic, so make sure you subscribe to that topic.
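A minimal consumer.js could look like the sketch below. The consumer group name, namespace and connection string are placeholders you need to replace with your own values; the SASL settings are the standard way of authenticating against an Event Hubs Kafka endpoint:

const Kafka = require('node-rdkafka');

// Replace these with the values from your Purview account's Properties pane
const eventHubsNamespace = '<your-namespace>';
const connectionString = '<your-connection-string>';

const consumer = new Kafka.KafkaConsumer({
  'group.id': 'purview-demo-consumer',
  'metadata.broker.list': `${eventHubsNamespace}.servicebus.windows.net:9093`,
  'security.protocol': 'SASL_SSL',
  'sasl.mechanisms': 'PLAIN',
  'sasl.username': '$ConnectionString',
  'sasl.password': connectionString,
  'socket.keepalive.enable': true
}, {
  'auto.offset.reset': 'earliest'
});

consumer.connect();

consumer
  .on('ready', () => {
    console.log('Connected, subscribing to ATLAS_ENTITIES');
    consumer.subscribe(['ATLAS_ENTITIES']);
    consumer.consume();
  })
  .on('data', (data) => {
    // data.value is a Buffer containing the Atlas notification JSON
    console.log(JSON.parse(data.value.toString()));
  })
  .on('event.error', (err) => {
    console.error('Consumer error:', err);
  });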

When everything is configured properly you can start the script by running the following command:

node consumer.js

To create notifications, go back to Azure Purview Studio and make some changes to your environment, for example by scanning an existing source. If everything goes well, you should see output similar to the following on the screen:
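The exact payload depends on your environment, but a trimmed-down Atlas V2 entity notification roughly looks like the message below; all names and values here are made up for illustration:

{
  "version": { "version": "1.0.0" },
  "message": {
    "type": "ENTITY_NOTIFICATION_V2",
    "operationType": "CLASSIFICATION_ADD",
    "eventTime": 1641468000000,
    "entity": {
      "typeName": "azure_sql_table",
      "attributes": {
        "qualifiedName": "mssql://demo-server.database.windows.net/demo-db/dbo/Customers",
        "name": "Customers"
      },
      "classificationNames": [ "MICROSOFT.PERSONAL.EMAIL" ]
    }
  }
}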

In the output above, you see an array with classifications and the CLASSIFICATION_ADD operation type. You could use this, for example, to trigger a workflow or send out an email asking somebody to investigate the newly scanned data. Or you see new data coming in via the ENTITY_UPDATE operation type, allowing you to trigger a process or pipeline.

The same endpoint can also be used to submit events. Important here is that we change the topic name to ATLAS_HOOK. In the example below a JSON message is used to create a SQL table including two columns:
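A producer.js along the following lines would do the job. The message follows the Atlas ENTITY_CREATE_V2 hook format; the server, database, table and column names, the attribute names and the negative placeholder GUIDs are illustrative assumptions, so adjust the qualified names and attributes to your own environment:

const Kafka = require('node-rdkafka');

// Replace these with the values from your Purview account's Properties pane
const eventHubsNamespace = '<your-namespace>';
const connectionString = '<your-connection-string>';

// ENTITY_CREATE_V2 message: one SQL table with two columns (illustrative values)
const message = {
  message: {
    type: 'ENTITY_CREATE_V2',
    user: 'admin',
    entities: {
      entities: [
        {
          typeName: 'azure_sql_table',
          guid: '-1',
          attributes: {
            qualifiedName: 'mssql://demo-server.database.windows.net/demo-db/dbo/DemoTable',
            name: 'DemoTable'
          }
        },
        {
          typeName: 'azure_sql_column',
          guid: '-2',
          attributes: {
            qualifiedName: 'mssql://demo-server.database.windows.net/demo-db/dbo/DemoTable#Id',
            name: 'Id',
            data_type: 'int'
          },
          relationshipAttributes: {
            table: { guid: '-1', typeName: 'azure_sql_table' }
          }
        },
        {
          typeName: 'azure_sql_column',
          guid: '-3',
          attributes: {
            qualifiedName: 'mssql://demo-server.database.windows.net/demo-db/dbo/DemoTable#Name',
            name: 'Name',
            data_type: 'nvarchar'
          },
          relationshipAttributes: {
            table: { guid: '-1', typeName: 'azure_sql_table' }
          }
        }
      ]
    }
  }
};

const producer = new Kafka.Producer({
  'metadata.broker.list': `${eventHubsNamespace}.servicebus.windows.net:9093`,
  'security.protocol': 'SASL_SSL',
  'sasl.mechanisms': 'PLAIN',
  'sasl.username': '$ConnectionString',
  'sasl.password': connectionString
});

producer.connect();

producer.on('ready', () => {
  // Publish the JSON message to the ATLAS_HOOK topic
  producer.produce('ATLAS_HOOK', null, Buffer.from(JSON.stringify(message)), null, Date.now());
  producer.flush(10000, () => producer.disconnect());
});

producer.on('event.error', (err) => console.error('Producer error:', err));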

To submit this message to the Kafka endpoint use the following command:

node producer.js

If everything goes well, you will see confirmation output on the screen.

And finally, you should see your newly created object in Azure Purview itself.

As demonstrated in this tutorial, you can programmatically monitor metadata changes in real time. Via this approach you can enrich your user experience by integrating with other services as well.
