Use Azure Purview’s REST APIs for creating custom lineage

Azure Purview is a unified data governance service that helps organizations to manage and govern their data estate. It provides a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage.

One of the great things about Azure Purview is its openness. Azure Purview uses the Apache Atlas Open API ecosystem, with some enhancements and additions by Microsoft. In this tutorial, you learn how to use these APIs by creating custom lineage.

Prerequisites

Create a service principal (application)

  1. Sign in to the Azure portal, navigate to Azure Active Directory > App registrations, and click New registration.
  2. Provide a name for the application, select an account type, and click Register.
  3. Copy the following values for later use: Application (client) ID and Directory (tenant) ID.
  4. Next, create a secret. Navigate to Certificates & secrets and click New client secret.
  5. Provide a Description, set the expiration to In 2 years, and click Add.
  6. Copy the client secret value for later use.

Set up authentication using service principal

  1. Navigate to your Purview Studio.
  2. Select the Data Map in the left menu.
  3. Select Collections.
  4. Select the root collection in the collections menu. This will be the top collection in the list, and will have the same name as your Purview account.
  5. Select the Role assignments tab.
  6. Assign the following roles to the service principal created above so that it can access the various data planes in Purview:
       • ‘Data Curator’ role to access the Catalog data plane.
       • ‘Data Source Administrator’ role to access the Scanning data plane.
       • ‘Collection Admin’ role to access the Account data plane.

Get token

To get an access token, send a POST request to the following URL:

https://login.microsoftonline.com/{your-tenant-id}/oauth2/token

The following parameters need to be passed to the above URL.

  • client_id: the client ID of the application that is registered in Azure Active Directory and assigned to a data plane role for the Purview account.
  • client_secret: the client secret created for the above application.
  • grant_type: This should be ‘client_credentials’.
  • resource: This should be ‘https://purview.azure.net’.

In my case, I’m using curl for invoking the REST API:

curl --location --request POST 'https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode 'client_id=ddb25b53-4e6f-41da-b646-579c742ab8ec' \
--data-urlencode 'client_secret=NFl7Q~vOkFtEqqBx~lSLk4EFLc74P7Ood2LAT' \
--data-urlencode 'resource=https://purview.azure.net'

If everything goes well, the response contains an access_token. Copy this value for later use.
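For readers who prefer a script over curl, the same token call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names `build_token_request` and `fetch_access_token` are made up for illustration, and the placeholder values must be replaced with your own tenant ID, client ID, and client secret.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build the client-credentials token request (hypothetical helper)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://purview.azure.net",
    }
    # urlencode produces the same form-encoded body that curl's
    # --data-urlencode options send.
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request):
    """Send the request and return the access_token field (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Calling `fetch_access_token(build_token_request("<tenant-id>", "<client-id>", "<client-secret>"))` would return the token string. Reading the secret from an environment variable rather than hard-coding it keeps it out of scripts and shell history.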

Next, you need to obtain the Atlas API endpoint. Navigate back to the Azure portal, open the Azure Purview account, navigate to Properties, and find the Atlas endpoint. Copy the Atlas endpoint for later use.

Next, validate the Atlas endpoint by requesting all type definitions. Paste the access_token into the request below and make the API call:

curl --location --request GET 'https://purview-piethein.catalog.purview.azure.com/api/atlas/v2/types/typedefs' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Imwzc1EtNTBjQ0g0eEJWWkxIVEd3blNSNzY4MCIsImtpZCI6Imwzc1EtNTBjQ0g0eEJWWkxIVEd3blNSNzY4MCJ9.eyJhdWQiOiJodHRwczovL3B1cnZpZXcuYXp1cmUubmV0IiwiaXNzIjoiaHR0cHM6Ly9zdHMud2luZG93cy5uZXQvNzJmOTg4YmYtODZmMS00MWFmLTkxYWItMmQ3Y2QwMTFkYjQ3' \
--header 'Accept: application/json'

If everything goes well, all the type definitions are returned. Congratulations! Let’s continue our journey by creating new objects in Purview.
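The validation call can also be sketched in Python. The helper names below are hypothetical, and `catalog_base_url` is assumed to be the host part used in the curl calls in this article (for example `https://purview-piethein.catalog.purview.azure.com`). The summary function only counts the list-valued sections of whatever JSON comes back, so it makes no assumption about the exact section names.

```python
import json
import urllib.request


def build_typedefs_request(catalog_base_url, access_token):
    """Build an authenticated GET for all type definitions (hypothetical helper)."""
    url = catalog_base_url.rstrip("/") + "/api/atlas/v2/types/typedefs"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def summarize_typedefs(typedefs):
    """Count the entries in every list-valued section of the response."""
    return {
        key: len(value)
        for key, value in typedefs.items()
        if isinstance(value, list)
    }


def fetch_typedefs(request):
    """Send the request and return the parsed JSON (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Printing `summarize_typedefs(fetch_typedefs(request))` gives a quick sanity check that the token and endpoint work, without scrolling through the full response.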

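Creating objects in Purview

The request below creates three DataSet entities (two input tables and one output table) with a single call to the bulk entity endpoint: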

curl --location --request POST 'https://purview-piethein.catalog.purview.azure.com/api/atlas/v2/entity/bulk' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Imwzc1EtNTBjQ0g0eEJWWkxIVEd3blNSNzY4MCIsImtpZCI6Imwzc1EtNTBjQ0g0eEJWWkxIVEd3blNSNzY4MCJ9.eyJhdWQiOiJodHRwczovL3B1cnZpZXcuYXp1cmUubmV0IiwiaXNzIjoiaHR0cHM6Ly9zdHMud2luZG93cy5uZXQvNzJmOTg4YmYtODZmMS00MWFmLTkxYWItMmQ3Y2QwMTFkYjQ3' \
--header 'Content-Type: application/json' \
--data-raw '{
  "entities": [
    {
      "meanings": [],
      "status": "ACTIVE",
      "version": 0,
      "typeName": "DataSet",
      "attributes": {
        "qualifiedName": "system://input_01",
        "name": "input_table01",
        "description": "Input table",
        "objectType": null
      }
    },
    {
      "meanings": [],
      "status": "ACTIVE",
      "version": 0,
      "typeName": "DataSet",
      "attributes": {
        "qualifiedName": "system://input_02",
        "name": "input_table02",
        "description": "Input table",
        "objectType": null
      }
    },
    {
      "meanings": [],
      "status": "ACTIVE",
      "version": 0,
      "typeName": "DataSet",
      "attributes": {
        "qualifiedName": "system://output_01",
        "name": "output_table01",
        "description": "Output table",
        "objectType": null
      }
    }
  ]
}'
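The request body above repeats the same structure three times, so it can be generated instead of typed by hand. The following is a minimal Python sketch under that assumption; `make_dataset` and `build_bulk_payload` are illustrative names, not part of any Purview or Atlas library, and the fields simply mirror the JSON used in this article.

```python
import json


def make_dataset(qualified_name, name, description):
    """Return one DataSet entity shaped like the entities in the curl call."""
    return {
        "meanings": [],
        "status": "ACTIVE",
        "version": 0,
        "typeName": "DataSet",
        "attributes": {
            "qualifiedName": qualified_name,
            "name": name,
            "description": description,
            "objectType": None,  # serialized as JSON null
        },
    }


def build_bulk_payload(datasets):
    """Wrap (qualified_name, name, description) tuples in an 'entities' list."""
    return {"entities": [make_dataset(*dataset) for dataset in datasets]}


payload = build_bulk_payload([
    ("system://input_01", "input_table01", "Input table"),
    ("system://input_02", "input_table02", "Input table"),
    ("system://output_01", "output_table01", "Output table"),
])

# This string is what would be sent as the POST body to /entity/bulk.
body = json.dumps(payload, indent=2)
```

The resulting `body` string can be posted with any HTTP client, using the same Authorization and Content-Type headers as the curl command.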

If everything goes well, you should have three new datasets created within your Purview collection.

It is important to capture the guidAssignments from the response. These are the unique references that we need for creating our lineage object. Copy the information for later use.

"guidAssignments":{"-184464159039":"938fab2e-e270-4fc2-ad84-d752c1dd6560","-184464159038":"4c2115f9-a80d-467a-b7df-e93ec59505e1","-184464159037":"0589b96c-0cc6-454f-9251-32ee4aabefc0"}
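Rather than copying the GUIDs by hand, they can be pulled out of the response with a few lines of Python. This sketch assumes the response is a JSON object with a `guidAssignments` member, as in the fragment above; the function name `extract_guids` is made up for illustration. Whether the order of the values matches the order of the entities in the request is not guaranteed by anything shown in this article, so check in Purview which GUID belongs to which dataset before relying on positions.

```python
import json


def extract_guids(response_text):
    """Return the GUIDs assigned by the server, in the order they appear."""
    response = json.loads(response_text)
    return list(response.get("guidAssignments", {}).values())


# The fragment from this article, wrapped in braces to make it valid JSON.
sample_response = (
    '{"guidAssignments":{'
    '"-184464159039":"938fab2e-e270-4fc2-ad84-d752c1dd6560",'
    '"-184464159038":"4c2115f9-a80d-467a-b7df-e93ec59505e1",'
    '"-184464159037":"0589b96c-0cc6-454f-9251-32ee4aabefc0"}}'
)

guids = extract_guids(sample_response)
```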

Next, we can create the lineage by making another call that registers a Process entity whose inputs and outputs reference the datasets. Use the code below and replace the unique identifiers with the output from the previous call:

curl --location --request POST 'https://purview-piethein.catalog.purview.azure.com/api/atlas/v2/entity' \
--header 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Imwzc1EtNTBjQ0g0eEJWWkxIVEd3blNSNzY4MCIsImtpZCI6Imwzc1EtNTBjQ0g0eEJWWkxIVEd3blNSNzY4MCJ9.eyJhdWQiOiJodHRwczovL3B1cnZpZXcuYXp1cmUubmV0IiwiaXNzIjoiaHR0cHM6Ly9zdHMud2luZG93cy5uZXQvNzJmOTg4YmYtODZmMS00MWFmLTkxYWItMmQ3Y2QwMTFkYjQ3' \
--header 'Content-Type: application/json' \
--data-raw '{
  "entity": {
    "status": "ACTIVE",
    "version": 0,
    "typeName": "Process",
    "attributes": {
      "inputs": [
        {"guid": "938fab2e-e270-4fc2-ad84-d752c1dd6560"},
        {"guid": "4c2115f9-a80d-467a-b7df-e93ec59505e1"}
      ],
      "outputs": [
        {"guid": "0589b96c-0cc6-454f-9251-32ee4aabefc0"}
      ],
      "qualifiedName": "apacheatlas://customlineage01",
      "name": "lineage01"
    }
  }
}'

If everything goes well, you can see the lineage in Purview when you open one of the datasets.
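The Process payload can be assembled from the GUIDs in the same way. Again, this is only a sketch that mirrors the JSON used in the curl call; `build_process_payload` is a hypothetical helper name.

```python
import json


def build_process_payload(qualified_name, name, input_guids, output_guids):
    """Return a Process entity that links input datasets to output datasets."""
    return {
        "entity": {
            "status": "ACTIVE",
            "version": 0,
            "typeName": "Process",
            "attributes": {
                # Each input and output is referenced only by its GUID.
                "inputs": [{"guid": guid} for guid in input_guids],
                "outputs": [{"guid": guid} for guid in output_guids],
                "qualifiedName": qualified_name,
                "name": name,
            },
        }
    }


lineage = build_process_payload(
    "apacheatlas://customlineage01",
    "lineage01",
    input_guids=[
        "938fab2e-e270-4fc2-ad84-d752c1dd6560",
        "4c2115f9-a80d-467a-b7df-e93ec59505e1",
    ],
    output_guids=["0589b96c-0cc6-454f-9251-32ee4aabefc0"],
)

# This string is what would be sent as the POST body to /entity.
lineage_body = json.dumps(lineage, indent=2)
```

Keeping the inputs and outputs as plain lists makes it easy to register processes with any number of sources and targets.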

Next steps

Registering assets can be done manually via the Purview portal but, as you have learned, also programmatically via Purview’s REST API. The big benefit of the API is that you can apply customizations, register new types, or transfer metadata from other repositories. To simplify this process, I recommend checking out PyApacheAtlas, which allows bulk uploading using Excel templates.

Azure Purview REST APIs are largely based on the open source Apache Atlas project, so many additional resources are available. The Atlas documentation is a great resource. Microsoft also provides API documentation: PurviewCatalogAPISwagger.zip.

Lastly, there is a CLI and a great video from the Azure Purview team. The video explains how the Purview metadata repository works and how the API can be used: https://www.youtube.com/watch?v=4qzjnMf1GN4
