Azure Purview: securely scan sources by using managed private endpoints

Piethein Strengholt
Jan 27, 2022

Microsoft recently added support for managed private endpoints to Azure Purview. With this new set of features you can better manage and secure data scanning within Purview. Your metadata traffic is routed over Azure Private Link, which eliminates exposure to the public internet and helps protect against data exfiltration. In this demo I want to show you how this works in practice.

Resource group creation

In this demo you start from scratch by first creating a new resource group. I selected Canada Central as the region because these new features aren’t available in all regions yet. Please consult the Azure Purview documentation for the regions that are supported.

Storage account creation

Next we will set up a storage account for the demonstration. This is the resource that will be scanned during our demo. Select create new resource, choose Storage Account, select the resource group you just created, provide a unique name, and hit next.

For the storage account, make sure that the hierarchical namespace option is enabled. Click next to jump over to the Networking tab.

For the networking, select Private endpoint as the connectivity method. Don’t create any private endpoints at this stage; that comes later.

After this step you can hit review + create, finalize and wait for the storage account to be created.
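For readers who would rather script this step than click through the portal, the choices above can be sketched roughly as an Azure Resource Manager request body for a storage account. This is a minimal sketch, not the method used in this demo: `isHnsEnabled` and `publicNetworkAccess` are storage account properties in ARM, while the helper name `storage_account_body`, the `Standard_LRS` SKU and the `StorageV2` kind are assumptions made for illustration.

```python
import json


def storage_account_body(location: str) -> dict:
    """Build a request body for a storage account with ADLS Gen2 enabled."""
    return {
        "location": location,
        "kind": "StorageV2",              # assumption: general-purpose v2 account
        "sku": {"name": "Standard_LRS"},  # assumption: cheapest redundancy for a demo
        "properties": {
            # Hierarchical namespace turns the account into ADLS Gen2.
            "isHnsEnabled": True,
            # No public access; private endpoints are added later.
            "publicNetworkAccess": "Disabled",
        },
    }


# Canada Central is the region used in this demo.
body = storage_account_body("canadacentral")
print(json.dumps(body, indent=2))
```

The body only describes the desired state. Submitting it still requires an authenticated call through the tool of your choice.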

You can validate the security by creating and opening a container within the storage account. When the account is configured correctly, the portal reports that access is not permitted.

Azure Purview creation

After you have created your storage account, you can move on to creating an Azure Purview account. Select create new resource, select Purview, and provide your account details. Remember to use the same region as for the resource group and storage account. Then press review + create.

Note: we don’t deploy Purview with a private endpoint here. You can consider doing so when you want to lock down all access to Azure Purview, for example when you only allow clients that originate from within the private network to call Azure Purview. In this demo I don’t want to overcomplicate things.

After the resources have been deployed successfully your resource group should look like this:

Authentication for a scan

Next we need to grant permissions on the storage account so that Azure Purview is able to scan it. Open the storage account, click on Access control (IAM), and add a new role assignment. Set the role to Storage Blob Data Reader and select the managed identity of the newly created Purview account.
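The same role assignment can be expressed as data. The sketch below builds the request body for an Azure role assignment, assuming the built-in role definition ID of Storage Blob Data Reader. The function name `role_assignment_body` and both placeholder IDs are hypothetical; replace them with your own subscription ID and the object ID of the Purview managed identity.

```python
import json

# Built-in role definition ID for "Storage Blob Data Reader".
STORAGE_BLOB_DATA_READER = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"


def role_assignment_body(subscription_id: str, principal_id: str) -> dict:
    """Build a role assignment body granting read access to blob data."""
    role_definition_id = (
        f"/subscriptions/{subscription_id}"
        "/providers/Microsoft.Authorization/roleDefinitions/"
        f"{STORAGE_BLOB_DATA_READER}"
    )
    return {
        "properties": {
            "roleDefinitionId": role_definition_id,
            "principalId": principal_id,
            # A managed identity is assigned as a service principal.
            "principalType": "ServicePrincipal",
        }
    }


# Placeholder IDs for illustration only.
body = role_assignment_body(
    subscription_id="00000000-0000-0000-0000-000000000000",
    principal_id="11111111-1111-1111-1111-111111111111",
)
print(json.dumps(body, indent=2))
```

Assigning the role at the scope of the storage account, as in the portal steps above, keeps the permission limited to the single resource that Purview has to scan.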

At this stage we’re all set and can continue by opening Azure Purview.

Managed Virtual Network Integration Runtime

With the newly released features Azure Purview now provides three options for scanning sources:

  1. Azure Purview’s default integration runtime: this option is useful when connecting to data stores and compute services with publicly accessible endpoints.
  2. Self-hosted integration runtimes (SHIR): this option is particularly useful for VM-based data sources or applications that sit either in a private network (VNET) or in other networks, such as on-premises.
  3. Managed Virtual Network Integration Runtime: this new option supports connecting to data stores through the Private Link service in a private network environment. This ensures that the data scanning process is completely isolated and secure, while also being fully managed.
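As a rough way to reason about the three options, the list above can be reduced to a small decision helper. This is a simplification for illustration only: the names `Runtime` and `choose_runtime` and the two boolean inputs are my own assumptions, and real environments usually have more factors to weigh.

```python
from enum import Enum


class Runtime(Enum):
    DEFAULT = "Azure Purview default integration runtime"
    SELF_HOSTED = "Self-hosted integration runtime (SHIR)"
    MANAGED_VNET = "Managed Virtual Network Integration Runtime"


def choose_runtime(publicly_reachable: bool, supports_private_link: bool) -> Runtime:
    """Pick a runtime from two simplified properties of the data source."""
    if publicly_reachable:
        # Option 1: the source exposes a publicly accessible endpoint.
        return Runtime.DEFAULT
    if supports_private_link:
        # Option 3: private source that can be reached over Private Link.
        return Runtime.MANAGED_VNET
    # Option 2: VM-based or on-premises sources in a private network.
    return Runtime.SELF_HOSTED


# The locked-down storage account used in this demo.
print(choose_runtime(publicly_reachable=False, supports_private_link=True).value)
```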

For this demonstration we will use the Managed Virtual Network Integration Runtime. Open the Azure Purview Studio portal and navigate to your data map on the left. Select integration runtime and choose Azure.

Give your new integration runtime a name and a description, and ensure that interactive authoring is enabled.

After deployment you must wait for the approval notifications. When ready, click on the blue links and navigate to the newly created resources.

For the newly created resources you must approve the Private Endpoint connections. Click on each of them and change the status to Approve.

Next, go back to Azure Purview and look up your newly created managed private endpoints. You can find this option on the left, under settings.

Private endpoint for Azure Blob Storage

The next step is creating a private endpoint for the newly created storage account. While you are still in the managed private endpoints section, click on new. Select Azure Data Lake Storage Gen2 and continue.

Look up the storage account name under your subscription and click Create.

The same approval process kicks in. Navigate to your storage account, go to networking, then private endpoint connections, and you will see that a new connection request has been created. Approve this endpoint in the same way as before.

Go back to Azure Purview and wait for all managed private endpoints to be approved.
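If you automate this wait instead of refreshing the portal, the logic is a simple polling loop. The sketch below is deliberately generic: `fetch_states` is a hypothetical callable that you would implement yourself to return the connection state of each managed private endpoint, and the timeout and interval values are arbitrary. The state names `Pending`, `Approved` and `Rejected` are the usual private endpoint connection states.

```python
import time
from typing import Callable, Dict


def wait_until_approved(
    fetch_states: Callable[[], Dict[str, str]],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every endpoint is approved; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = fetch_states()
        rejected = [name for name, state in states.items() if state == "Rejected"]
        if rejected:
            # A rejected connection never becomes approved, so stop early.
            raise RuntimeError(f"Rejected endpoints: {rejected}")
        if states and all(state == "Approved" for state in states.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Demo with canned responses and a no-op sleep, so it finishes immediately.
_responses = iter([{"storage": "Pending"}, {"storage": "Approved"}])
print(wait_until_approved(lambda: next(_responses), sleep=lambda _seconds: None))
```

Passing `sleep` as a parameter keeps the loop easy to test without waiting in real time.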

Configure source and scanning

Next you need to add your newly created source and set up the scanning. Go to the data map and collection overview. Add a new source and click on Azure Data Lake Storage Gen2.

Next, register your source. You will see the public endpoint listed here, but this configuration will be overwritten once we start scanning. Hit register and finish.
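The endpoint shown during registration follows the usual Azure Data Lake Storage Gen2 naming. As a small sketch, assuming a hypothetical account named `purviewdemostorage`, the helper below derives the public endpoint and the host name in the `privatelink.dfs.core.windows.net` zone that Azure Private Link conventionally uses for this service. The function name is my own, and how the managed virtual network resolves names internally is not covered in this article.

```python
def adls_gen2_endpoints(account_name: str) -> dict:
    """Return the public endpoint and the conventional Private Link host name."""
    return {
        # Endpoint listed when registering the source.
        "public": f"https://{account_name}.dfs.core.windows.net/",
        # Host name in the Private Link DNS zone for ADLS Gen2.
        "private_link_fqdn": f"{account_name}.privatelink.dfs.core.windows.net",
    }


# Hypothetical account name, for illustration only.
endpoints = adls_gen2_endpoints("purviewdemostorage")
for kind, value in endpoints.items():
    print(f"{kind}: {value}")
```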

Next you configure scanning for your newly created source. It’s important to select the IntegrationRuntime (Managed Virtual Network) from the list. Add your source to a collection, and hit continue.

Finally, you must test your connection and hit continue. Complete the scanning by selecting a schedule.

If everything goes well, you’ll notice that new metadata is added to Purview. Because all metadata is transferred over private endpoints, none of the scan traffic crosses the public internet.

