SCD Delta tables using Synapse Spark Pools
One of my customers asked whether it is possible to build up Slowly Changing Dimensions (SCD) using Delta files and Synapse Spark Pools. Yes, you can easily do this, which also means that you maintain a log of old and new records in a table or database. To show you how this works, please have a look at the code snippets of my demo.
First, you need to import all libraries and define the schema the original dataset. This is your starting point. I use the effectiveDate and endDate for type2 opening and closing line items. Second, you create the original data and store it as DELTA.
Next, you define the newly data to be loaded in. Generally, this is a full offload. In the code below I created an additional folder for storing the newly data in DELTA. In a typical setup you might want to retain this data for (ad-hoc) analysis.
The next code block joins the original and newly created data. Next, it adds actions for archiving, retaining and inserting data, by comparing all data. At the end all data is merged and written back to the original location.
If everything goes well you table structure should look like below.