Understanding Master Data Management’s integration challenges

Piethein Strengholt
9 min read · Dec 7, 2023

A successful Master Data Management (MDM) implementation is no simple task. It involves choosing the right scope and approach, and overcoming numerous data integration hurdles, such as ensuring data quality, consolidating data from disparate sources, and maintaining data uniformity across different business units. In this post, we’ll explore some of these challenges in detail, offering insights into how they can be effectively managed to ensure your MDM strategy delivers the most value.

Before we delve deep into the complexities of MDM, it’s crucial to first establish a solid understanding of the concept itself, comprehend how different vendors set themselves apart, and explore the various strategies for integrating master data. Following this, we’ll examine an example using the Coexistence style.

Introduction

MDM is a comprehensive method of defining and managing an organization’s critical data. It provides a single, unified view of data across the enterprise, creating an authoritative source that supports operational business applications and analytical downstream consumers. The importance of MDM cannot be overstated. It plays a significant role in improving data quality, reducing inconsistency, and facilitating better decision making. Moreover, it facilitates a holistic view of key data entities, such as customers, products, suppliers, among others, thereby boosting operational efficiency and delivering insightful analytics for strategic planning.

MDM vendors distinguish themselves in several ways. Some excel in data integration with many native out-of-the-box connectors, while others focus more on data quality and governance, including the life cycle management of issues using workflows, progress dashboards, notification services, and more. Others differentiate themselves by incorporating Artificial Intelligence and Machine Learning into their solutions to enhance the process of matching and merging. So, determining the scope and business objectives before implementing MDM is a crucial part of your overall strategy.

As for the MDM process itself, there are different challenges associated with managing master data. Gartner, a leading research and advisory company, has proposed four MDM styles or approaches, namely: Consolidation, Registry, Coexistence, and Centralized. The Consolidation style involves collecting master data from various sources into a central MDM system. The Registry style links to and indexes master data from source systems. The Coexistence style combines elements of both the Consolidation and Registry styles, while the Centralized style pushes master data from the MDM system back to the source systems.

An abstract comparison of the four master data management implementation styles from Gartner (Source: Data Management at Scale 2nd edition)

The most complex MDM style is the Coexistence style, which is a hybrid approach that combines the Consolidation and Registry styles. It provides a flexible and effective solution that maintains a central hub of master data while allowing the data to coexist in its original source system.

Discussing a real-life Coexistence style example

The Coexistence pattern within Master Data Management is demanding, especially when it comes to integration. A real-life example of the style can be seen in the diagram below, which includes multiple source systems.

An illustrative architecture demonstrates the complete data distribution process between source systems and the MDM system. Employing a native connector from the MDM system to extract data from your operational systems can offer multiple advantages, including simplified integration. Nevertheless, the decision to use a native connector or a custom-built one largely hinges on your unique requirements, the intricacy of your data, the systems you’re integrating, and the functionalities of your MDM system. (Image credit: Piethein Strengholt)

In the illustrative architecture, MDM is used for creating a unified view of unique customer data, which includes the assignment of master identifiers. This process can only be done centrally, not locally within systems. Therefore, it requires customer data from the different source systems to be consolidated in the MDM system (reflecting the Consolidation style). The MDM system using this approach also maintains links to the original data in each branch’s system (reflecting the Registry style).
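
To make this concrete, here’s a minimal sketch (hypothetical class and field names, not the author’s implementation) of a consolidated “golden” customer record that carries a centrally assigned master identifier alongside registry-style links back to each source system:

```python
from dataclasses import dataclass, field
from typing import List
import uuid

@dataclass
class SourceLink:
    """Registry-style reference to the original record in a source system."""
    system: str    # e.g. "CRM", "ERP", or a branch system
    local_id: str  # the record's key inside that system

@dataclass
class GoldenCustomer:
    """Consolidated master record with a centrally assigned identifier."""
    full_name: str
    email: str
    master_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    source_links: List[SourceLink] = field(default_factory=list)

# Two source records matched as the same real-world customer
golden = GoldenCustomer(
    full_name="Jane Doe",
    email="jane.doe@example.com",
    source_links=[SourceLink("CRM", "C-1001"), SourceLink("ERP", "98765")],
)
print(golden.master_id, [link.system for link in golden.source_links])
```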

The integration of data within MDM is a very complex task, which should not be underestimated. Many organizations often have a myriad of source systems, each with its own data structure and format. These systems can range from commercial CRM or ERP systems to custom-built legacy software, all of which may use different data models, definitions, and standards. In addition, organizations often desire real-time or near-real-time synchronization between the MDM system and the source systems. Any changes in the source systems need to be immediately reflected in the MDM system to ensure data accuracy and consistency.

Using a native connector from the MDM system to read data from your operational systems can provide several benefits, such as ease of integration. This has been illustrated at the bottom in the image above. However, the choice of using a native connector or a custom-built one mostly depends on your specific needs, the complexity of your data, the systems you’re integrating, and the capabilities of your MDM system. Let’s discuss several aspects of integration in the next sections.

Batch processing within MDM

Batch processing in MDM systems typically involves the steps of extraction, processing, and integration, as seen in the image above. For these activities, most vendors offer a degree of flexibility in using pre-configured or standard data models for concepts like customer, product, or supplier data. However, the intricacies of data mapping, as part of your overall Extract, Transform, Load (ETL) process, should not be underestimated, given the diversity, complexity, and unique characteristics of each organization’s data.
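
As a rough illustration of such a mapping step in a batch job, the snippet below renames and cleanses a hypothetical source extract into an equally hypothetical standard customer model; the column names are assumptions, not any vendor’s schema:

```python
import pandas as pd

# Hypothetical extract from one source system
source = pd.DataFrame({
    "cust_no": [1001, 1002],
    "cust_nm": ["jane doe", "JOHN SMITH"],
    "ctry": ["US", "840"],  # mixed country-code conventions
})

# Map source columns onto a (hypothetical) standard MDM customer model
mapping = {"cust_no": "source_customer_id", "cust_nm": "full_name", "ctry": "country_code"}
standardized = source.rename(columns=mapping)

# Simple cleansing rules as part of the transform step
standardized["full_name"] = standardized["full_name"].str.strip().str.title()
standardized["source_system"] = "CRM"

print(standardized)
```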

The use of ETL within MDM is a more nuanced concept. In batch processing mode, you’re not always restricted to using the ETL services provided by the MDM system. For instance, you can opt to use your own ETL tools or services if you prefer or if your specific data integration needs demand more complex or customized processing. This could be the case when you desire more flexibility or control, or when dealing with data that isn’t solely for the MDM system(s). In this regard, I’ve noticed many organizations utilizing Lakehouse-based architectures for data processing before the data is loaded into the MDM system. We’ll come back to this when discussing reference data management.

Real-time synchronization

For real-time synchronization, enterprises typically use a combination of different patterns:

  • Event-Driven Approach: This approach involves setting up event listeners or triggers in the source systems. For example, whenever a customer updates their shipping address in the CRM, the change event is captured and sent to the MDM system. The MDM system then propagates the change to all other systems in real-time.
  • API-Based Integration: Modern MDM systems often come with APIs (Application Programming Interfaces) that can be used to read and write data. These APIs can be called in real-time by the source systems to update the MDM system whenever there’s a change, and vice versa (see the sketch after this list).
  • Message Queueing: For scenarios where immediate synchronization is not feasible or necessary, a message queueing system can be used. Changes are sent as messages to a queue and are then processed in near-real-time by the MDM system.
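
To illustrate the API-based pattern, the sketch below pushes a changed customer record to a hypothetical MDM REST endpoint; the URL, payload shape, and response field are assumptions rather than a specific vendor’s API:

```python
import requests

MDM_API = "https://mdm.example.com/api/v1/customers"  # hypothetical endpoint

def push_customer_change(local_id: str, changes: dict) -> str:
    """Send a change from a source system to the MDM system and return the master identifier."""
    payload = {"sourceSystem": "CRM", "localId": local_id, "attributes": changes}
    response = requests.post(MDM_API, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["masterId"]  # assumed response field

# Example: a customer updated their shipping address in the CRM
master_id = push_customer_change("C-1001", {"shippingAddress": "1 Main St, Springfield"})
```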

Note that real-time data mappings and transformations are essential in MDM to ensure that data from different sources can be combined into a single, consistent, and accurate view. Different systems may structure the same data differently using various formats. For example, one system might emit an event using a full name as a single field, while another might break it down into first name and last name.
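
A small normalization step, sketched below with hypothetical field names, can reconcile these two shapes before matching takes place:

```python
def normalize_name(record: dict) -> dict:
    """Map a single full-name field or separate first/last name fields to one canonical shape."""
    if "full_name" in record:
        # Naive split for illustration; real matching rules are usually more elaborate
        first, _, last = record["full_name"].partition(" ")
    else:
        first, last = record.get("first_name", ""), record.get("last_name", "")
    return {"first_name": first.strip(), "last_name": last.strip()}

print(normalize_name({"full_name": "Jane Doe"}))
print(normalize_name({"first_name": "John", "last_name": "Smith"}))
```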

For real-time synchronization this means data must be transformed in real-time or near real-time. Let’s reflect on what happens in the example architecture, as seen in the image above. An employee registers a new customer in Microsoft Dynamics 365, which triggers an event or a Power Automate flow. The customer event is captured and sent to an event processing service (Event Hubs). This service has a function that triggers when a customer event is received. The function transforms the input customer data (e.g., address IDs, relationship ID, timestamps) into a more useful format (e.g., address details, relationship information, registration date) by, for example, enriching it with data from other services. The transformed data is then sent to the MDM system for further processing and real-time matching and merging.
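
The sketch below captures the transformation step of that flow in simplified form; the lookup tables stand in for the enrichment services, all field names are hypothetical, and wiring the function to an Event Hubs trigger is left out:

```python
# Hypothetical lookups standing in for enrichment calls to other services
ADDRESSES = {"A-77": {"street": "1 Main St", "city": "Springfield", "country": "USA"}}
RELATIONSHIPS = {"R-3": {"type": "household", "name": "Doe family"}}

def transform_customer_event(event: dict) -> dict:
    """Turn a raw customer event into the richer shape expected by the MDM system."""
    return {
        "sourceSystem": "Dynamics365",
        "localId": event["customerId"],
        "fullName": event["name"],
        "address": ADDRESSES.get(event["addressId"], {}),               # address ID -> address details
        "relationship": RELATIONSHIPS.get(event["relationshipId"], {}),  # relationship ID -> relationship info
        "registeredAt": event["createdOn"],                              # timestamp -> registration date
    }

raw_event = {
    "customerId": "C-1001",
    "name": "Jane Doe",
    "addressId": "A-77",
    "relationshipId": "R-3",
    "createdOn": "2023-12-07T10:15:00Z",
}
mdm_payload = transform_customer_event(raw_event)
# mdm_payload can now be sent to the MDM system, e.g. with an API call as sketched earlier
```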

The process of transmitting information from the MDM system back to the operational systems in real-time or near real-time operates in a similar manner. Events can be triggered, or APIs may be invoked. Moreover, another series of transformation steps may be necessary, depending on the discrepancies and mismatches between source systems and the MDM system. The information relayed back typically combines corrected data, a master identifier (a unique key that identifies records in the MDM database), and enriched data, such as hierarchy or relationship information. It’s important to note that the actual process could be more complex than what’s described here. A typical business process often requires additional validation steps before the corrected data is processed within the operational system.
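
A minimal sketch of what an operational system might do when such a message arrives (the payload shape and update logic are assumptions for illustration):

```python
def apply_mdm_feedback(local_store: dict, message: dict) -> None:
    """Apply corrected data, the master identifier, and enrichments to a local record."""
    record = local_store.get(message["localId"])
    if record is None:
        return  # unknown record; in practice this would raise an alert or open a work item
    record["master_id"] = message["masterId"]             # link the local record to the golden record
    record.update(message.get("correctedAttributes", {}))  # apply corrected data
    record["hierarchy"] = message.get("hierarchy")          # enriched relationship information

crm_store = {"C-1001": {"full_name": "Jane Doe"}}
apply_mdm_feedback(crm_store, {
    "localId": "C-1001",
    "masterId": "c7a4e3d2-1f6b-4a8e-9d2c-0b5f6a7e8d91",  # assigned centrally by the MDM system
    "correctedAttributes": {"full_name": "Jane A. Doe"},
    "hierarchy": {"parent": "Doe family"},
})
print(crm_store["C-1001"])
```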

Although many MDM solutions can manage both reference data and master data, I recommend clearly delineating between how to manage these two, especially in a larger scale environment. Reference data is data used to define, classify, organize, group, or categorize other data (or value hierarchies, like relationships between product and geographic hierarchies). Master data, by contrast, is the core data that is absolutely essential for the enterprise. Each type is typically managed differently and comes with its own methods of data distribution.

Managing reference data

Unlike master data, reference data can be managed within individual domains. To ensure consistency of reference data across various domains, it’s crucial to guide your teams to synchronize their data distribution processes with the centrally managed reference data. Currency codes, country codes, and product codes are common examples of reference data that should be provided. This data can be published, for example, in a central master data management system. When any of your domains distribute data to other domains, they should use the identifiers from the enterprise reference data to classify the data. The mapping of local data to enterprise data allows other domains to quickly recognize master data.

Domains perform the mapping activities themselves, using synchronized copies of enterprise reference data to look up the reference values before distributing any data to other domains or consumers. When using a Lakehouse, the MDM enrichments typically occur in the silver layer, where data from different source systems hasn’t been joined together yet.

Of course, if domains are to map local reference data to centrally managed reference data, they need to know exactly what should be mapped to what — for example, if alpha-2 codes (e.g., US) and numeric values (e.g., 840) for country codes should always be mapped to the standard alpha-3 codes (e.g., USA).
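
A tiny example of such a local-to-enterprise mapping, assuming the enterprise standard is the ISO 3166-1 alpha-3 code:

```python
# Local representations mapped to the enterprise-standard alpha-3 code
COUNTRY_TO_ALPHA3 = {
    "US": "USA", "840": "USA",   # alpha-2 and numeric for the United States
    "NL": "NLD", "528": "NLD",   # alpha-2 and numeric for the Netherlands
}

def to_enterprise_country(code: str) -> str:
    """Map a local country representation to the enterprise reference value."""
    code = str(code).strip().upper()
    if len(code) == 3 and code.isalpha():
        return code  # already an alpha-3 code
    return COUNTRY_TO_ALPHA3.get(code, "UNKNOWN")

print(to_enterprise_country("US"), to_enterprise_country("840"), to_enterprise_country("NLD"))
```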

An alternative is to offer central MDM services so domains can manage the local-to-enterprise mapping within the MDM solution. In this case, each time data is published, a postprocessing activity will run to check for inconsistencies and enrich the data product with enterprise reference data. This approach might take more work to implement up front, but it removes the need to build solutions for this within all of the respective domains. It also guarantees better quality because the mappings are validated at a central level.
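
A rough sketch of such a postprocessing step is shown below; the data-product shape and the enterprise reference table are hypothetical. Rows that cannot be mapped are flagged as issues, and the rest are enriched with the enterprise code:

```python
ENTERPRISE_COUNTRIES = {"US": "USA", "840": "USA", "NL": "NLD", "528": "NLD"}

def postprocess_data_product(rows):
    """Validate local reference values and enrich rows with enterprise codes; return (enriched, issues)."""
    enriched, issues = [], []
    for row in rows:
        mapped = ENTERPRISE_COUNTRIES.get(str(row.get("country", "")).strip())
        if mapped is None:
            issues.append(row)  # inconsistent value: route to a data quality workflow
        else:
            enriched.append({**row, "country_alpha3": mapped})
    return enriched, issues

good_rows, bad_rows = postprocess_data_product([{"id": 1, "country": "US"}, {"id": 2, "country": "XX"}])
print(good_rows, bad_rows)
```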

You can use both approaches to adding reference data side by side. That is, some domains might do the mapping themselves, while other domains rely on centrally managed services. In this case, a central governance body should oversee reference data management across the enterprise to ensure the goals of standardization, quality, and operational efficiency are met.

To make the first approach more concrete: when asking your domains to conform their data to central reference data, you can provide central MDM services so domains can discover, understand, select, and incorporate enterprise reference values when building their datasets, i.e., data products. For example, you might have a data catalog that references a central storage account in which all reference data is hosted. With this approach, domains perform the mapping activities themselves, using synchronized copies of enterprise reference data to look up the reference values before distributing any data products.

Conclusion

Despite the numerous benefits, MDM can be challenging to maintain due to the many integration challenges. These can range from data standardization issues to the complexity of integrating disparate source systems and the need for continuous synchronization between the MDM system and source systems. However, with the right strategy, tools, and governance framework, these challenges can be effectively managed, making MDM a powerful asset in the data management landscape.

A significant portion of the content in this blog post has been sourced from the book “Data Management at Scale.” For readers who are interested in gaining a deeper understanding of master data management and its associated challenges, there is a comprehensive chapter dedicated to this topic. I highly recommend taking a look for yourself.
