Is data mesh only for analytical data?
--
Is data mesh only for analytical data? This is a question that comes up regularly when I discuss data mesh with my customers. While data mesh is often associated exclusively with analytical data, many of its principles apply equally well to operational data. In this blog post, we will explore how data mesh can be used for both analytical and operational use cases.
Operational data refers to the data that is used to run a business in real time, such as transactional data and customer data. By applying data mesh principles to operational data, organizations can ensure that this data is of high quality, easily accessible, and usable for driving (real-time) business decisions.
The operational data architecture contrasts with the analytical data architecture: the operational plane processes commands and requires predictability and real-time processing of small datasets, while the analytical plane focuses on data reads and complex analysis of large datasets, which is less time-critical. However, there is a large amount of overlap: in the domain model, in how events and APIs, just like data, should be treated as products, and in how the boundaries between applications and domains should be drawn.

Your business capabilities are the same whether you look at your architecture through an operational or an analytical lens. The applications that provide these capabilities are managed by teams that keep them up and running, and the language those teams use for development is the same. Ultimately, the unique context that shaped the application design is the same context that shapes the design of your data products, events, and APIs. In addition, the shift toward loosely coupled applications and autonomous, agile teams within event and API management is similar to the shift that underpins a data mesh architecture. It therefore makes sense to apply similar best practices for data management, event management, and API management.
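To make this overlap concrete, here is a minimal Python sketch, using a hypothetical Order entity, that shows how one team-owned domain model can back both an operational event and an analytical data product row. The names and shapes are illustrative assumptions, not part of any specific platform or framework.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# One domain model, owned by a single team, shared by both planes.
@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    amount: float
    currency: str
    placed_at: datetime

def to_order_placed_event(order: Order) -> dict:
    """Operational plane: the order becomes an asynchronous domain event."""
    return {
        "type": "OrderPlaced",
        "occurred_at": order.placed_at.isoformat(),
        "payload": asdict(order) | {"placed_at": order.placed_at.isoformat()},
    }

def to_data_product_row(order: Order) -> dict:
    """Analytical plane: the same order becomes a row in a data product."""
    return {
        "order_id": order.order_id,
        "customer_id": order.customer_id,
        "amount": order.amount,
        "currency": order.currency,
        "placed_date": order.placed_at.date().isoformat(),
    }

order = Order("o-1001", "c-42", 99.95, "EUR", datetime.now(timezone.utc))
print(to_order_placed_event(order))
print(to_data_product_row(order))
```

The point is not the code itself, but that both representations are derived from the same domain model and therefore speak the same ubiquitous language.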
Data and integration landing zone
In the context of building a data mesh architecture, landing zones or infrastructure blueprints are foundational: they provide the necessary building blocks for a modern platform. A landing zone is essentially a standardized and automated approach to deploying cloud workloads. For the subject of this blog post, landing zones help organizations achieve standardization across different domains for both analytical and operational workloads.
The architectural diagram below depicts a landing zone that is designed to support both operational and analytical usage. It includes components for storage, real-time and offline processing, analysis, and consumption, as well as governance controls.
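As a rough illustration of what such a blueprint standardizes, the Python sketch below models a hypothetical landing-zone definition for a single domain, covering the same groups of components: storage, processing, consumption, and governance. The component names and the provision function are assumptions made for illustration; in practice this would be expressed in infrastructure-as-code and deployment pipelines.

```python
from dataclasses import dataclass, field

@dataclass
class LandingZoneBlueprint:
    """A hypothetical, simplified landing-zone definition for one domain.

    It standardizes the same building blocks for every domain, so both
    operational and analytical workloads are deployed consistently.
    """
    domain: str
    storage_layers: list[str] = field(default_factory=lambda: ["bronze", "silver", "gold"])
    processing: list[str] = field(default_factory=lambda: ["real-time", "offline"])
    consumption: list[str] = field(default_factory=lambda: ["apis", "events", "data-products"])
    governance: list[str] = field(default_factory=lambda: ["catalog", "access-policies", "data-quality"])

def provision(blueprint: LandingZoneBlueprint) -> None:
    """Placeholder for the automated deployment step (e.g. IaC pipelines)."""
    print(f"Provisioning landing zone for domain '{blueprint.domain}':")
    for group, items in [
        ("storage", blueprint.storage_layers),
        ("processing", blueprint.processing),
        ("consumption", blueprint.consumption),
        ("governance", blueprint.governance),
    ]:
        print(f"  {group}: {', '.join(items)}")

# Every domain gets the same standardized blueprint.
for domain in ["sales", "marketing"]:
    provision(LandingZoneBlueprint(domain=domain))
```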
Within the architecture itself, different patterns can be applied for both analytical and operational usage:
Data distribution patterns: In an analytical data (mesh) pattern, each team owns and manages its own analytical data products. The data is made available to other teams through, for example, a data catalog. Each team is responsible for ensuring the quality and reliability of its data products. Data can be consumed via different consumption patterns, such as the event-carried state transfer pattern, an API pattern, or a lightweight virtualization query pattern.
Application integration patterns: In an operational pattern, each team owns and manages its own operational event and API products. Each team is responsible for ensuring the quality and reliability of its endpoints. APIs can be used for commands and strongly consistent reads, while events can be used to establish asynchronous communication. A sketch covering both patterns follows this list.
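The sketch below illustrates both patterns side by side with simple in-memory stand-ins: a hypothetical catalog in which a team registers its data product, an API-style command handler that offers strongly consistent reads, and an event publisher for asynchronous communication. All names are illustrative; a real setup would use an actual catalog, API gateway, and message broker.

```python
from dataclasses import dataclass, field
from typing import Callable

# --- Analytical side: a team-owned data product, discoverable via a catalog ---
@dataclass
class DataProduct:
    name: str
    owner: str
    output_port: str          # e.g. "event-carried state", "api", "virtualized query"
    quality_checks: list[str] = field(default_factory=list)

catalog: dict[str, DataProduct] = {}

def register(product: DataProduct) -> None:
    """The owning team publishes its data product so other teams can discover it."""
    catalog[product.name] = product

# --- Operational side: commands via an API, async communication via events ---
subscribers: list[Callable[[dict], None]] = []
orders: dict[str, dict] = {}

def handle_place_order(command: dict) -> dict:
    """API-style command handler: synchronous and strongly consistent."""
    orders[command["order_id"]] = command
    publish({"type": "OrderPlaced", "payload": command})
    return orders[command["order_id"]]   # read-your-own-write

def publish(event: dict) -> None:
    """Event-style integration: other domains react asynchronously."""
    for callback in subscribers:
        callback(event)

# Usage
register(DataProduct("sales.orders", owner="sales-team",
                     output_port="event-carried state",
                     quality_checks=["completeness", "freshness"]))
subscribers.append(lambda e: print("marketing received:", e["type"]))
handle_place_order({"order_id": "o-1001", "customer_id": "c-42", "amount": 99.95})
print("catalog entries:", list(catalog))
```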
Note that the above patterns also overlap. The processing and integration layer can tap into any of the layers of the underlying data architecture. For example, an event could trigger a logic app that reads from the Silver layer for data enrichment; from there, the processing layer directly feeds other domains and systems. Or a change feed is used to track row-level changes between versions of a table in the Gold layer, where each row change results in an event that is distributed directly to downstream consumers.
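The change-feed example can be sketched as follows. This is a simplified, in-memory stand-in: it diffs two versions of a Gold-layer table and emits one event per row-level change, which would then be distributed to downstream consumers via the processing and integration layer. Table names and event shapes are assumptions for illustration.

```python
# A minimal, in-memory stand-in for a change feed on a Gold-layer table:
# compare two table versions and emit one event per row-level change.

def row_level_changes(previous: dict[str, dict], current: dict[str, dict]) -> list[dict]:
    """Return insert/update/delete events by diffing two table versions, keyed by row id."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append({"type": "RowInserted", "key": key, "row": row})
        elif row != previous[key]:
            events.append({"type": "RowUpdated", "key": key, "row": row})
    for key in previous.keys() - current.keys():
        events.append({"type": "RowDeleted", "key": key})
    return events

gold_v1 = {"c-42": {"name": "Contoso", "segment": "enterprise"}}
gold_v2 = {"c-42": {"name": "Contoso", "segment": "smb"},
           "c-43": {"name": "Fabrikam", "segment": "enterprise"}}

for event in row_level_changes(gold_v1, gold_v2):
    # In a real setup, each event would be pushed to downstream consumers,
    # for example via a message broker in the processing and integration layer.
    print(event)
```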
Reference diagram for tactical decisions
Designing a good application and data integration solution is a complex task, given that there may be multiple possible "right" and complementary solutions. It is often a trade-off between different dimensions: performance, maintainability, flexibility, cost, resilience, and so on. These considerations also require a deep understanding of the business problem you are trying to solve. The reference diagram presented below consolidates all the major patterns discussed above, making it a valuable resource for making informed tactical decisions.
Although data products, APIs, and events facilitate different scenarios and use cases, there is overlap in the domain context when all of these patterns are used within the same boundaries. Therefore, I recommend aligning all of your integration and data services in terms of design guidelines, documentation, data and interface model registration, and so forth. If done well, all interface attributes are connected to the same set of elements or business terms within your catalog.
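A minimal sketch of what this alignment can look like, assuming a simple business glossary and hypothetical interface names: every attribute exposed by a data product, API, or event is registered against the same set of business terms, which makes consistency checks and lineage-style lookups straightforward.

```python
# A minimal sketch (hypothetical names) of linking interface attributes from
# data products, APIs, and events to one shared set of business terms.

business_glossary = {
    "CustomerId": "Unique identifier of a customer across the organization",
    "OrderAmount": "Monetary value of an order, including taxes",
}

# Each interface, regardless of style, maps its attributes to glossary terms.
interface_registrations = [
    {"interface": "sales.orders (data product)", "attribute": "customer_id", "term": "CustomerId"},
    {"interface": "POST /orders (API)",          "attribute": "customerId",  "term": "CustomerId"},
    {"interface": "OrderPlaced (event)",         "attribute": "customer_id", "term": "CustomerId"},
    {"interface": "sales.orders (data product)", "attribute": "amount",      "term": "OrderAmount"},
]

# A simple consistency check: every registered attribute must point to a known term.
unknown = [r for r in interface_registrations if r["term"] not in business_glossary]
print("unlinked attributes:", unknown or "none")

# Lineage-style lookup: which interfaces expose a given business term?
term = "CustomerId"
print(f"interfaces exposing {term}:",
      [r["interface"] for r in interface_registrations if r["term"] == term])
```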
Conclusion
In conclusion, data mesh is not only about analytical data. Positioning data mesh for analytical data alone is a missed opportunity: operational use cases can benefit from it as well. By treating data as a product and implementing a decentralized approach to data architecture, organizations can improve the agility and efficiency of their data operations and drive better business outcomes.
If you want to learn more about managing and designing large-scale architectures, feel free to check out my book Data Management at Scale, 2nd edition.