Data Management at Scale

Piethein Strengholt
19 min read · Jul 11, 2023

Over the last few years, decentralized architectures have emerged as the new paradigm for managing data at scale. They are meant to scale the distribution of data between teams while aiming for higher value and a faster time to market.

In this article, I would like to unpack how to implement such a federated design. We’ll begin with a short reflection on your data strategy and on whether you should start with a centralized or decentralized approach. Then we’ll go through the phases of implementing a data architecture: setting the strategic direction, laying the foundation, and professionalizing your capabilities.

Note that much of the content comes from the book Data Management at Scale, 2nd edition. What follows is an abstract of it; if you would like to learn more or explore the topics in depth, I encourage you to read the full version.

A Brief Reflection on Your Data Journey

Before you jump on the data-driven bandwagon, ensure you have a data strategy in place. Whether you’re starting small or have a large set of use cases to implement, without a plan you’re doomed to fail. I see countless enterprises fail because they’re unable to bring everybody on board or to articulate their strategy, because they don’t include business users, or because they lack support from senior leadership. I can’t emphasize this enough: before you start implementing any change, ensure you have a balcony view and a clear map guiding you in the right direction.

After establishing a vision that clearly articulates your ambition and the path ahead of you, it’s time to enter the next phase. Your next steps are about communicating your vision, building the right team, optimizing your architecture, making execution plans, defining processes, and selecting use cases. Again, it’s important to get everybody aligned and committed to your objectives. During the initial stages of development, start small with the implementation, but keep the big picture in mind, because your target state architecture must be inclusive of all use cases. You’ll need data governance capabilities for implementing roles, processes, policies, procedures, and standards to govern your most critical data. You’ll need master data and data quality management capabilities for ensuring consistency and trust. You’ll need metadata for tracking lineage, capturing business context, and linking to physical data. You’ll need integration and analytical services for building data products…

Piethein Strengholt

Hands-on data management professional. Working @Microsoft.