Feature stores and data mesh

One of my customers asked me a question: what is your view on feature stores, and how do you position them in the data mesh paradigm?

My view: I’m no big fan of centralized or shared feature stores. They add unnecessary complexity to your architecture. Each individual analytical use case is unique and requires unique data. For example, within marketing you can make predictions about older customers and about younger customers. Each of these use cases requires different features. Asking different use cases to conform to the same underlying set of features requires many compromises. You will probably end up with a feature store that doesn’t add value to any of them.

However, there might be situations where pre-processing or building features takes a very long time, where features are time-dependent, or where features depend on other features. In these situations, a store is needed. Such a feature store is just an ordinary database that is part of the use case’s inner architecture, and thus part of the domain.
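To make the "just an ordinary database" point concrete, here is a minimal sketch of a domain-owned store for time-dependent features, using SQLite from the Python standard library. The table layout, the feature name `avg_basket_90d`, and the helper functions are hypothetical and only serve as illustration; any regular database owned by the domain would do.

```python
import sqlite3

# Hypothetical schema: one row per customer, feature, and validity start date.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE features (
        customer_id INTEGER NOT NULL,
        name        TEXT    NOT NULL,
        value       REAL    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date on which the value became valid
        PRIMARY KEY (customer_id, name, valid_from)
    )
    """
)


def put_feature(customer_id, name, value, valid_from):
    """Store an expensive-to-compute feature value so it is built only once."""
    conn.execute(
        "INSERT INTO features VALUES (?, ?, ?, ?)",
        (customer_id, name, value, valid_from),
    )


def get_feature(customer_id, name, as_of):
    """Return the most recent value that was valid on the given ISO date."""
    row = conn.execute(
        "SELECT value FROM features "
        "WHERE customer_id = ? AND name = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (customer_id, name, as_of),
    ).fetchone()
    return row[0] if row else None


# Illustrative values only.
put_feature(1, "avg_basket_90d", 42.0, "2023-01-01")
put_feature(1, "avg_basket_90d", 55.5, "2023-04-01")
```

Looking up the feature as of a given date returns the value that was valid at that moment, which is what a time-dependent use case needs when it builds training data.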

Coming back to data mesh: requesting and provisioning these databases should be part of the self-service data platform. Domains must be responsible for owning and maintaining these stores. When sharing any data with other domains, domains must adhere to the same principles as for regular data distribution. So, use data products instead of directly integrating databases with other domains.
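As a rough sketch of that last point, the domain can publish a snapshot with an explicit contract instead of letting other domains query its internal store. The `DataProduct` class, the `publish_features` function, and all field names below are hypothetical; a real data product would also carry metadata, access control, and versioning that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical published artifact; consumers see this, never the database."""
    name: str
    owner_domain: str
    schema_version: str
    records: tuple  # snapshot copied out of the domain's internal store


def publish_features(internal_rows, exposed_fields):
    """Project internal rows onto the agreed fields; other columns stay private."""
    records = tuple(
        {field: row[field] for field in exposed_fields} for row in internal_rows
    )
    return DataProduct(
        name="customer_features",
        owner_domain="marketing",
        schema_version="1.0",
        records=records,
    )


# Illustrative internal rows, including a column that must not leave the domain.
internal_rows = [
    {"customer_id": 1, "churn_score": 0.8, "internal_note": "raw-event-blob"},
]
product = publish_features(internal_rows, ("customer_id", "churn_score"))
```

The consuming domain depends only on the published fields, so the owning domain remains free to change its internal store without breaking anyone else.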
