For decades, many organizations have relied on a central data team to run data warehouses or similar data platforms. This single team held the needed skill set and was tasked with both discovering and understanding data from many domains. As a result, it became a bottleneck in many firms.
This common pattern was identified in 2019 and started the data mesh movement, which establishes decentralized data teams. The idea is that these teams own their data and expose it to other parts of the organization as a product.
The decentralized exposure of data enables other parts of the organization to gather new insights or power apps without waiting for a central team.
For this to become a reality, the focus must be on the organizational setup, so that the central team is distributed across the relevant domain areas. As such, support for this change must be anchored at a high level in the organization.
One setup could be a platform team offering a fully working but empty platform as a service to the data domain teams. These domain teams build, expose, and use data products under the supervision of a central governance function. That governance function ensures proper interaction and connection, among other things.
The exchange of data should preferably go via a defined data contract so the content, frequency, and more are explicit.
The data consumer might be concerned about correctness and about future changes that could break the applications consuming the data. The data contract is the formal agreement that addresses these concerns.
It is good practice to divide the process around a data contract into four phases.
The first phase is to define the contract very specifically with fields and metadata.
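As an illustration of what "fields and metadata" could look like, the following is a minimal sketch in Python. The class names (`FieldSpec`, `DataContract`), the `orders` data product, and all field names are hypothetical and only serve as an example; in practice a contract is often written in a declarative format such as YAML or a schema language.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """One field in the contract (hypothetical structure)."""
    name: str
    dtype: str                    # logical type, e.g. "string" or "float"
    unit: Optional[str] = None    # unit of measure, if the field has one
    nullable: bool = False
    description: str = ""


@dataclass(frozen=True)
class DataContract:
    """Contract metadata plus the list of fields (hypothetical structure)."""
    name: str
    version: str
    owner: str
    update_frequency: str
    fields: Tuple[FieldSpec, ...]

    def field_names(self) -> List[str]:
        return [f.name for f in self.fields]


# Example contract for an assumed "orders" data product.
orders_contract = DataContract(
    name="orders",
    version="1.0.0",
    owner="sales-domain-team",
    update_frequency="hourly",
    fields=(
        FieldSpec("order_id", "string", description="Unique order identifier"),
        FieldSpec("product_state", "string", description="Lifecycle state of the product"),
        FieldSpec("package_length", "float", unit="m", nullable=True,
                  description="Package length in meters"),
    ),
)

print(orders_contract.field_names())
```

Making the unit and the update frequency part of the contract means the consumer does not have to guess them from the data.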
Phase two is enforcing the content, which could be done via a CI/CD release pipeline. If Kafka is used, then Avro or Protobuf definitions are good options.
The third phase is fulfillment, the technical service that implements the actual delivery of data to consumers.
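A check like the one below could run as a step in such a pipeline and fail the release when sample records do not match the contract. This is a minimal sketch in plain Python, not an Avro or Protobuf implementation; the `CONTRACT_FIELDS` layout, the field names, and the `validate_record` helper are assumptions made for the example.

```python
# Hypothetical contract: field name -> expected Python type and nullability.
CONTRACT_FIELDS = {
    "order_id": {"type": str, "nullable": False},
    "product_state": {"type": str, "nullable": False},
    "package_length_m": {"type": float, "nullable": True},
}


def validate_record(record: dict, contract_fields: dict = CONTRACT_FIELDS) -> list:
    """Return a list of human-readable violations; an empty list means the record conforms."""
    errors = []
    for name, spec in contract_fields.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if value is None:
            if not spec["nullable"]:
                errors.append(f"null not allowed: {name}")
            continue
        if not isinstance(value, spec["type"]):
            errors.append(
                f"wrong type for {name}: expected {spec['type'].__name__}, "
                f"got {type(value).__name__}"
            )
    # Fields that the contract does not know about are also reported.
    for name in record:
        if name not in contract_fields:
            errors.append(f"unexpected field: {name}")
    return errors


good = {"order_id": "A-1", "product_state": "shipped", "package_length_m": 1.2}
bad = {"order_id": 1, "product_state": None}

print(validate_record(good))
print(validate_record(bad))
```

In a pipeline, a non-empty result would typically be turned into a failing exit code so that a breaking change cannot be released unnoticed.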
The last phase is monitoring for semantic changes, such as a new product state or content changes in a numeric column – e.g., from meters to centimeters.
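Both kinds of semantic change mentioned here can be approximated with simple comparisons against a baseline. The sketch below is one possible approach under stated assumptions: the function names, the threshold of a factor of 10, and the sample values are hypothetical, and the numeric check assumes positive measurements.

```python
from statistics import median


def new_categories(baseline, current):
    """Return values seen in the current batch that never occurred in the baseline."""
    return set(current) - set(baseline)


def scale_shift(baseline, current, threshold=10.0):
    """Return the ratio of medians if it differs by at least `threshold` times
    in either direction, else None. Assumes positive numeric measurements."""
    base_med = median(baseline)
    cur_med = median(current)
    if base_med == 0 or cur_med == 0:
        return None  # a ratio is not meaningful here
    ratio = cur_med / base_med
    if ratio >= threshold or ratio <= 1 / threshold:
        return ratio
    return None


# A new product state appears in the incoming data.
print(new_categories(["open", "shipped"], ["open", "shipped", "returned"]))

# Lengths that used to be in meters now arrive in centimeters.
print(scale_shift([1.2, 0.8, 2.0], [120.0, 80.0, 200.0]))
```

A schema check would not catch either case, since the column names and types are unchanged; only the meaning of the values has shifted.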
We are currently implementing a data mesh setup like this in a large international Danish company. The area of data mesh and data contracts is developing at a high pace, and it is becoming a field that cannot be ignored due to its enormous potential. We have many skilled people at Intellishore working within this field, and it is inspiring to do so with customers ready to challenge the status quo and innovate.