I recently met with Alexey Utkin, Senior VP Capital Markets at DataArt, to discuss the data mesh architecture that is gaining prominence with enterprises today, and whether it will replace data warehouses, data lakes, and other architectures.
insideBIGDATA: Will it replace data warehouses, lakes and other architectures?
Alexey Utkin: First off, the data mesh, while cool and relatively new on the block, isn’t for everyone. Depending on your scale and ambition, it is quite possible that a cloud data warehouse, data lake, lakehouse, or other architecture is an appropriate choice for your organization today or tomorrow. The data mesh paradigm aims to address several shortcomings of these centralized architectures and their associated implementation and operational approaches, which lead to a lack of scalability and agility when implementing a growing number of analytical use cases over a growing variety of data sources. For some organizations, these limitations are theoretical or may only appear in the distant future.
Yet it’s still worth considering some of the data mesh principles and their underlying drivers for your organization, especially business domain orientation, domain ownership of data, and data as a product, even if you don’t go for a full data mesh platform today. You may find that your organization derives much more value from higher-quality data products that are more easily discoverable and accessible sooner rather than later. While you align with the data mesh principles that bring value today, such as how you organize data product teams and data ownership, your data infrastructure can still take the form of a cloud data warehouse or a lakehouse, as long as it is not limiting.
insideBIGDATA: If not, how will it complement these existing architectures?
Alexey Utkin: In my opinion, there are two possible ways to combine existing data architectures with the data mesh concept.
First, most organizations don’t start with a clean slate. They already have an existing data platform, or several of them, mostly in the form of data warehouses, data lakes, or lakehouses. Data mesh is not a particular infrastructure technology or product, so it requires underlying infrastructure and platform capabilities. Companies embarking on a data mesh journey often choose to retain their existing data infrastructure initially and, over time, extend and scale it toward data mesh capabilities.
Second, the data mesh concept advocates ownership of data products and pipelines by teams organized around business domains. These domain data products require infrastructure to store, process, and serve the data. Thus, existing data architectures, such as data warehouses and lakes, can become the infrastructure for specific domain data products, where appropriate. In other words, they can become nodes on the data mesh.
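As a rough illustration of the second point, the sketch below models domain data products that sit on top of existing infrastructure and are registered as nodes on a mesh. All names here (`Backend`, `DataProduct`, `Mesh`, the sample domains and locations) are hypothetical and chosen only for illustration; they do not come from any particular data mesh product.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    """Existing infrastructure that can host a domain data product."""
    WAREHOUSE = "data warehouse"
    LAKE = "data lake"
    LAKEHOUSE = "lakehouse"


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str       # owning business domain
    owner: str        # accountable domain team
    backend: Backend  # where the data is stored and served from
    location: str     # hypothetical address inside that backend


class Mesh:
    """A minimal registry in which each data product is one node."""

    def __init__(self) -> None:
        self._nodes: dict[tuple[str, str], DataProduct] = {}

    def add_node(self, product: DataProduct) -> None:
        key = (product.domain, product.name)
        if key in self._nodes:
            raise ValueError(f"duplicate data product: {key}")
        self._nodes[key] = product

    def products_in(self, domain: str) -> list[str]:
        # Names of all products owned by one business domain.
        return sorted(p.name for p in self._nodes.values() if p.domain == domain)

    def backends_in_use(self) -> set[Backend]:
        return {p.backend for p in self._nodes.values()}


mesh = Mesh()
# An existing warehouse table and an existing lake path both become nodes.
mesh.add_node(DataProduct("trades", "trading", "trading-data-team",
                          Backend.WAREHOUSE, "warehouse.trading.trades"))
mesh.add_node(DataProduct("market_ticks", "trading", "trading-data-team",
                          Backend.LAKE, "lake/trading/ticks/"))
mesh.add_node(DataProduct("exposure", "risk", "risk-data-team",
                          Backend.WAREHOUSE, "warehouse.risk.exposure"))
```

In this sketch the warehouse and the lake are left untouched; what changes is that each dataset gains an explicit owning domain and team.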
insideBIGDATA: What are the benefits of implementing a data mesh architecture?
Alexey Utkin: Some of the main advantages of the data mesh concept are decentralization and domain orientation. Data mesh aligns with the inherently decentralized and ubiquitous nature of data and removes the common frictions associated with centralized data teams, which sooner or later become a bottleneck on the path to a data-driven organization. The data mesh aligns the data ecosystem with the way organizations are structured, i.e., around business domains. This ensures that the people who deal with domain data and create domain data products truly understand the business domain: where the data comes from, what it means, and who consumes it and how. These domain data people are better able to connect data from source operational systems with the analytical needs of users. This leads to more valuable data products and uses of data for the organization. Additionally, the concept of data as a product takes the often struggling organizational efforts to govern data, ensure quality, and make data easily discoverable, understandable, and consumable to a new level by shifting ownership of these matters to the business domains.
Some of the other key benefits of the data mesh architecture relate to the capabilities of the self-service platform. Data mesh advocates transforming the organizational data landscape into an ecosystem of domain-driven data products, built by domain data teams in a self-service manner. This vision must be supported by foundational infrastructure capabilities that eliminate the frictions associated with creating and evolving domain data products. These platform capabilities must also ensure that the data is discoverable, auditable, interoperable, and consumable globally.
The list of these capabilities is not unique to data mesh and is part of any data architecture that I would call modern, including data warehouses, data lakes, and lakehouses. For example, it includes polyglot data storage, implementation and orchestration of data pipelines, data product discovery, access control, data cataloging and lineage, monitoring and alerting, and data quality management. Yet in data mesh, the focus is on providing these capabilities to domain data product teams from a self-service platform tier, rather than requiring the teams to spend time and effort building them themselves.
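To make two of these capabilities more concrete, here is a minimal sketch of data product discovery and lineage offered as a shared platform service. It assumes a hypothetical in-memory `PlatformCatalog`; the class, its methods, and the sample products are illustrative assumptions, not the API of any real catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    domain: str
    description: str
    upstream: list[str] = field(default_factory=list)  # names of input products


class PlatformCatalog:
    """Hypothetical self-service catalog shared by all domain teams."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Reject entries whose inputs are not yet known to the catalog.
        missing = [u for u in entry.upstream if u not in self._entries]
        if missing:
            raise ValueError(f"unknown upstream products: {missing}")
        self._entries[entry.name] = entry

    def discover(self, keyword: str) -> list[str]:
        # Case-insensitive search over product names and descriptions.
        kw = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        )

    def lineage(self, name: str) -> list[str]:
        # All direct and transitive upstream products of `name`.
        seen: set[str] = set()
        stack = list(self._entries[name].upstream)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._entries[current].upstream)
        return sorted(seen)


catalog = PlatformCatalog()
catalog.register(CatalogEntry("trades", "trading", "Executed trades per day"))
catalog.register(CatalogEntry("positions", "risk", "End-of-day positions", ["trades"]))
catalog.register(CatalogEntry("exposure", "risk", "Counterparty exposure report", ["positions"]))

print(catalog.discover("day"))      # products whose name or description mentions "day"
print(catalog.lineage("exposure"))  # everything the exposure product depends on
```

The point of the sketch is the division of labor: domain teams only call `register`, while search and lineage tracking are implemented once, in the platform tier.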
insideBIGDATA: What challenges do companies need to keep in mind when approaching or building a data mesh?
Alexey Utkin: There are a few categories of challenges that organizations face on their journey to data mesh.
First of all, it’s still a relatively new concept. Large enterprises have only started experimenting with data mesh over the past couple of years. There is still a lot of discussion about how some of the principles work best in practice, and not much solid experience yet because the concept is so new. Organizations often use data mesh concepts as the direction in which their technology should evolve, but it’s a journey and it takes time.
Another category of challenges relates to changes in organization, roles, team building, and skills. Moving away from a centralized team and a centralized architecture, refocusing data teams around domain data products and the data platform, embracing product thinking and the data-as-a-product approach, breaking down historical technology specializations, and learning new skills and tools all take time and organizational will.
And a third category is technology. Data technologies have been evolving at an accelerating rate since we started hearing the words “big data” in the early 2000s. Advances in open source and cloud data technologies have driven an evolution towards modern data platforms in recent years. Yet the data technology landscape is very diverse. Most of the data mesh platform capabilities exist, but as separate technologies and separate building blocks, and some of them are not very mature. There are industry initiatives to standardize certain data capabilities, such as standard metadata formats and exchange interfaces, but these are still not common. Today, it takes skill, effort, and a bit of thought to integrate these building blocks into a competitive data mesh platform, while your teams must commit to learning and retraining. I believe that over time, with more and more organizations adopting data mesh, all of these technologies, standards, and approaches will mature, and we will see data mesh move from the early adoption stage into the mainstream.
insideBIGDATA: Ultimately, what does this help companies do?
Alexey Utkin: Data mesh fundamentally helps scale an organization’s ability to work with an increasing number of data sources, to rapidly implement new and diverse data and analytics use cases, and to support an ever-increasing number of consumers. With the data-as-a-product approach, data mesh helps improve the quality, accessibility, interoperability, and usability of data, and therefore its value for internal and external consumers. In short, it enables a data-driven organization.