Dr. Arif Wider is a professor for software engineering at HTW Berlin, Germany, and a lead technology consultant with ThoughtWorks where he worked with Zhamak Dehghani who coined the term Data Mesh in 2019. Next to teaching, he enjoys building scalable software that makes an impact, as well as building teams that create such software. More specifically he is fascinated by applications of Artificial Intelligence and how effectively building such applications requires data scientists and developers (like himself) to work closely together.
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe's biggest online fashion retailer - we realized that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh. The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership. This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture backed by Spark and build on Delta Lake, and will outline the ongoing efforts to make creation of data products as simple as applying a template.