Data lineage tracking is one of the most significant problems that financial institutions face when adopting modern big data tools. This presentation describes Spline, a data lineage tracking and visualization tool for Apache Spark. Spline captures and stores lineage information from internal Spark execution plans and visualizes it in a user-friendly manner.
Session hashtag: #EUent3
Jan completed his BA and MCS at Trinity College Dublin. During his studies, he worked as an intern at SAP, where he gained valuable experience with in-memory database systems that sparked his interest in big data technologies. In 2014, he started a PhD at FMG, TCD, focusing on optimising the resource utilisation of big data frameworks, namely MapReduce. In 2015, Jan started working as a big data engineer for Barclays Africa in Prague. He is now in charge of building internal big data engineering expertise and developing new tools and products, including Spline.
Marek obtained his bachelor's and master's degrees in computer science at Charles University in Prague. His master's studies focused mainly on the development of distributed and dependable systems. In 2013, Marek joined ABSA Capital in Prague to develop a scalable data integration platform and a framework for calculating regulatory reports. While working on those two projects, he gained experience with many NoSQL and distributed technologies (e.g. Kafka, ZooKeeper, Spark). He is now a member of the Big Data Engineering team and focuses primarily on the development of the Spline project.