Albert Franzi Cros - Databricks

Albert Franzi Cros

Data Engineer, Schibsted Product and Tech

Albert Franzi is a Software Engineer who fell so in love with data that he ended up as a Data Engineer at the Schibsted Media Group. He believes in a world where distributed organizations can work together to build common, reusable tools that empower their users and projects. Albert cares deeply about unified data and models, as well as data quality and enrichment. He also has a secret plan to conquer the world with data, insights, and penguins.

PAST SESSIONS

Modular Apache Spark: Transform Your Code in Pieces (Summit 2019)

Divide and you will conquer Apache Spark. It's quite common to find a papyrus-style script that initializes Spark, reads paths, executes all the logic, and writes the result. We have even found scripts where all the Spark transformations live in a single method with tons of lines. That code is difficult to test, to maintain, and to read. In other words, bad code.

We built a set of tools and libraries that lets developers build their pipelines by joining small Pieces. These pieces comprise Readers, Writers, Transformers, Aliases, etc. Moreover, the toolkit comes with enriched SparkSuites based on Holden Karau's spark-testing-base. Recently, we started using junit4git in our tests, which lets us execute only the Spark tests that matter by skipping tests unaffected by the latest code changes.
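As a rough illustration of the idea (the names `csv_reader`, `drop_nulls`, `add_country`, and `run_pipeline` are hypothetical, and plain Python lists of dicts stand in for Spark DataFrames), a pipeline assembled from small, independently testable pieces might look like this:

```python
from typing import Callable, List

Row = dict
Piece = Callable[[List[Row]], List[Row]]

def csv_reader(rows: List[Row]) -> List[Row]:
    """Reader piece: in a real pipeline this would load a path into a DataFrame."""
    return rows

def drop_nulls(rows: List[Row]) -> List[Row]:
    """Transformer piece: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def add_country(rows: List[Row]) -> List[Row]:
    """Transformer piece: enrich each row with a default country."""
    return [{**r, "country": r.get("country", "NO")} for r in rows]

def run_pipeline(source: List[Row], pieces: List[Piece]) -> List[Row]:
    """Join the pieces in order, like chaining Readers, Transformers, and Writers."""
    data = source
    for piece in pieces:
        data = piece(data)
    return data

result = run_pipeline(
    [{"user": "a", "country": None}, {"user": "b"}],
    [csv_reader, drop_nulls, add_country],
)
# result: [{"user": "b", "country": "NO"}]
```

Because each piece is a plain function over rows, it can be unit-tested in isolation before the full pipeline is wired together.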

This translates into faster builds and fewer coffees. Because developers define each piece on its own, they can test small pieces before assembling the full set. It also lets them reuse code across multiple pipelines and speeds up development by improving code quality. The power of the "transform" method, combined with currying, creates a powerful tool for fragmenting all the Spark logic.
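To sketch the transform-plus-currying pattern outside Spark (a minimal illustration: the helpers `with_column`, `filter_by`, and `transform` are hypothetical, and plain lists of dicts stand in for DataFrames), each curried transformer fixes its parameters up front and leaves the data argument pending, so transformers chain cleanly:

```python
from functools import partial
from typing import Callable, List

Row = dict
Transform = Callable[[List[Row]], List[Row]]

def with_column(name: str, value, rows: List[Row]) -> List[Row]:
    """Generic transformer: add a constant column to every row."""
    return [{**r, name: value} for r in rows]

def filter_by(predicate: Callable[[Row], bool], rows: List[Row]) -> List[Row]:
    """Generic transformer: keep only rows matching the predicate."""
    return [r for r in rows if predicate(r)]

def transform(rows: List[Row], *fns: Transform) -> List[Row]:
    """Apply each curried transformer in sequence, mirroring chained transform calls."""
    for fn in fns:
        rows = fn(rows)
    return rows

rows = [{"user": "a", "age": 17}, {"user": "b", "age": 30}]
result = transform(
    rows,
    partial(filter_by, lambda r: r["age"] >= 18),  # curried: predicate fixed, rows pending
    partial(with_column, "source", "summit"),      # curried: name/value fixed, rows pending
)
# result: [{"user": "b", "age": 30, "source": "summit"}]
```

In Spark itself, the analogous chaining is done with `Dataset.transform`, where each curried function takes and returns a DataFrame.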

This talk is oriented to developers who are new to the Spark world, showing how developing iteration by iteration, in small steps, can help them produce great code with less effort.

Modular Apache Spark: Transform Your Code in Pieces (Summit Europe 2018)

Divide and you will conquer Apache Spark. It's quite common to find a papyrus-style script that initializes Spark, reads paths, executes all the logic, and writes the result. We have even found scripts where all the Spark transformations live in a single method with tons of lines. That code is difficult to test, to maintain, and to read. In other words, bad code.

We built a set of tools and libraries that lets developers build their pipelines by joining small Pieces. These pieces comprise Readers, Writers, Transformers, Aliases, etc. Moreover, the toolkit comes with enriched SparkSuites based on Holden Karau's spark-testing-base. Recently, we started using junit4git (github.com/rpau/junit4git) in our tests, which lets us execute only the Spark tests that matter by skipping tests unaffected by the latest code changes. This translates into faster builds and fewer coffees.

Because developers define each piece on its own, they can test small pieces before assembling the full set. It also lets them reuse code across multiple pipelines and speeds up development by improving code quality. The power of the "transform" method, combined with currying, creates a powerful tool for fragmenting all the Spark logic.

This talk is oriented to developers who are new to the Spark world, showing how developing iteration by iteration, in small steps, can help them produce great code with less effort. Session hashtag: #SAISDev3