Albert Franzi Cros

Data Engineer Lead, Typeform

Albert Franzi is a Software Engineer who fell so in love with data that he ended up as Data Engineer Lead on the Data Platform team at Typeform. He believes in a world where distributed organizations can work together to build common, reusable tools that empower their users and projects. Albert deeply cares about unified data and models, as well as data quality and enrichment.

He also has a secret plan to conquer the world with data, insights, and penguins.

Past sessions

Summit Europe 2020 Apache Spark Streaming in K8s with ArgoCD & Spark Operator

November 18, 2020 04:00 PM PT

Over the last year, we have been moving from a batch-processing setup, with Airflow jobs running on EC2 instances, to a powerful and scalable setup using Airflow and Spark on K8s.

The need to keep pace with technology changes, new community advances, and multidisciplinary teams pushed us to design a solution that can run multiple Spark versions at the same time, avoiding duplicated infrastructure and simplifying deployment, maintenance, and development.

In our talk, we will cover our journey toward a CI/CD setup composed of a Spark Streaming job in K8s consuming from Kafka, using the Spark Operator and deploying with ArgoCD.

This talk aims to provide a holistic view: from the early, more rudimentary model (a batch setup on Airflow and EC2 instances) to the current, more sophisticated one (a streaming setup with CI/CD in K8s).
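To make the target architecture concrete, here is a minimal sketch of the kind of Structured Streaming job such a setup runs: it consumes from Kafka and persists to object storage. The broker address, topic, and paths are placeholders, not the actual Typeform configuration; cluster resources would live in the SparkApplication manifest that the Spark Operator submits.

```scala
import org.apache.spark.sql.SparkSession

object EventsStreamingJob {
  def main(args: Array[String]): Unit = {
    // Master, images, and resources come from the SparkApplication
    // manifest submitted by the Spark Operator; nothing is hard-coded here.
    val spark = SparkSession.builder()
      .appName("events-streaming-job")
      .getOrCreate()

    // Consume raw events from Kafka (broker and topic are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Persist the stream to object storage (paths are placeholders).
    events.writeStream
      .format("parquet")
      .option("path", "s3a://data-lake/events/")
      .option("checkpointLocation", "s3a://data-lake/checkpoints/events/")
      .start()
      .awaitTermination()
  }
}
```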

Speaker: Albert Franzi Cros

Summit 2019 Modular Apache Spark: Transform Your Code in Pieces NA

April 24, 2019 05:00 PM PT

Divide and you will conquer Apache Spark. It's quite common to find a papyrus-like script that initializes Spark, reads paths, executes all the logic, and writes the result. We have even found scripts where all the Spark transformations live in a single method with tons of lines. That makes the code difficult to test, maintain, and read. In other words, bad code.

We built a set of tools and libraries that let developers build their pipelines by joining Pieces: Readers, Writers, Transformers, Aliases, and so on. The toolkit also comes with enriched SparkSuites based on Holden Karau's spark-testing-base. Recently, we started using junit4git in our tests, which lets us execute only the Spark tests that matter by skipping tests unaffected by the latest code changes.

This translates into faster builds and fewer coffees. By letting developers define each piece on its own, we can test small pieces before assembling the full set. It also lets us reuse code across multiple pipelines and speeds up development while improving code quality. The "transform" method combined with currying is a powerful tool for breaking all the Spark logic into small fragments.
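As a rough illustration of that transform-plus-currying pattern (the transformer names and columns here are ours, not the actual library's), each transformer is a curried function whose last parameter list takes the DataFrame, so the pieces compose through Dataset.transform:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, current_date}

// Hypothetical transformers in the spirit of the talk; each is a
// curried function so it plugs straight into Dataset.transform.
object Transformers {

  // Rename a column; configuration first, DataFrame last.
  def renameColumn(from: String, to: String)(df: DataFrame): DataFrame =
    df.withColumnRenamed(from, to)

  // Drop rows where the given column is null.
  def filterNotNull(column: String)(df: DataFrame): DataFrame =
    df.filter(col(column).isNotNull)

  // Stamp each row with the ingestion date.
  def addIngestionDate(df: DataFrame): DataFrame =
    df.withColumn("ingestion_date", current_date())
}

// Usage: each piece can be unit-tested on its own, then chained.
// val result = rawDf
//   .transform(Transformers.filterNotNull("user_id"))
//   .transform(Transformers.renameColumn("ts", "event_time"))
//   .transform(Transformers.addIngestionDate)
```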

This talk is aimed at developers who are new to the Spark world, showing how developing iteration by iteration, in small steps, can help them produce great code with less effort.

Summit Europe 2018 Modular Apache Spark: Transform Your Code in Pieces EU

October 2, 2018 05:00 PM PT

Divide and you will conquer Apache Spark. It's quite common to find a papyrus-like script that initializes Spark, reads paths, executes all the logic, and writes the result. We have even found scripts where all the Spark transformations live in a single method with tons of lines. That makes the code difficult to test, maintain, and read. In other words, bad code. We built a set of tools and libraries that let developers build their pipelines by joining Pieces.

These pieces comprise Readers, Writers, Transformers, Aliases, and so on. The toolkit also comes with enriched SparkSuites based on Holden Karau's spark-testing-base. Recently, we started using junit4git (github.com/rpau/junit4git) in our tests, which lets us execute only the Spark tests that matter by skipping tests unaffected by the latest code changes. This translates into faster builds and fewer coffees. By letting developers define each piece on its own, we can test small pieces before assembling the full set.
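A minimal sketch of what such pieces might look like, assuming hypothetical Reader and Writer interfaces (the real library's traits may well differ):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical interfaces mirroring the abstract's vocabulary of Pieces;
// the actual library's traits are not shown in the talk description.
trait Reader { def read(implicit spark: SparkSession): DataFrame }
trait Writer { def write(df: DataFrame): Unit }

class ParquetReader(path: String) extends Reader {
  override def read(implicit spark: SparkSession): DataFrame =
    spark.read.parquet(path)
}

class ParquetWriter(path: String) extends Writer {
  override def write(df: DataFrame): Unit =
    df.write.mode("overwrite").parquet(path)
}

// A pipeline wires one Reader, a chain of transformers, and one Writer,
// so every piece can be unit-tested in isolation before assembly.
class Pipeline(reader: Reader,
               transformers: Seq[DataFrame => DataFrame],
               writer: Writer) {
  def run(implicit spark: SparkSession): Unit =
    writer.write(transformers.foldLeft(reader.read)((df, t) => df.transform(t)))
}
```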

It also lets us reuse code across multiple pipelines and speeds up development while improving code quality. The "transform" method combined with currying is a powerful tool for breaking all the Spark logic into small fragments. This talk is aimed at developers who are new to the Spark world, showing how developing iteration by iteration, in small steps, can help them produce great code with less effort.

Session hashtag: #SAISDev3