- "How Traveloka Runs Cloud-Scale Apache Spark in Production Since 2017" - in this Level 301 knowledge transfer, Traveloka's Data Engineering and Data Science team will share how staff submit their cloud-scale Spark jobs today. The talk covers pros and cons; the integration of Apache Spark with CI/CD components, schedulers, Airflow, and Key Management Systems (KMS); and templates. The journey starts with the early days of a self-managed on-premise Spark cluster and moves through the adoption of AWS EMR, Qubole, Databricks, and Dataproc. It also covers how multiple back-end data sets helped transform Traveloka from a meta-search engine into a fully integrated online travel booking agency and one of the top Indonesian unicorn startups!
- "Building Robust Production Data Pipelines with Databricks Delta" - (optional hands-on experience: prepare a laptop with a Chrome or Firefox browser and register on Databricks Community Edition). Following the open-source announcement of Delta Lake, this walk-through will provide insights into how Delta.io employs co-designed compute and storage and how it is compatible with Spark APIs. Delta Lake delivers high data reliability and query performance to support big data use cases, from batch and streaming ingest and fast interactive queries to machine learning. This tutorial will discuss the requirements of modern data pipelines, the challenges data engineers face when it comes to data reliability and performance, and how Delta can help. Throughout the presentation, code examples and notebooks will be shared.
Join us for the next Apache Spark London Meetup! After all the excitement of Spark Summit in the US, we thought it would be great to have a follow-up meetup. As usual, there will be food and refreshments and an opportunity to network, as well as some great talks! So join us for an evening of Apache Spark! Title: The Road to Upcoming Apache Spark 3.0 and Koalas: Unifying Spark and pandas APIs