Jasper is a senior data engineer at Eventbrite with a background in international business. He has 3 years of experience using Spark in production and has applied it to a variety of use cases. He now focuses on the operational side of a big data environment and on teaching other engineers how to use the tools available to them.
Keeping the data in a data warehouse timely is a challenge many of us face, and there is often no straightforward solution.
By combining batch and streaming data pipelines, you can use the Delta Lake format to provide an enterprise data warehouse that is updated in near real time. Delta Lake eases the ETL workload by bringing ACID transactions to a warehousing environment. Coupled with Structured Streaming, this lets you build a low-latency data warehouse. In this talk, we'll show how to use Delta Lake to reduce the latency of ingesting and storing your data warehouse tables. We'll also cover how to use Spark streaming to build the aggregations and tables that drive your data warehouse.