Sponsored by: lakeFS | Why Version Control is Essential for Your Lakehouse Architecture
OVERVIEW
| EXPERIENCE | In Person |
|---|---|
| TYPE | Lightning Talk |
| TRACK | Data Lakehouse Architecture |
| INDUSTRY | Enterprise Technology |
| TECHNOLOGIES | Apache Spark, Delta Lake |
| SKILL LEVEL | Intermediate |
| DURATION | 20 min |
When developing and maintaining data/ML pipelines on Databricks, we tend to adopt practices that improve the quality and velocity of code development and deployment. How do we do the same for the data that forms the basis of our data products? We must be able to experiment during development, test data quality in isolation, automate quality validation tests, work with full reproducibility of data pipelines, and more. If your product's value is derived from data, in the shape of analytics or machine learning, poor data quality or a lack of reproducibility of data plus code can easily translate into pain. In this session, you will discover how to apply engineering best practices to data products through data version control with lakeFS.
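As a flavor of the workflow the session describes, here is a minimal sketch of branch-based isolation with lakeFS and PySpark. Everything specific in it is an assumption rather than session content: a lakeFS repository named `my-repo` with a `main` branch, lakeFS exposed through its S3-compatible gateway, the high-level `lakefs` Python SDK installed, and hypothetical table paths, credentials, and a toy quality check.

```python
# Hypothetical sketch: isolate a pipeline run on a lakeFS branch,
# validate it, and merge back only if checks pass.
# Repo name, endpoint, credentials, and paths are placeholders.
import lakefs
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakefs-isolation-demo")
    # Point the S3A connector at the lakeFS S3 gateway.
    .config("spark.hadoop.fs.s3a.endpoint", "https://lakefs.example.com")
    .config("spark.hadoop.fs.s3a.access.key", "<LAKEFS_ACCESS_KEY>")
    .config("spark.hadoop.fs.s3a.secret.key", "<LAKEFS_SECRET_KEY>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# 1. Branch off production data: an isolated, zero-copy environment.
repo = lakefs.repository("my-repo")
experiment = repo.branch("experiment-1").create(source_reference="main")

# 2. Run the pipeline against the branch; "main" stays untouched.
df = spark.read.format("delta").load("s3a://my-repo/experiment-1/tables/events")
cleaned = df.dropna(subset=["user_id"])
cleaned.write.format("delta").mode("overwrite").save(
    "s3a://my-repo/experiment-1/tables/events_clean"
)

# 3. Validate quality in isolation (a trivial stand-in check here),
#    then commit and merge so the change lands in production atomically.
if cleaned.count() > 0:
    experiment.commit(message="events cleaned and validated")
    experiment.merge_into(repo.branch("main"))
```

Because a lakeFS branch is a metadata operation rather than a physical copy, creating one per experiment or per pipeline run is cheap, and the commit that gets merged gives you a reproducible reference to the exact data a run produced.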
SESSION SPEAKERS
Oz Katz
CTO & Co-creator of lakeFS