Databricks Unified Analytics Platform (UAP) is a cloud-based service for running all analytics in one place – from highly reliable and performant data pipelines to state-of-the-art Machine Learning. From the original creators of Apache Spark and MLflow, it provides data science and engineering teams ready to use pre-packaged clusters with optimized Apache Spark and various ML frameworks coupled with powerful collaboration capabilities to improve productivity across the ML lifecycle. Yada yada yada… But in addition to being a vendor Databricks is also a user of UAP.
So, what have we learned by eating our own dogfood? Attend a “from the trenches report” from Suraj Acharya, Director Engineering responsible for Databricks’ in-house data engineering team how his team put Databricks technology to use, the lessons they have learned along the way and best practices for using Databricks for data engineering.
Suraj leads the Data team at Databricks, which is responsible for telemetry and analytics, and the internal platform for users of business and product data. Previously he’s lead Data and Infrastructure teams at Foursquare, Yahoo and BEA.
Xuan Wang is a data scientist/engineer at Databricks. He is working on building data products and ETL pipelines on top of Databricks’ Unified Analytic Platform and Apache Spark. Prior to joining Databricks, he was a postdoctoral researcher working on probabilistic models in random graphs and random medium. He received his Ph.D. in Statistics from The University of North Carolina at Chapel Hill in 2014.