Databricks Unified Analytics Platform (UAP) is a cloud-based service for running all analytics in one place – from highly reliable and performant data pipelines to state-of-the-art machine learning. From the original creators of Apache Spark and MLflow, it provides data science and engineering teams with ready-to-use, pre-packaged clusters running optimized Apache Spark and various ML frameworks, coupled with powerful collaboration capabilities to improve productivity across the ML lifecycle. Yada yada yada… But in addition to being a vendor, Databricks is also a user of UAP.
So, what have we learned by eating our own dog food? Attend a “from the trenches” report from Suraj Acharya, Director of Engineering responsible for Databricks’ in-house data engineering team, on how his team put Databricks technology to use, the lessons they have learned along the way, and best practices for using Databricks for data engineering.
Xuan Wang is a data scientist/engineer at Databricks. He works on building data products and ETL pipelines on top of Databricks’ Unified Analytics Platform and Apache Spark. Prior to joining Databricks, he was a postdoctoral researcher working on probabilistic models in random graphs and random media. He received his Ph.D. in Statistics from The University of North Carolina at Chapel Hill in 2014.