Events - Databricks

Automated Hyperparameter Tuning, Scaling and Tracking on Databricks

Webinar

In this talk, we'll start with a brief survey of the most popular techniques for hyperparameter tuning (e.g., grid search, random search, and Bayesian optimization). We will then discuss open source tools that implement each of these techniques, helping to automate the search over hyperparameters. Finally, we will discuss and demo improvements we built for these tools in Databricks, including integration with MLflow:

  • Apache Spark MLlib (PySpark) integration with MLflow for automatically tracking tuning
  • Hyperopt integration with Apache Spark to distribute tuning and with MLflow for automatic tracking
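As a rough illustration of two of the techniques surveyed in the talk, here is a minimal sketch of grid search and random search in plain Python. The objective function, parameter names, and ranges are hypothetical stand-ins for a real model's validation loss; nothing here reflects the Hyperopt, MLlib, or MLflow integrations themselves, and Bayesian optimization is not shown.

```python
import itertools
import random


def objective(learning_rate, depth):
    """Toy validation loss (hypothetical): lowest at learning_rate=0.1, depth=6."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid; return (best_loss, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = objective(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(space, n_trials, seed=0):
    """Draw n_trials random points; space maps each name to a sampler taking an RNG."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


# Grid search: a fixed set of candidate values per hyperparameter (9 evaluations).
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 6, 10]}

# Random search: a sampling distribution per hyperparameter.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform, about 0.001 to 1
    "depth": lambda rng: rng.randint(2, 10),                # integer in [2, 10]
}

grid_loss, grid_params = grid_search(grid)
rand_loss, rand_params = random_search(space, n_trials=20)
print("grid search:  ", grid_params, grid_loss)
print("random search:", rand_params, rand_loss)
```

Grid search costs one evaluation per combination, so its cost multiplies with every added hyperparameter, while random search uses a fixed trial budget regardless of dimensionality. In both cases the trials are independent of one another, which is what makes them straightforward to distribute.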

MLflow Meetup: Bay Area MLflow 1.0 Meetup @ Microsoft

Meetup

Sunnyvale, CA

Join us for an evening of tech talks about MLflow and machine learning from Databricks and Microsoft. This meetup is sponsored by Microsoft.

Unified Analytics – Unifying Data Pipelines & Machine Learning with Apache Spark

Regional Event

Minneapolis, Minnesota

In this workshop, we’ll cover best practices for enterprises using powerful open source technologies to simplify and scale their ML efforts. We’ll discuss how to leverage Apache Spark™, the de facto data processing and analytics engine in enterprises today, for data preparation, as it unifies data at massive scale across various sources. You’ll learn how to use ML frameworks (e.g., TensorFlow, XGBoost, and scikit-learn) to train models based on different requirements. Finally, you’ll learn how to use MLflow to track experiment runs between multiple users within a reproducible environment and manage the deployment of models to production.

Unified Analytics Workshop for Financial Services & Insurance Industries

Regional Event

London, UK

In this workshop, we’ll cover best practices for enterprises using powerful open source technologies to simplify and scale their ML efforts. We’ll discuss how to leverage Apache Spark™, the de facto data processing and analytics engine in enterprises today, for data preparation, as it unifies data at massive scale across various sources. You’ll learn how to use ML frameworks (e.g., TensorFlow, XGBoost, and scikit-learn) to train models based on different requirements. Finally, you’ll learn how to use MLflow to track experiment runs between multiple users within a reproducible environment and manage the deployment of models to production.

Getting Data Ready for Data Science

Webinar

Successful data science relies on solid data engineering to furnish reliable data. Delta Lake is an open source storage layer that brings reliability to data lakes, allowing you to provide reliable data for data science and analytics. This webinar will cover modern data engineering in the context of the data science lifecycle and how Delta Lake can help enable your data science initiatives.

SF PyData Meetup: Pandas on Apache Spark

Meetup

San Francisco, CA

Our next event features Reynold Xin, co-founder of Databricks, telling us how we can finally make pandas-based projects infinitely scalable using Databricks' new "Koalas" project.

We'll follow that up with Google's Hanoi Hantrakul telling us about "Magenta", Google's initiative to use AI to enhance human creativity (with special live musical demos).

Live Demo: Delta Lake

Live Demo

See how Delta Lake can help you build reliable data lakes at scale, in a live demo by a Databricks expert. Save your spot!

Delta Lake Meetup: Open Source Reliability and Quality for Data Lakes

Meetup

Boston, MA

Big thanks to our sponsor Wayfair!

Our speaker, Michael Armbrust, is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Databricks Delta. He received his Ph.D. from UC Berkeley in 2013 and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.

Michael will speak about Delta Lake OSS (https://delta.io/).

Unified Analytics Workshop for Financial Services & Insurance Industries

Regional Event

Amsterdam, Netherlands

In this workshop, we’ll cover best practices for enterprises using powerful open source technologies to simplify and scale their ML efforts. We’ll discuss how to leverage Apache Spark™, the de facto data processing and analytics engine in enterprises today, for data preparation, as it unifies data at massive scale across various sources. You’ll learn how to use ML frameworks (e.g., TensorFlow, XGBoost, and scikit-learn) to train models based on different requirements. Finally, you’ll learn how to use MLflow to track experiment runs between multiple users within a reproducible environment and manage the deployment of models to production.

Delta Lake Meetup: Open Source Reliability for Data Lake with Apache Spark by Michael Armbrust

Meetup

Los Angeles, CA

Delta Lake is an open source storage layer that brings reliability to data lakes. It offers ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.