On-demand

Have a Delta question? Want to know how to build an end-to-end machine learning pipeline? Looking to optimize costs for your organization? Join our October best practices series to get your data questions answered live by our Databricks experts.

No matter what stage of your data journey you’re at, our online sessions will help data professionals build a better understanding of the fundamentals of a simple, open and collaborative platform, the problems we’re helping to solve, and how data teams can work together and be more productive using one common platform for all data use cases.

We’ll cover best practices to help organizations use powerful open source technologies to build and extend their data platform investments. Plus, you’ll learn how data teams can create a huge impact: lowering costs, speeding up time to market, and powering new innovations that disrupt industries.

5 October: Delta Lake Deep Dive

Delta Lake is an open source storage layer that brings reliability to data lakes and is designed for building robust production data pipelines at scale. Find out how you can reap the benefits of Delta (a short code sketch follows the list):

  • Understand the inner workings of Delta and how the Delta transaction log enables Delta’s features
  • ACID transactions on Spark: Readers never see inconsistent data, and you can run upserts directly on your data lake
  • Scalable metadata handling: Leverage Spark’s distributed processing power to handle the metadata of petabyte-scale tables with billions of files with ease
  • Streaming and batch unification: A Delta Lake table is a batch table as well as a streaming source and sink, so streaming data ingest, batch historic backfill, and interactive queries all work out of the box
  • Schema enforcement: Automatically handle schema variations to prevent insertion of bad records during ingestion
  • Time travel: Data versioning enables rollbacks, full historical audit trails, and reproducible versions of past data
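
To give you a flavor of the hands-on material, here is a minimal PySpark sketch of an ACID upsert and a time-travel read on a Delta table. It assumes a Databricks or delta-spark environment; the table path and sample data are placeholders, not material from the session.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ACID upsert: merge new records into an existing Delta table by key.
# The path /mnt/demo/events is a hypothetical example.
target = DeltaTable.forPath(spark, "/mnt/demo/events")
updates = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"])
(target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Time travel: read the table as it was at an earlier version,
# e.g. for audits or to reproduce a past run.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/demo/events")
v0.show()
```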

12 October: Building a Scalable Machine Learning Pipeline

Discover how you can (a short MLflow sketch follows the list):

  • Leverage Feature Store and feature engineering at scale
  • Efficiently train and track models and experiments with MLflow
  • Run full MLOps from development to production with MLflow and the Model Registry
  • Get the latest updates on what’s on the ML roadmap
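
As a preview, here is a minimal sketch of experiment tracking and model registration with MLflow. It assumes mlflow and scikit-learn are installed and a tracking server with Model Registry support (as on Databricks); the dataset, parameters, and model name are illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 100, "max_depth": 6}
    model = RandomForestRegressor(**params).fit(X_train, y_train)

    # Track parameters and metrics so runs are comparable in the MLflow UI.
    mlflow.log_params(params)
    mlflow.log_metric("mse", mean_squared_error(y_test, model.predict(X_test)))

    # Logging with registered_model_name also creates a Model Registry entry,
    # the handoff point from development to production.
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo-rf")
```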

20 October: Deploying Workloads to Production

Learn best practices for moving your workloads to production, including:

  • Building a CI/CD pipeline with Databricks
  • Deployment workflows and version control with Databricks notebooks
  • Testing with notebooks and testing libraries (see the sketch after this list)
  • Scheduling and orchestration
  • Integrating with third-party orchestration tools
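
One common pattern from the testing bullet above: factor shared logic out of notebooks into plain functions and unit-test them with a library such as pytest. This sketch assumes a local Spark session; the function and schema are illustrative, not from the session.

```python
import pytest
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def add_revenue(df: DataFrame) -> DataFrame:
    """Transformation under test: revenue = price * quantity."""
    return df.withColumn("revenue", F.col("price") * F.col("quantity"))


@pytest.fixture(scope="module")
def spark():
    # A small local session is enough for fast unit tests in CI.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_revenue(spark):
    df = spark.createDataFrame([(2.0, 3), (5.0, 0)], ["price", "quantity"])
    result = {row["revenue"] for row in add_revenue(df).collect()}
    assert result == {6.0, 0.0}
```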

26 October: Databricks Tips and Cost Optimization

This session aims to take the burden of guesswork off your hands and help you leverage the Databricks Lakehouse Platform to its full potential by implementing some simple tips and tricks. You’ll learn:

  • Differences between our three SKU offerings and how to pick the right one for your needs
  • Supercharging your cluster autoscaling with pools (see the sketch after this list)
  • Housekeeping techniques to declutter your workspace
  • Best practices around workspace management and common hardening techniques
  • Priming your workspace for production
  • Leveraging init scripts to standardize your clusters and automate tasks
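
To illustrate the pools bullet above, the following sketch creates an autoscaling cluster backed by an instance pool via the Clusters API 2.0. The workspace URL, token, pool ID, and runtime version are placeholders you would replace with your own values.

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                        # placeholder

payload = {
    "cluster_name": "pool-backed-autoscaling",
    "spark_version": "11.3.x-scala2.12",      # example Databricks runtime
    "instance_pool_id": "<pool-id>",          # draws nodes from idle pool capacity
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id
```

Because pool instances are kept idle and ready, clusters created this way start and scale up noticeably faster than clusters that provision fresh cloud instances on demand.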

Speakers:

Goh Yong Hong
Solutions Architect, Databricks

Vihag Gupta
Solutions Architect, Databricks

Abhishek Dey
Solutions Architect, Databricks

Mandeep Cheema
Solutions Architect, Databricks

Event Sponsor: Databricks (Databricks Privacy Policy)
Event Co-Sponsor: Just Analytics (Just Analytics Privacy Policy)

Watch now