Scaling and Modernizing Data Platform with Databricks

May 26, 2021 05:00 PM (PT)

Today a Data Platform is expected to process and analyze a multitude of sources spanning batch files, streaming sources, backend databases, REST APIs, and more. There is a clear need for a standardized platform that scales and stays flexible, letting data engineers and data scientists focus on business problems rather than on managing infrastructure and backend services. Another key aspect of the platform is multi-tenancy, which isolates workloads and makes it possible to track costs per tenant.

In this talk, Richa Singhal and Esha Shah will cover how to build a scalable Data Platform using Databricks and deploy your data pipelines effectively while managing the costs. The following topics will be covered:  

  • Key tenets of a Data Platform
  • Setting up a multistage environment on Databricks
  • Building data pipelines locally and testing them on a Databricks cluster
  • CI/CD for data pipelines with Databricks
  • Orchestrating pipelines using Apache Airflow
  • Change Data Capture using Databricks Delta
  • Leveraging Databricks Notebooks for Analytics and Data Science teams
In this session watch:
Richa Singhal, Senior Data Engineer, Atlassian
Esha Shah, Senior Data Engineer, Go-To-Market Data Engineering, Atlassian

 

Richa Singhal

Richa Singhal is a Senior Data Engineer at Atlassian building and deploying data pipelines enabling analytics and data science teams across Atlassian. She has 10+ years of experience building large-sc...

Esha Shah

Esha Shah has 8+ years of experience in data engineering across the finance and marketing domains. She currently manages data engineering pipeline initiatives across marketing, sales, and enterprise at Atlassian.