Ganesh Chand

Data Engineer, Databricks

Ganesh Chand is a data engineering consultant at Databricks with 10+ years of industry experience building enterprise-scale data solutions. He is particularly passionate about solving the world's toughest data engineering problems, and at Databricks he tackles some of the most challenging data engineering projects for Databricks customers. Outside of Databricks, he runs the Kathmandu Apache Spark meetup group and has given numerous presentations and workshops on Apache Spark and functional programming in Scala.

Past sessions

Summit 2021 Let’s Dumb-Proof Data Pipelines

May 27, 2021 05:00 PM PT

Developing and deploying data pipelines to production is easy. Maintaining them is hard, because the engineers who operate a pipeline in production are often not the ones who built it. If your data pipelines are not parameterized and configurable, even a simple configuration change forces you to recompile your source code and go through your release process. Making your data pipelines configurable is not enough, either: bad user input can cause many classes of issues, such as data loss, data corruption, and data correctness problems.

In this talk, you'll walk away with techniques to make your data pipelines dumb-proof:
1. Why do you need to make your data pipelines configurable?
2. How can you promote your data pipelines from one environment to another without making any source code changes?
3. How can you reconfigure your data pipelines in production without recompiling the ETL source code?
4. What are the pros and cons of using Databricks notebook widgets for configuring your data pipelines? (See the first sketch after this list.)
5. How can you externalize configuration from your ETL source code, and how do you read and parse configuration files?
6. Finally, you'll learn how to take it to the next level by leveraging Scala language features and the PureConfig and Typesafe Config libraries to achieve boilerplate-free configuration code and configuration validation (see the second sketch below).
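
As a minimal illustration of point 4, here is a hedged sketch of parameterizing a Databricks notebook with widgets. The widget names, default values, and paths are hypothetical, and dbutils is only available inside a Databricks notebook:

    // A minimal sketch of a parameterized Databricks notebook cell (Scala).
    // Widget names, defaults, and paths are hypothetical examples.
    dbutils.widgets.text("input_path", "/mnt/raw/events", "Input path")
    dbutils.widgets.dropdown("env", "dev", Seq("dev", "staging", "prod"), "Environment")

    // Reconfigure a run by changing widget values -- no recompilation needed.
    val inputPath = dbutils.widgets.get("input_path")
    val env       = dbutils.widgets.get("env")

    val df = spark.read.format("delta").load(inputPath)

The upside is that parameters become visible and editable in the notebook UI; the downside is that values arrive as unvalidated strings, which motivates the typed-configuration approach below.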
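And as a sketch of points 5 and 6, the following shows boilerplate-free, validated configuration using PureConfig on top of Typesafe Config. The case-class fields, config keys, paths, and the runPipeline entry point are all hypothetical:

    import pureconfig._
    import pureconfig.generic.auto._

    // Typed view of an externalized HOCON file such as:
    //   pipeline {
    //     input-path  = "/mnt/raw/events"
    //     output-path = "/mnt/curated/events"
    //     max-files-per-trigger = 100
    //   }
    final case class PipelineConf(
      inputPath: String,
      outputPath: String,
      maxFilesPerTrigger: Int
    )

    // PureConfig derives the reader from the case class, so there is no
    // hand-written parsing code, and bad input fails fast with a clear error.
    ConfigSource.default.at("pipeline").load[PipelineConf] match {
      case Right(conf)    => runPipeline(conf)                  // hypothetical entry point
      case Left(failures) => sys.error(failures.prettyPrint())
    }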

In this session watch:
Ganesh Chand, Data Engineer, Databricks


Summit Europe 2019 Building Data Intensive Analytic Application on Top of Delta Lakes

October 15, 2019 05:00 PM PT

Why build your own analytics application on top of Delta Lake:

  • Every enterprise is building a data lake. However, these data lakes are plagued by low user adoption and poor data quality, and so deliver lower ROI.
  • BI tools may not be enough for your use case, especially when you want to build a data-driven analytical web application such as Paysa.
  • Delta Lake's ACID guarantees allow you to build a real-time reporting app that displays consistent and reliable data.

In this talk, we will learn:

  • how to build your own analytics app on top of Delta Lake
  • how Delta Lake helps you build a pristine data lake, with several ways to expose data to end users
  • how an analytics web application can be backed by a custom query layer that executes Spark SQL on a remote Databricks cluster
  • which options and backend technologies are available for building an analytics application
  • which architecture patterns, components, and frameworks can be used to assemble a custom analytics platform in no time
  • how to leverage machine learning to build advanced analytics applications

Demo: an analytics application built on the Play Framework (back end) and React (front end), using Structured Streaming to ingest data from a Delta table, with live query analytics on real-time data and ML predictions based on the analytics data. A minimal ingestion sketch follows below.
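
To make the demo's ingestion step concrete, here is a minimal Structured Streaming sketch that reads from a Delta table and maintains a live aggregate that a web backend could query. The table path, column name, and query name are hypothetical:

    import org.apache.spark.sql.SparkSession

    // A minimal sketch; path, column, and query names are hypothetical.
    val spark = SparkSession.builder.appName("delta-analytics-demo").getOrCreate()
    import spark.implicits._

    // Continuously pick up new rows as they are committed to the Delta table.
    val events = spark.readStream.format("delta").load("/delta/events")

    // Maintain a running count per event type for the live-analytics view.
    val liveCounts = events.groupBy($"eventType").count()

    liveCounts.writeStream
      .outputMode("complete")
      .format("memory")            // in-memory sink, for demo queries only
      .queryName("live_counts")
      .start()

    // A custom query layer (e.g. a Play controller) could then run:
    //   spark.sql("SELECT * FROM live_counts ORDER BY count DESC")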