
Last week, we held a live webinar—Databricks for Data Engineers—to provide an overview of the data engineering role, the common challenges data engineers face while building ETL pipelines, and how Databricks helps them build production-quality data pipelines with Apache Spark.

Prakash Chockalingam, product manager at Databricks, also gave a live demonstration of Databricks, highlighting features that benefit data engineers, such as:

  • Advanced cluster management functionalities that suit any workload requirements.
  • The ability to interactively build an ETL pipeline via an integrated workspace.
  • Simplified troubleshooting of jobs with monitoring alerts.
  • Job scheduling with helpful features like alerting, custom retry policies, and parallel runs.
  • Notebook workflows, which let you build multi-stage production Spark pipelines directly from Databricks notebooks (sketched below).
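
For illustration, a minimal notebook-workflow sketch; the notebook paths and parameters below are hypothetical:

```python
# dbutils is available inside Databricks notebooks; the paths and
# parameters here are placeholders for your own pipeline stages.
status = dbutils.notebook.run("/Pipelines/extract", 3600, {"date": "2017-01-01"})

if status == "OK":
    # Run the transform stage only if the extract notebook reported
    # success (e.g., via dbutils.notebook.exit("OK")).
    dbutils.notebook.run("/Pipelines/transform", 3600, {"date": "2017-01-01"})
```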

The webinar is now available on-demand, and the slides are downloadable as attachments to it.

We have also answered the common questions raised by webinar viewers below. If you have additional questions, check out the Databricks Forum or the new documentation resource.

If you’d like to try Databricks, you can sign up for a free trial here.

Common webinar questions and answers


How would you integrate an ETL pipeline in production with tools like Chef or Puppet, automated testing tools for continuous integration, and other services?

Do you have any recommendations on the best architecture for integrating IoT data into Databricks, using Apache NiFi to land the data in S3?

Can you please explain a scenario where Spark with YARN or Spark with Mesos would be a justified choice?

Can you please clarify R as a component of Spark?

Does your analytic layer include Spotfire?

Can you SSH into your EC2 instances?

How does Spark compare to Sqoop in transferring data from Oracle to HDFS?
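
For a rough sense of how Spark handles this kind of transfer, here is a sketch of a parallel JDBC read from Oracle written out to HDFS as Parquet; every connection detail, table name, and bound below is a placeholder. The partitioning options play a role similar to Sqoop's parallel mappers:

```python
# `spark` is predefined in Databricks notebooks; connection details
# below are placeholders.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", "etl_user")
      .option("password", "***")
      .option("partitionColumn", "ORDER_ID")  # split the read, like Sqoop mappers
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load())

# Land the table in HDFS as Parquet.
df.write.mode("overwrite").parquet("hdfs:///data/raw/orders")
```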

Is it possible to restart a job from the failed notebook?
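
One common pattern here, sketched with hypothetical paths and arguments, is to wrap each notebook stage in a retry loop:

```python
# Retry wrapper around a notebook stage; dbutils is available inside
# Databricks notebooks, and the path/arguments are placeholders.
def run_with_retry(path, timeout_seconds, args, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            return dbutils.notebook.run(path, timeout_seconds, args)
        except Exception:
            if attempt == max_retries:
                raise  # give up after the final attempt

run_with_retry("/Pipelines/load-stage", 3600, {"date": "2017-01-01"})
```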

Does Databricks provide any APIs for notebook execution monitoring?

Is Spark SQL the only component used to build ETL pipelines?
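
As a point of reference, the same aggregation can be written with Spark SQL or with the DataFrame API (the path and column names below are hypothetical); both compile to the same execution plan, so pipelines can mix them freely:

```python
from pyspark.sql import functions as F

# `spark` is predefined in Databricks notebooks; the path is a placeholder.
orders = spark.read.parquet("/mnt/raw/orders")
orders.createOrReplaceTempView("orders")

# Spark SQL flavor.
totals_sql = spark.sql(
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id")

# Equivalent DataFrame API flavor.
totals_df = orders.groupBy("customer_id").agg(F.sum("amount").alias("total"))
```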

Can we implement Type 2 logic using Spark, doing inserts and updates against a target RDBMS?
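
A hedged sketch of one way to approach this, with entirely hypothetical table and column names: detect changed rows in Spark, stamp them as new current versions, and stage them to the RDBMS, where an UPDATE or MERGE expires the superseded rows (Spark's JDBC writer itself only appends or overwrites):

```python
from pyspark.sql import functions as F

# Placeholder JDBC connection settings.
jdbc_opts = {"url": "jdbc:oracle:thin:@//dbhost:1521/ORCL",
             "user": "etl_user", "password": "***"}

dim = (spark.read.format("jdbc").options(**jdbc_opts)
       .option("dbtable", "dim_customer").load())
incoming = spark.read.parquet("/mnt/staging/customers")

# Rows whose tracked attribute changed versus the current dimension version.
changed = (incoming.alias("n")
           .join(dim.filter("is_current = 1").alias("c"),
                 F.col("n.customer_id") == F.col("c.customer_id"))
           .where(F.col("n.address") != F.col("c.address")))

new_versions = (changed.select("n.*")
                .withColumn("effective_date", F.current_date())
                .withColumn("is_current", F.lit(1)))

# Stage the new versions; an UPDATE/MERGE inside the RDBMS then expires
# the old rows, since Spark's JDBC writer cannot update in place.
(new_versions.write.format("jdbc").options(**jdbc_opts)
 .option("dbtable", "stg_dim_customer").mode("append").save())
```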

What’s the main difference between Storm and Spark? Can data be processed in real time using Spark?
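
As an illustration of Spark's near-real-time model (micro-batches, versus Storm's one-event-at-a-time processing), here is a minimal Structured Streaming sketch with a hypothetical schema and input path:

```python
from pyspark.sql.types import StructType, StringType, LongType

# Placeholder schema and path; Structured Streaming processes arriving
# files (or Kafka topics, etc.) as incremental micro-batches.
schema = (StructType()
          .add("event", StringType())
          .add("ts", LongType()))

events = spark.readStream.schema(schema).json("/mnt/events/")
counts = events.groupBy("event").count()

# Continuously maintain running counts in an in-memory table for inspection.
query = (counts.writeStream
         .outputMode("complete")
         .format("memory")
         .queryName("event_counts")
         .start())
```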

