StreamSets Data Collector (SDC) is designed to make data ingest and processing easy. SDC integrates with Apache Spark at several levels to simplify Spark-based data analysis, and it works with Databricks Cloud to trigger jobs based on incoming data.
In this talk, you will learn how a large retailer with thousands of outlets uses StreamSets to power Spark jobs on Databricks Cloud, combining real-time foot traffic data with historical behavioral and transaction data to produce analytic insights that improve revenue per square foot.
Hari Shreedharan is a Software Engineer at StreamSets, where he builds products to make data ingest easy. Previously, he was a Software Engineer at Cloudera, where he worked on Apache Spark, Apache Flume, and Apache Sqoop. He is also the PMC chair of the Apache Flume project.