We started out processing big data with AWS S3, EMR clusters, and Athena to serve analytics data extracts to Tableau BI.
However, as our data volumes and team sizes grew, the Avro schemas of our source data evolved, and we began serving analytics data through web apps, we hit a number of limitations in the AWS EMR and Glue/Athena approach.
This is the story of how we scaled out our data processing and boosted team productivity to meet the current demand, from numerous internal business teams and our 150+ CSP partners, for insights drawn from 20M+ smart homes and 500M+ devices across the globe.
We will describe the lessons learned and best practices established as we equipped our teams with Databricks autoscaling job clusters and notebooks, migrated our Avro/Parquet data to the metastore, SQL endpoints, and the SQL Analytics (SQLA) console, and charted the path to Delta Lake…
"Sameer Vaidya leads data architecture at Plume, serving the exponentially growing demand for analytics and BI insights from the Product, Marketing, NetOps, Sales, and Finance/Accounting teams and from 170+ CSP customers across the world. He has played an instrumental role in driving innovation and growth through the pragmatic adoption of new technologies, defining and implementing architectural blueprints, enabling and mentoring teams, disseminating knowledge, best practices, and training, and fostering collaboration."