Myles Baker is a Solutions Architect who helps large enterprises develop Apache Spark applications using Databricks. His work on image processing software at NASA introduced him to distributed computing, and since then he has helped clients across multiple industries build data science models and applications at scale. He received a B.S. in Applied Mathematics from Baylor University and an M.S. in Computer Science from the College of William and Mary.
Persisting data from Amazon Kinesis using Amazon Kinesis Firehose is a popular pattern for streaming projects. However, building real-time analytics on this data introduces challenges, including managing the format, size, and frequency of the files created. This session will present an end-to-end use case for deploying machine learning streaming analytics at scale using Structured Streaming on Databricks. We will deploy a high-volume Kinesis producer, persist the data to S3 using Kinesis Firehose, partition and write the data as Parquet, create a machine learning model and, finally, query and visualize the data in real time.
Key takeaways include:
- Create a Kinesis producer
- Persist to S3 using Kinesis Firehose
- ETL, machine learning, and exploratory data analysis using Structured Streaming
Session hashtag: #SFexp6
Session hashtag: #EUent5
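As a rough illustration of the first takeaway in the abstract, creating a Kinesis producer, the following is a minimal sketch and not the code used in the session. It assumes the producer is written in Python with boto3, whose Kinesis client provides `put_records`. The stream name, the `device_id` key field, and the helper names `to_kinesis_record`, `batched`, and `send_events` are all hypothetical. Records are serialized as newline-terminated JSON on the assumption that this makes the files Firehose later writes to S3 easier to parse line by line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Kinesis PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def to_kinesis_record(event: Dict[str, Any], key_field: str) -> Dict[str, Any]:
    """Serialize one event as newline-terminated JSON (an assumed format)."""
    return {
        "Data": (json.dumps(event) + "\n").encode("utf-8"),
        # The partition key determines which shard receives the record.
        "PartitionKey": str(event[key_field]),
    }


def batched(records: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most `size` records."""
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_events(client: Any, stream_name: str,
                events: Iterable[Dict[str, Any]],
                key_field: str = "device_id") -> int:
    """Send events in batches; return how many records Kinesis reported as failed.

    `client` is expected to behave like boto3.client("kinesis").
    Retrying failed records is deliberately left out of this sketch.
    """
    failed = 0
    records = (to_kinesis_record(event, key_field) for event in events)
    for batch in batched(records, MAX_RECORDS_PER_CALL):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed and AWS credentials configured, this could be called as `send_events(boto3.client("kinesis"), "example-stream", events)`, where `events` is any iterable of dictionaries containing the key field. A production producer would also retry the records counted in `FailedRecordCount`, which this sketch only tallies.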