Mate is Practice Lead and Principal Instructor at Databricks. Mate also serves as CEO and Principal Instructor at Datapao, a Big Data and Cloud consultancy and training firm focusing on industrial applications (aka Industry 4.0). Previously he was Co-Founder and CTO of enbrite.ly, an award-winning Budapest-based startup. Mate has more than a decade of experience with Big Data architectures, data analytics pipelines, infrastructure operations, and growing organisations by focusing on culture. He is also a speaker and organiser of local and international conferences and meetups.
November 18, 2020 04:00 PM PT
Building a curated data lake on real-time data is an emerging data warehouse pattern with Delta. In the real world, however, we often face dynamically changing schemas, which are a big challenge to incorporate without downtime.
In this presentation, we show how we built a robust streaming ETL pipeline that handles changing schemas and unseen event types with zero downtime. The pipeline can infer changed schemas, adjust the underlying tables, and create new tables and ingestion streams when it detects a new event type. We will show in detail how to infer schemas on the fly and how to track and store them when you don't have the luxury of a schema registry in the system.
With potentially hundreds of streams, how we deploy them and make them operational on Databricks matters. We also address this aspect of real-time data pipelines and share production experience on how this approach performs, in both cost and performance, for ever-growing ingestion loads from data providers.
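The schema inference and tracking idea from the abstract can be illustrated with a minimal, Spark-free sketch. This is not the speakers' implementation: the in-memory `SchemaStore`, the `infer_schema` and `track_event` helpers, and the `event_type` field name are all hypothetical, chosen only to show how a pipeline might decide between creating a new table, evolving an existing one, or doing nothing when no schema registry is available.

```python
import json
from typing import Any, Dict

# Hypothetical in-memory stand-in for a persisted schema store,
# keyed by event type. A real pipeline would persist this somewhere durable.
SchemaStore = Dict[str, Dict[str, str]]


def infer_schema(event: Dict[str, Any]) -> Dict[str, str]:
    """Map each top-level field to the name of its Python type."""
    return {name: type(value).__name__ for name, value in event.items()}


def track_event(store: SchemaStore, raw: str, type_field: str = "event_type") -> str:
    """Update the stored schema for one JSON event and report what changed.

    Returns "new_table" for an unseen event type, "schema_evolved" when
    new fields appear, and "unchanged" otherwise. Type conflicts on
    existing fields are ignored in this sketch.
    """
    event = json.loads(raw)
    event_type = event[type_field]  # assumed field name, not from the talk
    inferred = infer_schema(event)

    if event_type not in store:
        # Unseen event type: this is where a new table and stream would be created.
        store[event_type] = inferred
        return "new_table"

    known = store[event_type]
    new_fields = {k: v for k, v in inferred.items() if k not in known}
    if new_fields:
        # Additive change: this is where the underlying table would be adjusted.
        known.update(new_fields)
        return "schema_evolved"
    return "unchanged"
```

Under these assumptions, feeding `{"event_type": "click", "x": 1}` into an empty store returns `"new_table"`, and a later `click` event carrying an extra field `y` returns `"schema_evolved"` and adds `y` to the stored schema.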
Speakers: Mate Gulyas and Shasidhar Eranti
July 2, 2022 04:02 PM PT
Deploying and operating Machine Learning models is crucial in manufacturing and IoT use cases. In this live demo, you'll see how to build a Structured Streaming application from scratch that ingests sensor data. You'll learn how to deploy a machine learning model and integrate it into your streaming application to make real-time predictions in your very own virtual factory.
Session hashtag: SAISStreaming2