R Tyler Croy is the Director of Platform Engineering at Scribd, where he leads efforts to empower data customers across the organization with higher-quality, fresher data than was previously possible. His background is in production data services, revolving largely around Apache Kafka and various stream processing tools. At Scribd, Tyler and his team work to bring data-driven insights closer to production applications with the "Real-time Data Platform", built on Apache Kafka, Apache Spark, and Delta Lake.
The modern data customer wants data now. Batch workloads are not going anywhere, but at Scribd the future of our data platform requires more and more streaming data sets. As such, our new data platform built around AWS, Delta Lake, and Databricks must simultaneously support hundreds of batch workloads alongside dozens of new data streams, stream processing jobs, and streaming and ad-hoc workloads. In this session we will share the progress of our transition to a streaming cloud-based data platform, and how key technology decisions like adopting Delta Lake have unlocked previously unavailable capabilities that our internal customers now enjoy. Along the way, we'll share some of the pitfalls and caveats we have encountered, to help your organization adopt more data streams in the future.