Lessons from the Field, Episode II: Applying Best Practices to Your Apache Spark Applications

Apache Spark is an excellent tool to accelerate your analytics, whether you’re doing ETL, Machine Learning, or Data Warehousing. However, to really make the most of Spark it pays to understand best practices for data storage, file formats, and query optimization.
As a follow-up to last year’s “Lessons From The Field”, this session will review some common anti-patterns I’ve seen in the field that can introduce performance or stability issues into your Spark jobs. We’ll look at ways to better understand your Spark jobs and identify solutions to these anti-patterns, helping you write better-performing and more stable applications.
Session hashtag: #SAISExp9

About Silvio Fiorito

Silvio is a Resident Solutions Architect with Databricks. He joined the company in May 2016 but has been using Spark since its early days, back in v0.6. He's delivered multiple Spark training courses and spoken at several meetups in the Washington, DC area. He's worked with customers in the financial industry, digital marketing, and cybersecurity, all of them using Apache Spark. In addition to Spark development, Silvio also has a background in application security and forensics.