Apache Spark is an excellent tool to accelerate your analytics, whether you’re doing ETL, Machine Learning, or Data Warehousing. However, to really make the most of Spark, it pays to understand best practices for data storage, file formats, and query optimization. This talk will cover best practices I’ve applied over years in the field helping customers write Spark applications, as well as help you identify which patterns make sense for your use case.
Session hashtag: #EUdev5
Silvio is a Resident Solutions Architect with Databricks. He joined the company in May 2016 but has been using Spark since its early days, back in v0.6. He has delivered multiple Spark training courses and spoken at several meetups in the Washington, DC area. He has worked with customers in the financial, digital marketing, and cybersecurity industries, all using Apache Spark. In addition to Spark development, Silvio also has a background in application security and forensics.