Daniel Tomes

Resident Solutions Architect, Databricks

Daniel Tomes leads the Resident Solutions Architect Practice at Databricks, where he is responsible for vertical integration, productization, and strategic client growth. His big data journey began in 2014 at a major oil and gas company; he then spent two years at Cloudera as a Solutions Architect before joining Databricks in 2017.

PAST SESSIONS

Apache Spark Core – Practical Optimization (two-part session) – Summit Europe 2019

Learn how to properly shape partitions and jobs to enable powerful optimizations, eliminate skew, and maximize cluster utilization. The session explores several Spark partition-shaping methods along with optimization strategies including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism.
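The salting strategy mentioned in the abstract can be sketched in a few lines. The following is a minimal, illustrative Scala example, not material from the talk itself; the table paths, the "key" column, and the bucket count of 16 are all assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Salting sketch: spread a hot join key across N artificial sub-keys
    // so no single task receives the bulk of one key's rows.
    object SaltedJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("salted-join").getOrCreate()
        val saltBuckets = 16 // illustrative; tune to the degree of skew

        val facts = spark.read.parquet("/data/facts") // large table, skewed on "key"
        val dims  = spark.read.parquet("/data/dims")  // smaller table, uniform on "key"

        // A random salt on the skewed side splits each hot key into buckets.
        val saltedFacts = facts.withColumn("salt", (rand() * saltBuckets).cast("int"))

        // Replicate the other side across all salt values so every
        // (key, salt) pair still finds its match.
        val saltedDims = dims.withColumn(
          "salt", explode(array((0 until saltBuckets).map(lit): _*)))

        saltedFacts
          .join(saltedDims, Seq("key", "salt"))
          .drop("salt")
          .write.parquet("/data/joined")
      }
    }

The trade-off is that the replicated side grows by a factor of saltBuckets, so this pays off only when the skew is severe enough that a few straggler tasks dominate the join's runtime.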

Building a Feature Factory – Summit Europe 2019

Building, managing, and maintaining thousands of features across thousands of models is difficult: feature engineering is repetitive, tedious, and extremely challenging to scale. This session explores the 'Feature Factory' built at Databricks and implemented at several clients, and the processes that are imperative for democratizing feature development and deployment. The feature factory lets consumers create features repeatably, simplifies scoring, and enables massive scalability through feature multiplication.
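To make "feature multiplication" concrete, here is a hypothetical sketch, not the actual Feature Factory API described in the talk: a feature is modeled as a named column expression, and one base definition is multiplied across several lookback windows. Every name below, including the event_date column, is an assumption:

    import org.apache.spark.sql.{Column, DataFrame}
    import org.apache.spark.sql.functions._

    // Hypothetical model: a feature is a named column expression, and
    // "multiplication" crosses one base definition with a set of windows.
    case class Feature(name: String, expr: Column)

    object FeatureMultiplication {
      // One base aggregation becomes many features, one per lookback window.
      def multiplyOverWindows(base: String, value: Column, windows: Seq[Int]): Seq[Feature] =
        windows.map { d =>
          Feature(s"${base}_last_${d}d",
            sum(when(datediff(current_date(), col("event_date")) <= d, value)))
        }

      // Materialize a (non-empty) feature set as one aggregated row per key.
      def materialize(df: DataFrame, keys: Seq[String], feats: Seq[Feature]): DataFrame =
        df.groupBy(keys.map(col): _*)
          .agg(feats.head.expr.as(feats.head.name),
               feats.tail.map(f => f.expr.as(f.name)): _*)
    }

Under this model, multiplyOverWindows("spend", col("amount"), Seq(7, 30, 90)) yields spend_last_7d, spend_last_30d, and spend_last_90d from a single definition, which is the scaling effect the abstract describes.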

Apache Spark Core – Deep Dive – Proper Optimization (two-part session) – Summit 2019

Optimizing Spark jobs through a true understanding of Spark core. Learn: What is a partition? What is the difference between read, shuffle, and write partitions? How do you increase parallelism and decrease output files? Where does shuffle data go between stages? What is the "right" size for your Spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease compute time?
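A minimal sketch of the kind of partition shaping these questions point at: deriving the shuffle partition count from the stage's input volume instead of accepting the default of 200, and decoupling output file count from compute parallelism. The data volumes, target size, and paths below are illustrative assumptions, not figures from the talk:

    import org.apache.spark.sql.SparkSession

    // Partition-shaping sketch: size shuffle partitions from the data,
    // then control output file count separately from parallelism.
    object PartitionShapingSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("partition-shaping").getOrCreate()

        val stageInputBytes      = 512L * 1024 * 1024 * 1024 // assume ~512 GB shuffled
        val targetPartitionBytes = 128L * 1024 * 1024        // ~128 MB per task
        val shufflePartitions    = stageInputBytes / targetPartitionBytes // 4096
        spark.conf.set("spark.sql.shuffle.partitions", shufflePartitions)

        val counts = spark.read.parquet("/data/input")
          .groupBy("customer_id")
          .count()

        // Decouple file count from parallelism: repartition adds one
        // exchange but keeps the aggregation at full parallelism, while
        // coalesce(64) would skip the exchange but shrink that whole
        // stage to 64 tasks.
        counts.repartition(64).write.mode("overwrite").parquet("/data/output")
      }
    }

This also hints at why stragglers appear: with too few shuffle partitions, the largest partitions become the last few tasks that "never finish", and adding nodes cannot help because the task count, not the cluster size, caps the parallelism.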