Daniel Tomes

Lead Resident Solutions Architect, Databricks

Daniel Tomes leads the Resident Solutions Architect Practice at Databricks and is responsible for vertical integration, productization, and strategic client growth. His big data journey began in 2014 at a major oil and gas company; he then spent two years at Cloudera as a Solutions Architect before joining Databricks in 2017.

Past sessions

Summit 2020 AutoML Toolkit – Deep Dive

June 23, 2020 05:00 PM PT

Tired of doing the same old feature engineering tasks or tuning your models over and over? Come watch how Databricks Labs is solving this. We will explore how this toolkit automates and accelerates:
- Feature engineering / culling
- Feature importance selection
- Model selection & tuning
- Model serving / deployment
- Model documentation (MLflow)
- Inference & scoring
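
For a flavor of what this looks like in practice, here is a minimal Scala sketch modeled on the Databricks Labs AutoML Toolkit's FamilyRunner pattern; the table name, configuration keys, and method calls shown are assumptions for illustration, not a definitive API reference.

```scala
// Hypothetical sketch of an AutoML Toolkit run; names and keys are assumed.
import com.databricks.labs.automl.executor.config.ConfigurationGenerator
import com.databricks.labs.automl.executor.FamilyRunner

// `spark` is the active SparkSession (predefined in Databricks notebooks).
val data = spark.table("loan_features")   // placeholder input table

// Override only the settings that differ from toolkit defaults (keys assumed).
val overrides: Map[String, Any] = Map(
  "labelCol"          -> "label",
  "scoringMetric"     -> "areaUnderROC",
  "mlFlowLoggingFlag" -> true
)

// One configuration per model family; the runner handles feature engineering,
// tuning, MLflow logging, and pipeline creation for inference.
val rfConfig = ConfigurationGenerator.generateConfigFromMap("RandomForest", "classifier", overrides)
val results  = FamilyRunner(data, Array(rfConfig)).executeWithPipeline()
```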

Summit Europe 2019 Apache Spark Core – Practical Optimization (continued)

October 16, 2019 05:00 PM PT

Properly shaping partitions and your jobs enables powerful optimizations, eliminates skew, and maximizes cluster utilization. We will explore various Spark partition shaping methods along with several optimization strategies, including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism.
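
As a small taste of the partition-shaping theme, the Scala sketch below sizes shuffle partitions against a roughly 128 MB target; the stage input size is an assumed figure you would normally read off the Spark UI.

```scala
// Illustrative partition shaping: target roughly 128 MB per shuffle partition.
val shuffleStageInputBytes = 210L * 1024 * 1024 * 1024   // assume ~210 GB shuffled
val targetPartitionBytes   = 128L * 1024 * 1024          // ~128 MB target

val shufflePartitions = (shuffleStageInputBytes / targetPartitionBytes).toInt
spark.conf.set("spark.sql.shuffle.partitions", shufflePartitions.toString)

// Read-side (scan) partition sizing is controlled separately.
spark.conf.set("spark.sql.files.maxPartitionBytes", targetPartitionBytes)
```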

Summit Europe 2019 Apache Spark Core – Practical Optimization

October 16, 2019 05:00 PM PT

Properly shaping partitions and your jobs enables powerful optimizations, eliminates skew, and maximizes cluster utilization. We will explore various Spark partition shaping methods along with several optimization strategies, including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism.
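
To complement the partition-sizing sketch above, here is a hedged Scala sketch of the salting technique mentioned in the abstract; the table names, the join key, and the bucket count are placeholders.

```scala
import org.apache.spark.sql.functions._

// Salting a skewed join: spread hot keys across `saltBuckets` sub-keys.
val saltBuckets = 32

val factsDF = spark.table("facts")       // large table, skewed on "key" (placeholder)
val dimDF   = spark.table("dimensions")  // small lookup table (placeholder)

// Large side: append a random salt to every row.
val saltedFacts = factsDF.withColumn("salt", (rand() * saltBuckets).cast("int"))

// Small side: replicate each row once per possible salt value.
val saltedDim = dimDF.withColumn("salt", explode(array((0 until saltBuckets).map(lit): _*)))

// Joining on (key, salt) prevents a single hot key from landing in one task.
val joined = saltedFacts.join(saltedDim, Seq("key", "salt"))
```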

Summit Europe 2019 Building A Feature Factory

October 15, 2019 05:00 PM PT

Building, managing, and maintaining thousands of features across thousands of models is hard: feature building can be repetitive, tedious, and extremely challenging to scale. We will explore the 'Feature Factory' built at Databricks and implemented at several clients, along with the processes that are imperative for the democratization of feature development and deployment. The Feature Factory gives consumers repeatable feature creation, simplifies scoring, and enables massive scalability through feature multiplication.
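
The Feature Factory itself is not shown in this abstract, but a hypothetical Scala sketch of the underlying pattern (features as named, composable column expressions applied uniformly to a DataFrame) could look like the following; every name here is illustrative.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// A "feature" is just a named column expression, so it can be defined once
// and multiplied across many base tables, entities, and time windows.
case class Feature(name: String, expr: Column)

object FeatureFactory {
  // Apply any collection of features to a DataFrame in one pass.
  def withFeatures(df: DataFrame, features: Seq[Feature]): DataFrame =
    features.foldLeft(df)((acc, f) => acc.withColumn(f.name, f.expr))
}

// Illustrative feature definitions (column and table names are placeholders).
val features = Seq(
  Feature("total_spend_log", log1p(col("total_spend"))),
  Feature("is_high_value",   (col("total_spend") > 10000).cast("int"))
)

val enriched = FeatureFactory.withFeatures(spark.table("customers"), features)
```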

Summit 2019 Apache Spark Core—Deep Dive—Proper Optimization

April 24, 2019 05:00 PM PT

Optimizing Spark jobs through a true understanding of Spark Core. Learn:
- What is a partition?
- What is the difference between read, shuffle, and write partitions?
- How do you increase parallelism and decrease output files?
- Where does shuffle data go between stages?
- What is the "right" size for your Spark partitions and files?
- Why does a job slow down with only a few tasks left and never finish?
- Why doesn't adding nodes decrease my compute time?
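
To make a couple of these questions concrete, here is a short Scala sketch showing the usual way shuffle parallelism and output file counts are controlled independently; the partition counts, table name, and output path are assumptions.

```scala
// 1. Raise parallelism for the shuffle-heavy part of the job.
spark.conf.set("spark.sql.shuffle.partitions", "960")

val aggregated = spark.table("events")   // placeholder table
  .groupBy("customer_id")
  .count()

// 2. Reduce the number of output files without throttling upstream stages:
//    repartition (or coalesce) only at write time.
aggregated
  .repartition(64)
  .write
  .mode("overwrite")
  .parquet("/tmp/aggregated_events")     // placeholder path
```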
