Sean is a data scientist at Databricks. He is an Apache Spark committer and PMC member, and co-author of Advanced Analytics with Spark. Previously, he was Director of Data Science at Cloudera and an engineer at Google.
Careful with that modeling tool! Even the simplest data analysis problems can hide surprising statistical subtleties that lead the aspiring data scientist to draw the wrong conclusions from data. This talk examines three straightforward scenarios in which several different answers all seem correct, shows how the notion of causality resolves each of them, and briefly explores the power of graphical models and Judea Pearl's do-calculus. By the end of this session, you will be more cautious with your modeling tools and will have seen why correlation is not always causation. Session hashtag: #SAISDS3