How McAfee Built High-Quality Pipelines with Azure Databricks to Power Customer Insights on 250TB+ of Data: Lessons Learned in Data Governance and Lineage

How do you make over 250TB of data useful to data scientists? Bring a lot of CPUs? If only it were so simple!

Understanding customer behavior requires high-quality, reliable data. Data quality problems are magnified at high volumes, limiting data scientists’ ability to understand, clean, and use the data. Cleaning up 600 million events per day is like cleaning Moscone West’s floors: as soon as you’re done, you have to start all over again.

Come learn how McAfee built a data-driven pipeline on Azure Databricks that maintains high data quality and comprehensive lineage, so that data scientists can be more productive and make sound statistical inferences.


See More Spark + AI Summit in San Francisco 2019 Videos

About David Newell

David is the product manager for McAfee’s consumer data platform and manages McAfee’s consumer product analytics team. David’s data experience includes airline revenue management, pricing strategy, construction, environmental compliance, and cybersecurity. His passions include visiting world heritage sites, commercial aviation, and wine. David holds a BS in Management Science and Engineering from Stanford University.

About Geoff Oitment

Geoff leads the Analytics, Insights & Data Analysis (AIDA) group for the Consumer Business Unit at McAfee, where he drives analytics strategy to develop a data-driven decision-making culture. In this role, Geoff and his team enforce best practices for data governance, build data pipelines, extract actionable insights from billions of records, and research emerging data technologies. Prior to this role, Geoff led the Alternative Monetization and Web Security teams at McAfee.