Media and Entertainment sessions

We are delighted to bring you the best Media and Entertainment sessions from Spark + AI Summit 2020, the virtual event for data teams. Watch the selected sessions and relive the Media and Entertainment forums below.

Catch up on some of the outstanding keynotes featuring luminaries like Matei Zaharia, Adam Paszke, Reynold Xin, and more.

To explore all Spark + AI Summit session recordings, click here.

Advertising Fraud Detection at Scale at T-Mobile

Eric Yatskowitz, Data Scientist, T-Mobile
Phan Chuong, Data Engineer, T-Mobile

The development of big data products and solutions at scale brings many challenges to teams of platform architects, data scientists, and data engineers. While it is easy to find ourselves working in silos, successful organizations collaborate intensively across disciplines so that problems can be understood and a proposed model and solution can be scaled and optimized on multiple terabytes of data. In this session, the T-Mobile Marketing Solutions (TMS) Data Science team will present a platform architecture and production framework supporting TMS internal products and services. Powered by Apache Spark technologies, these services operate in a hybrid of on-premises and cloud environments. As a showcase example, we will discuss key lessons learned and best practices from our Advertising Fraud Detection service. An important focus is on how we scaled data science algorithms outside of the Spark MLlib framework. We will also demonstrate various Spark optimization tips to improve product performance, as well as the use of MLflow for tracking and reporting. We hope to show the best practices we’ve learned from our journey of building end-to-end big data products.

Deliver Dynamic Customer Journey Orchestration at Scale

Krish Kuruppath, SVP, Global Head of AI Platform, Publicis Media-COSMOS
Sharad Varshney, VP, Head of Data Science, Publicis Media-COSMOS

As customer acquisition costs rise steadily, organizations are looking for ways to optimize their end-to-end customer experience in order to convert prospects into customers quickly and retain them for longer. In today’s omnichannel environment, where non-linear events and ‘micro-moments’ drive customer engagement with brands, the traditional one-size-fits-all customer journey cannot deliver true value to the customer or to the organization.

The COSMOS customer intelligence platform helps organizations address this challenge by offering a comprehensive and scalable set of Marketing Machine Learning (MML) models for recommending the ‘next-best-action’ based on the customer journey. Trained on one of the largest customer datasets available in the United States, COSMOS MML models leverage Spark, Databricks, and Delta Lake to stitch and analyze profile-based, behavioral, transactional, financial, and operational data to deliver customer journey orchestration at scale. In this session, we will discuss the business benefits of dynamic customer journey orchestration and the limitations of classic customer journey models, and demonstrate how COSMOS MML models overcome these limitations. We will also review the global customer journey decision system, which is built on top of ensemble machine learning techniques and uses Customer Lifetime Value (CLV) as its foundation.

Data Driven Decisions at Scale at Comcast

Jim Forsythe, Product Analytics & Behavior Science Team Lead, Comcast

Comcast is the largest cable and internet provider in the US, reaching more than 30 million customers, and continues to grow its presence in the EU with the acquisition of Sky. Over the last couple of years, Comcast has shifted its focus to the customer experience. For example, Comcast has rolled out the Flex device, which allows customers to stream content directly to their TVs without needing an additional cable subscription. With this shift in focus, Comcast has made a concerted effort to make data-driven decisions, both to understand how customers interact with our products and to continue innovating with new products and subscriptions.

The Product Analytics & Behavior Science (PABS) team plays a crucial role as an interpreter, transforming data into consumable insights and providing these insights to the broader product teams within Comcast. The PABS team does this across the entire product ecosystem, including X1, XFi, and the brand-new Flex devices. This ecosystem is one of the largest streaming platforms in the world and generates data at a rate of more than 25 TB per day, with over 3 PB of data being used for consumable insights. To keep delivering consumable insights on massive data sets while controlling the amount of data being stored, the PABS team has been using Databricks and Delta Lake for high-concurrency, low-latency reads and writes, both to build reliable real-time data pipelines that deliver insights and to perform efficient, timely deletes. Some of the Delta Lake features that we took advantage of to achieve the desired levels of efficiency, optimization, and cost savings are:

  • Distributed writes to S3 (essentially eliminating 500 errors)
  • Optimize

Scaling Production Machine Learning Pipelines with Databricks

Max Cantor, Programmer, Conde Nast
James Evers, Software Engineer, Conde Nast

Conde Nast is a global leader in the media production space, housing iconic brands such as The New Yorker, Wired, Vanity Fair, and Epicurious, among many others. Along with our content production, Conde Nast invests heavily in companion products to improve and enhance our audience’s experience. One such product solution is Spire, Conde Nast’s service for user segmentation and targeted advertising for over a hundred million users. Spire consists of thousands of models, many of which require individual scheduling and optimization. From data preparation to model training to inference, we’ve built abstractions around the data flow, monitoring, orchestration, and other internal operations. In this talk, we explore the complexities of building large-scale machine learning pipelines within Spire and discuss some of the solutions we’ve discovered using Databricks, MLflow, and Apache Spark. The key focus is on production-grade engineering patterns, the inner workings of the required components, and the lessons learned throughout their development.

Media and Entertainment Forum


Dan Morris | Stephen Layland | Eric Wasserman | Kevin Perko | Steve Sobel | Kevin Davis

Join us for an interactive Media and Entertainment Industry Forum at Spark + AI Summit. In this free virtual event, you will have the opportunity to network with your peers and participate in engaging panel discussions with leaders in the Media industry on how data and machine learning are driving innovation across the customer lifecycle.