Events - Databricks



London PyTorch Meet-up


London, UK

Agenda:
7:00 - "Grounding Semantics in Visual Scenes: Implications for Sky" by Aji Ghose, Research Fellow at the Centre for Computation, Cognition and Modelling at Birkbeck and Vice President of Data & Research at Chattermill
7:20 - "Cancer diagnosis in the age of artificial intelligence: past, present and the future" by Pandu Raharja-Liu
7:40 - "From Active Learning to Domain Adaptation in Deep Networks - A Story of My Research Evolution" by Shrinivasan Sankar from the University of Oxford
8:00 - Networking over drinks

Making Apache Spark Better With Delta Lake


Laurel, MD

Managing data lakes, which are data repositories that store large and varied sets of raw data in their native format, can be challenging. Join us in February to learn about Delta Lake, an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing.

Advertising Fraud Detection at Scale at T-Mobile


Bellevue, WA

Developing big data products and solutions at scale brings many challenges for teams of platform architects, data scientists, and data engineers. While it is easy for these teams to end up working in silos, successful organizations collaborate intensively across disciplines so that problems are well understood and a proposed model and solution can be scaled and optimized on multiple terabytes of data. In this session, the T-Mobile Marketing Solutions (TMS) Data Science team will present a platform architecture and production framework supporting TMS internal products and services. Powered by Apache Spark™ technologies, these services run in a hybrid of on-premises and cloud environments. As a showcase example, we will discuss key lessons learned and best practices from our Advertising Fraud Detection service. An important focus is how we scaled data science algorithms outside of the Spark MLlib framework. We will also demonstrate various Spark optimization tips for improving product performance, as well as the use of MLflow for tracking and reporting. We hope to share the best practices we’ve learned on our journey of building end-to-end big data products.