Defining the Future of Data & AI: Announcing the Finalists for the 2022 Databricks Data Team OSS Award

The annual Databricks Data Team Awards recognize data teams who are harnessing the power of data and AI to deliver solutions for some of the world’s toughest problems.

Nearly 250 teams were nominated across six categories from all industries, regions, and companies – all with impressive stories about the work they are doing with data and AI. As we lead up to Data and AI Summit, we will be showcasing the finalists in each of the categories over the coming days.

First up: The Data Team OSS Award celebrates those who are making the most of, or contributing to, the open-source technologies that are defining the future of data and AI, including Delta Lake, MLflow, and Apache Spark™.

Meet the five finalists for the Data Team OSS Award category:

Apple
Apple is one of the most iconic and recognizable brands in the world, with over 1 billion iPhone users worldwide, so it is no surprise that data and AI are at the forefront of its innovation strategy. Part of what has contributed to the loyalty of Apple customers is peace of mind: knowing that their devices and data are secure from malicious attacks. Apple’s early commitment to Delta Lake has allowed it to build a foundation to operate massive streaming, SQL, graph, and ML workloads, ingesting hundreds of terabytes of log and telemetry data daily, which is required to detect, diagnose, and respond to cyber threats in real time. Apple has been instrumental in building Delta Lake since its inception, contributing code and key design elements while actively participating in the various Delta Lake community forums to help other organizations looking to democratize data and AI with Delta Lake.

Back Market
The refurbished consumer electronics market is growing significantly as an alternative to purchasing new devices. With over 6 million customers, Back Market is the leading dedicated renewed tech marketplace, offering high-quality, professionally refurbished electronic devices and appliances, including smartphones, laptops, gaming consoles, and more. The key to ensuring they meet the needs of each customer and seller is data, but as analytical workloads grew, so did the need to consume their data in a rapid, efficient, and secure manner. To enable this, Florian Valeye (a Delta Lake Committer) and the engineering team at Back Market have been important contributors to the creation of the Delta Rust API and associated Python bindings, in an effort to enable low-latency queries of Delta tables without having to spin up a Spark cluster. They have also been actively involved in Delta Lake community office hours, contributed to the AWS Labs Athena Federation, reviewed code, and even helped with our partnerships with Google BigQuery.

Samba TV
As more people consume content across internet-connected TVs, the data captured is enabling unparalleled levels of audience targeting and personalized experiences. Samba TV provides first-party data from tens of millions of televisions, across more than 20 TV brands sold in over 100 countries, giving advertisers and media companies a unified view of the entire consumer journey. Because S3 lacks putIfAbsent transactional consistency, Delta Lake on S3 was limited to writes from a single cluster. The team at Samba TV became an instrumental driver of S3 multi-cluster writes, which allow writes to S3 across multiple clusters and/or Spark drivers while ensuring that only one writer succeeds with each transaction. This maintains atomicity and prevents file contents from ever being overwritten. Their contributions don’t stop there, as Samba TV continues to actively participate in the Delta Lake community, forums, and feature discussions.
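
For readers curious about what a putIfAbsent guarantee buys you, here is a minimal Python sketch of the idea. It is an illustration only and not the Delta Lake implementation: the `InMemoryLogStore` class and the `commit` and `run_concurrent_writers` helpers are hypothetical names invented for this example, and a lock-protected dictionary stands in for whatever coordination layer provides the guarantee in practice. Each writer tries to claim the next version of the log, only one writer can win a given version, and the others retry with the following one, so no entry is ever overwritten.

```python
import threading


class InMemoryLogStore:
    """Hypothetical stand-in for a commit log with put-if-absent semantics."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def put_if_absent(self, version, payload):
        """Store payload at version only if that version is still unclaimed."""
        with self._lock:
            if version in self._entries:
                return False  # another writer already won this version
            self._entries[version] = payload
            return True

    def latest_version(self):
        """Return the highest committed version, or -1 for an empty log."""
        with self._lock:
            return max(self._entries, default=-1)

    def read(self, version):
        with self._lock:
            return self._entries[version]


def commit(store, payload, max_attempts=100):
    """Claim the next free version, retrying when another writer gets there first."""
    for _ in range(max_attempts):
        version = store.latest_version() + 1
        if store.put_if_absent(version, payload):
            return version
    raise RuntimeError("could not commit after repeated conflicts")


def run_concurrent_writers(store, writer_names):
    """Simulate several clusters committing at once; returns name -> version."""
    results = {}

    def work(name):
        results[name] = commit(store, f"commit from {name}")

    threads = [threading.Thread(target=work, args=(name,)) for name in writer_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_concurrent_writers(InMemoryLogStore(), ["cluster-a", "cluster-b"])` gives each writer its own distinct version. Which writer gets which version depends on thread scheduling, but no version is ever written twice.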

Scribd
Scribd is on a mission to change the way the world reads. Scribd offers a monthly subscription, providing online access to the best ebooks, audiobooks, magazines, and podcasts for over a million consumers and 100 million monthly visitors. With the world’s largest library of digital content spanning more than 60 million titles, Scribd relies on the lakehouse architecture built on Delta Lake to build performance-optimized data pipelines that easily support both historical and streaming data, powering recommendations that serve compelling and interesting content to users. As invaluable contributors to the Delta Lake community, the team at Scribd has leveraged its deep understanding of the Delta Lake ecosystem to create the Delta Rust API, kafka-delta-ingest, and sql-import, and has provided an immense amount of feedback on Apache Spark, Delta Lake, MLflow, machine learning, Databricks, and more. They have also helped us with community office hours, reviewing Delta code, and working with the Delta community.

T-Mobile
T-Mobile’s mission is to build the nation’s best 5G network while reducing customer pain points every day. To meet the Un-carrier’s aggressive build plans and customer-focused goals, they embarked on a digital transformation, relying on their data to optimize back-office business processes, streamline network builds, mitigate fraud, and improve the overall experience for the enterprise’s business teams. At the heart of their data strategy is the lakehouse architecture and Delta Lake, democratizing access to data for BI and ML workloads at the speed of business. As valuable members of the Delta Lake community, they have been pushing the boundaries of Delta Lake to solve their toughest data problems, from optimizing their procurement and supply chain process, ensuring billions of dollars of cell-site equipment is at the right place at the right time, to streamlining internal initiatives that better engage customers, save money, and drive revenue.

Check out the award finalists in the other categories and come raise a glass and celebrate these amazing data teams during the awards ceremony at the Data and AI Summit on June 29.
