Now that Data + AI Summit is officially wrapped, we wanted to spend a minute recapping some of the top news, content, and updates, and what they mean for data teams in Media & Entertainment.
Here’s what we announced:
Security, Governance & Sharing
Introducing Data Cleanrooms for the Lakehouse
We are excited to announce data cleanrooms for the Lakehouse, allowing businesses to easily collaborate with their customers and partners on any cloud in a privacy-safe way. Participants in the data cleanrooms can share and join their existing data and run complex workloads in any language – Python, R, SQL, Java, and Scala – on that data while maintaining data privacy. Data cleanrooms open a broad array of use cases across industries. In the media industry, advertisers and marketers can deliver more targeted ads with broader reach, better segmentation, and greater transparency into ad effectiveness, all while safeguarding data privacy.
Introducing Databricks Marketplace
Databricks Marketplace is an open marketplace for exchanging data products such as data sets, notebooks, dashboards, and machine learning models. To accelerate insights, data consumers can discover, evaluate, and access more data products from third-party vendors than ever before.
What’s new with Databricks Unity Catalog
With the general availability of Unity Catalog, everything customers love about it – fine-grained access controls, lineage, integrated governance, auditing, and the ability to easily and confidently share data across business units – is now available to every customer on the platform.
Platform Updates
Delta Lake is going fully open source
Media teams have been asking for more of Delta Lake to be open sourced for a long time, which is why we’re so excited to share that we’re open sourcing ALL of Delta with the upcoming Delta Lake 2.0 release. Delta Lake is the fastest, most popular, and most advanced open table storage format. We are starting with the most requested features from the community, and the remaining features will be open sourced gradually over the coming months. This means that features previously available only to Databricks customers will be available to the entire Delta Lake community.
In addition, this change will allow for better collaboration across the industry, increased performance, and access to previously proprietary features like Change Data Feed and Z-Ordering, which help lower costs and drive faster insights. You can read more about optimizing performance with file management here.
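To make Z-Ordering a little more concrete, here is a minimal sketch of the general idea behind a Z-order (Morton) curve: the bits of several column values are interleaved into a single sort key, so rows that are close in every column end up near each other and tend to land in the same files. This is a toy, pure-Python illustration and not Delta Lake's actual implementation; the function name `z_value` and the sample rows are hypothetical.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return key


# Hypothetical (user_bucket, hour_bucket) pairs standing in for two filter columns.
rows = [(3, 0), (0, 1), (1, 1), (0, 0), (2, 3), (1, 0)]

# Sorting by the interleaved key clusters rows that are close in BOTH columns.
rows_sorted = sorted(rows, key=lambda r: z_value(r[0], r[1]))
```

Because neighboring rows share similar values in both columns, a query that filters on either column can, in principle, skip more files than it could if the data were sorted on only one of them.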
Delta Live Tables Announces New Capabilities and Performance Optimizations
Since its inception, Delta Live Tables (DLT) has grown to power production ETL use cases at over 1,000 leading companies around the world, from startups to enterprises. Project Enzyme is a new optimization layer for Delta Live Tables that speeds up ETL processing and enables enterprise capabilities and UX improvements.
Enhanced Autoscaling optimizes cluster utilization by automatically allocating cluster resources based on workload volume, with minimal impact on the data processing latency of your pipelines, reducing usage and cost for customers.
Project Lightspeed: Faster and Simpler Stream Processing With Apache Spark
As media companies shift to direct-to-consumer models and the advertising ecosystem demands real-time insights, streaming data is core to many Media & Entertainment use cases. Project Lightspeed makes streaming data a first-class citizen on the Databricks Lakehouse Platform, helping Databricks remain an industry leader in performance and price for streaming data use cases.
This was the first major streaming announcement we've made, although streaming has ALWAYS been a large and successful part of our business. Project Lightspeed focuses on improving performance to achieve higher throughput, lower latency, and lower cost. The announcement also includes improving ecosystem support for connectors, enhancing functionality for processing data with new operators and APIs, and simplifying deployment, operations, monitoring, and troubleshooting.
Data Science & Machine Learning
Introducing MLflow Pipelines with MLflow 2.0
MLflow Pipelines enables data scientists to create production-grade ML pipelines that combine modular ML code with software engineering best practices to make model development and deployment fast and scalable. In practice, this means that code for a recommendation engine, or an anomaly detection algorithm, can be swiftly moved from exploration to production without costly rewrites or refactoring.
Serverless Model Endpoints improve upon existing Databricks-hosted model serving by offering horizontal scaling to thousands of QPS, potential cost savings through auto-scaling, and operational metrics for monitoring runtime performance. Ultimately, this means Databricks-hosted models are suitable for production use at scale. With this addition, your data science teams can now spend more time on business use cases and less time on building and managing Kubernetes infrastructure to serve ML models.
Media & Entertainment Industry sessions
And in case you missed it, there were some incredible Media & Entertainment sessions in which teams discussed the business benefits, cost and productivity savings, and advanced analytics they’re now able to realize with Databricks. Here are a few to highlight: