Reliability and performance for data lakes
Delta Lake brings data reliability and scalability to your existing data lake, with an open source transactional storage layer designed for the full data lifecycle. Learn more about Delta Lake.
Simple data processing on autoscaling infrastructure, powered by a highly optimized Apache Spark™ engine for up to 50x performance gains.
Learn more about Apache Spark.
Collaboration across the full data science and machine learning lifecycle
Quickly access and explore data, find and share new insights, and build models collaboratively, using the languages and tools of your choice.
Learn more about Notebooks.
One-click access to preconfigured ML environments for augmented machine learning with state-of-the-art and popular ML frameworks.
Learn more about ML Runtime.
Track and share experiments, reproduce runs, and manage models collaboratively from a central repository, from experimentation to production. Learn more about MLflow.
More complete and recent data to drive insights for every team
Run SQL workloads directly on your data lake to query and analyze your freshest data with up to 9x better price/performance than traditional cloud data warehouses.
Quickly visualize query results and organize the visualizations into rich dashboards to share live insights with your team, complete with automatic alerts for critical changes.
Use your preferred BI tools, like Tableau and Microsoft Power BI, with optimized connectors that provide fast performance, low latency, and high user concurrency to your data lake.
A highly secure, massively scalable multi-cloud platform running millions of machines every day
Give all your users the right access to the right data, with comprehensive audit trails, by using your existing cloud security policies and identity management system to create compliant, private, and isolated workspaces. Learn more about Platform Security.
Quickly spin up and down collaborative workspaces for any project while being equipped with the right tools to manage user access, control spend, audit usage, and analyze activity across every workspace, all while seamlessly enforcing user and data governance.
Learn more about 360° Administration.
Use fully configured data environments and APIs to quickly take initiatives from development to production. Once in production, data teams can use on-demand autoscaling to optimize performance and reduce downtime of data pipelines and ML models by efficiently matching resources to demand. Learn more about Elastic Scalability.
Securely integrate a single platform into each cloud so your data teams can perform data analytics and machine learning without having to learn cloud-specific tools and processes. Learn more about Databricks for Microsoft Azure and Amazon Web Services.
In this talk, Jim Forsythe and Jan Neumann describe Comcast’s data and machine learning infrastructure, built on the Databricks Unified Data Analytics Platform. Comcast uses Databricks to train and fuel the machine learning models at the heart of its products and to gain deeper insights into how its users use those products.