An Update on Project Zen: Improving Apache Spark for Python Users
Apache Spark™ has reached its 10th anniversary with Apache Spark 3.0, which has many significant improvements and new features including but not limited to type hint support in pandas UDF, better error handling in UDFs, and Spark SQL adaptive query execution. It has grown to be one of the most successful open-source projects as the...
Spark + AI Summit Europe is Expanding and Getting a New Name: Data + AI Summit Europe
Back in 2013, we held the first Spark Summit — a gathering of the Apache Spark™ community with leading contributors and production users sharing their wisdom. Since the first event, Spark’s success has accelerated the evolution of data science, data engineering and analytics. As the data community has expanded, we’ve evolved the content and the...
Introducing the Next-Generation Data Science Workspace
At today’s Spark + AI Summit 2020, we unveiled the next generation of the Databricks Data Science Workspace: An open and unified experience for modern data teams. Existing solutions make data teams choose from three bad options. Giving data scientists the freedom to use any open-source tools on their laptops doesn’t provide a clear path...
MLflow Joins the Linux Foundation to Become the Open Standard for Machine Learning Platforms
At today's Spark + AI Summit 2020, we announced that MLflow is becoming a Linux Foundation project. Two years ago, we launched MLflow, an open source machine learning platform to let teams reliably build and productionize ML applications. Since then, we have been humbled and excited by the...
Introducing Apache Spark 3.0
We’re excited to announce that the Apache Spark™ 3.0.0 release is available on Databricks as part of our new Databricks Runtime 7.0. The 3.0.0 release includes over 3,400 patches and is the culmination of tremendous contributions from the open-source community, bringing major advances in Python and SQL capabilities and a focus on ease of use...
Evolving the Databricks brand
Some brands start out as, well, brands. A lot of work goes into the concept and painting the picture before the business is ever launched. Databricks is different. It always has been and always will be an engineering-led company. Databricks’ model for innovation is inspired by the open-source community. This is where our roots run...
What is a Lakehouse?
Over the past few years at Databricks, we've seen a new data management paradigm that emerged independently across many customers and use cases: the lakehouse. In this post we describe this new paradigm and its advantages over previous approaches. Data warehouses have a long history in decision support and business intelligence applications. Since its inception...
Introducing the MLflow Model Registry
At today’s Spark + AI Summit in Amsterdam, we announced the availability of the MLflow Model Registry, a new component in the MLflow open source ML platform. Since we introduced MLflow at Spark+AI Summit 2018, the project has gained more than 140 contributors and 800,000 monthly downloads on PyPI, making...
Announcing the MLflow 1.1 Release
We’re excited to announce today the release of MLflow 1.1. In this release, we’ve focused on fleshing out the tracking component of MLflow and improving visualization components in the UI. Some of the major features include: Automatic logging from TensorFlow and Keras; Parallel coordinate plots in the tracking UI; Pandas DataFrame based search API; Java...
Announcing the MLflow 1.0 Release
MLflow is an open source platform to help manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments locally (on a laptop) or remotely (in the cloud), package and share models across frameworks, and deploy models virtually anywhere. Today we are excited to announce the release of MLflow 1.0. Since...