Artificial Intelligence (AI) has massive potential to drive disruptive innovations affecting most enterprises on the planet. However, most enterprises are struggling to succeed with AI . Why is that? Simply put, AI and Data are siloed in different systems and different organizations. Enterprise data is siloed across hundreds of systems such as data warehouses, data lakes, databases and file systems that are not AI-enabled. Popular machine learning frameworks such as TensorFlow, PyTorch, and SciKit-Learn don’t do data processing. Because these data systems don’t “do AI” and these AI technologies don’t “do data”, it’s extremely hard for enterprises to succeed with AI, which after all requires both ingredients to be successful.
Adding to this data and AI technology divide, companies have siloed teams for data and AI. Data engineers deal with large scale data processing and production deployment, whereas data scientists deal with AI - exploring data, training and validating models. Given the highly iterative nature of AI development, this organizational separation creates significant friction and slows AI initiatives considerably.
Unified Analytics is a new category of solutions that unifies data science and engineering, making AI much more achievable for enterprise organizations and enabling them to accelerate their AI initiatives. Unified Analytics brings the disparate worlds of data science and engineering together with a common platform— making it easier for data engineers to build data pipelines across siloed systems and prepare labelled datasets for model building while enabling data scientists to explore and visualize data and build models collaboratively. Unified Analytics provides one engine to prepare high quality data at massive scale and iteratively train machine learning models on the same data. Unified Analytics also provides collaboration capabilities for data scientists and data engineers to work effectively across the entire AI lifecycle. Organizations that succeed in unifying their data at scale and unifying that data with the best AI technologies will have significantly higher chance of success with AI.
Databricks Unified Analytics Platform, powered by Apache SparkTM, lowers the barrier for enterprises to innovate with AI and accelerates their innovation . It has three major components:
At Spark + AI Summit in San Francisco, we proudly announced the following innovations to further unify data science and engineering to accelerate AI initiatives -
Databricks Delta, now part of the Databricks Runtime, works with Apache Spark to make data reliable and ready for ML through transactional integrity on batch and streaming data. Delta also improves performance up to 100x through caching and indexing capabilities. With Delta, data engineers can ensure that data scientists have early access to new reliable datasets at scale. Sign up for Databricks Delta here.
Based on customers demand, we are very excited to announce the new native and deep integration of popular ML libraries including xgboost, scikit-learn, and numpy as well as popular deep learning framework such as TensorFlow, Keras, Horovod. This will dramatically simplify configuration and setup allowing data scientists to focus on their data and model building.
MLflow is the first open source, cross-cloud framework that dramatically simplifies the workflow associated with building, deploying and iterating on machine learning. With MLflow, organizations can package their code for reproducible runs, execute and compare hundreds of parallel experiments, leverage any of the existing open source projects, and deploy models to production on a variety of serving platforms. You can learn more about MLflow at Matei’s blog and download MLflow here. For future updates on MLflow and Databricks integration, sign-up here.
Databricks Unified Analytics Platform
Our goal is to accelerate innovation by unifying data science, engineering and business. We will continue to drive data science and engineering unification by adding new capabilities to the Databricks Unified Analytics Platform. Get started today with Databricks Unified Analytics Platform.