Comparing Databricks to Apache Spark

Apache Spark, a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics, has become the de facto standard for building big data applications.

However, realizing the value and benefits of Spark on its own can be challenging.

Today’s data scientists, data engineers, and developers must combine Spark with a patchwork of complex infrastructure, tools, and systems to meet their data needs, which severely inhibits their ability to deliver results.

The Databricks Unified Analytics Platform accelerates innovation by unifying data science, engineering, and business. Not only does it run an optimized version of Spark, offering 10-40x performance gains, but it also offers interactive notebooks, integrated workflows, and full enterprise security.


APIs FOR MULTIPLE USE CASES
A Unified Engine for Big Data Processing and Analytics

Feature | Databricks | Apache Spark
SQL Analytics | Yes | Yes
Streaming | Yes | Yes
Machine Learning | Yes | Yes
Deep Learning | Yes | Yes
Graph Analysis | Yes | Yes

DATABRICKS RUNTIME
Built on Apache Spark and optimized for performance

Feature | Databricks | Apache Spark
Run multiple versions of Spark | Yes | No
Built-in file system optimized for cloud storage access (AWS S3, Azure Blob Storage) | Yes | No
Serverless pools offering auto-configuration of resources for SQL and Python workloads | Yes | No
Spark-native fine-grained resource sharing for optimum utilization | Yes | No
Fault isolation of compute resources | Yes | No
Data skipping to improve query processing efficiency | Yes | No
Faster reads with Parquet format | Yes | No
Faster writes to underlying file system | Yes | No
Automatic caching | Yes | No
Compute optimization during joins and filters | Yes | No
Rapid release cycles | Yes | No
Auto-scaling compute | Yes | No
Auto-scaling local storage | Yes | No
Multi-user cluster sharing | Yes | No
Automatic migration between spot and on-demand instances | Yes | No
Second-level billing | Yes | No
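
To make the "data skipping" entry above more concrete, here is a minimal conceptual sketch in plain Python. It is not the Databricks implementation; it only assumes that each file carries min/max statistics for a column, and the names `FileStats` and `files_to_scan` are hypothetical, introduced purely for illustration. The idea is that a range filter can rule out whole files without reading them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file metadata: the min and max of one column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files: List[FileStats], low: int, high: int) -> List[str]:
    """Return only the files whose [min, max] range overlaps [low, high]."""
    return [f.path for f in files if f.max_value >= low and f.min_value <= high]


# Illustrative statistics for three files (made-up values).
files = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

# A filter on 150 <= value <= 180 overlaps only the range of part-1,
# so the other two files can be skipped entirely.
selected = files_to_scan(files, 150, 180)
print(selected)
```

In this sketch the savings depend on how well the data is clustered: if every file spans the full value range, no file can be skipped.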

INTEGRATED WORKSPACE
Interactive Data Science and Collaboration

Feature | Databricks | Apache Spark
Interactive notebooks with support for multiple languages (SQL, Python, R, and Scala) | Yes | No
Real-time collaboration | Yes | No
Notebook revision history and GitHub integration | Yes | No
One-click visualizations | Yes | No
Publish notebooks as interactive dashboards | Yes | No

PRODUCTION JOBS AND WORKFLOWS
Data Pipelines and Workflow Automation

Feature | Databricks | Apache Spark
Spark job monitoring alerts | Yes | No
One-click deployment from notebooks to Spark Jobs | Yes | No
APIs to build workflows in notebooks | Yes | No
Production streaming with monitoring | Yes | No

ENTERPRISE SECURITY
End-to-End Data Security and Compliance

Feature | Databricks | Apache Spark
Access control for notebooks, clusters, jobs, and structured data | Yes | No
Audit logs | Yes | No
SSO with SAML 2.0 support | Yes | No
Data encryption (at rest and in motion) | Yes | No
Compliance (HIPAA, SOC 2 Type 2) | Yes | No

INTEGRATIONS
Compatible with Common Tools in the Ecosystem

Feature | Databricks | Apache Spark
Connect other BI tools via authenticated ODBC/JDBC (Tableau, Looker, etc.) | Yes | No
REST API | Yes | No
Data source connectors | Yes | No

EXPERT SUPPORT
Unparalleled Support by the Leading Committers of Apache Spark

Feature | Databricks | Apache Spark
Help and support from the committers who engineer Spark | Yes | No
SQL support | Yes | No