Comparing Apache Spark™ and Databricks


Apache Spark offers speed, ease of use, and breadth of use, with APIs that support a range of use cases:

  • Data integration and ETL
  • Interactive analytics
  • Machine learning and advanced analytics
  • Real-time data processing

Databricks builds on top of Spark and adds:

  • Highly reliable and performant data pipelines
  • Productive data science at scale

Want to learn more? Visit our platform page.

Feature Comparison

Databricks

Run multiple versions of Spark Yes No
Built-in file system optimized for cloud storage access (AWS S3, Redshift, Azure Blob) Yes No
Serverless pools offering auto-configuration of resources for SQL and Python workloads Yes No
Spark-native fine-grained resource sharing for optimum utilization Yes No
Fault isolation of compute resources Yes No
Faster writes to S3 Yes No
Compute optimization during joins and filters Yes No
Rapid release cycles Yes No
Auto-scaling compute Yes No
Auto-scaling local storage Yes No
High availability for cluster Yes No
Multi-user cluster sharing Yes No
Automatic migration between spot and on-demand instances Yes No
Second-level billing Yes No

ACID transactions Yes No
Schema management Yes No
Batch/Stream read/write support Yes No
SSD caching Yes No
Indexing Yes No
Table snapshotting Yes No

Interactive notebooks with support for multiple languages (SQL, Python, R and Scala) Yes No
Real-time collaboration Yes No
Notebook revision history and GitHub integration Yes No
One-click visualizations Yes No
Publish notebooks as interactive dashboards Yes No

Spark job monitoring alerts Yes No
One-click deployment from notebooks to Spark Jobs Yes No
APIs to build workflows in notebooks Yes No
Production streaming with monitoring Yes No

Access control for notebooks, clusters, jobs, and structured data Yes No
Audit logs Yes No
SSO with SAML 2.0 support Yes No
Data encryption (at rest and in motion) Yes No
Compliance (HIPAA, SOC 2 Type 2) Yes No

Connect other BI tools via authenticated ODBC/JDBC (Tableau, Looker, etc.) Yes No
REST API Yes No
Data source connectors Yes No

Help and support from the committers who engineer Spark Yes No
SQL support Yes No