Hosted cloud

Fully managed Spark clusters available in just seconds with a few clicks.

Immediate answers

Built-in applications help you find answers within minutes of connecting to your data sources.

Spark from its creators

An open source engine that combines blazing speed with sophisticated analytics in a single easy-to-use system.

Latest blog posts

Announcing Spark 1.2

December 19, 2014

We at Databricks are thrilled to announce the release of Spark 1.2! Spark 1.2 introduces many new features along with scalability, usability and performance improvements. This post introduces some key features of Spark 1.2 and provides context on Spark's priorities for this release and the next. Over the next two weeks, we'll be publishing blog posts with more details on the feature additions in each of the major components. Spark 1.2 has been posted today on the Apache Spark website.

Optimizations in Spark's core engine

Spark 1.2 includes several cross-cutting optimizations focused on performance for large-scale workloads. Two new features that Databricks developed for our world-record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4's zero-copy IO and off-heap buffer management....

Pearson uses Spark Streaming for next generation adaptive learning platform

December 8, 2014

This is a guest blog post from our friends at Pearson outlining their Spark use case.

Introduction to Pearson

Pearson is a British multinational publishing and education company headquartered in London. It is the largest education company and the largest book publisher in the world. Recently, Pearson announced a new organizational structure to accelerate its push into digital learning, education services and emerging markets. I am part of the Pearson Higher Education group, which provides textbooks and digital technologies to teachers and students across higher education. Pearson's higher education brands include eCollege, Mastering/MyLabs and Financial Times Publishing.

What we wanted to do

We are building a next-generation adaptive learning platform that delivers immersive learning experiences designed for the way today's students read, think, and learn. This learning platform is a scalable, reliable, cloud-based platform providing services to...

Application Spotlight: Technicolor Virdata Internet of Things platform

December 3, 2014

This post is guest-authored by our friends at Technicolor, whose Virdata platform is now "Certified on Spark."

About Virdata

Virdata is Technicolor's cloud-native Internet of Things platform, offering real-time monitoring, configuration and management of an unprecedented number of connected devices and applications. By combining its highly scalable data ingestion and messaging capabilities with real-time and historical analytics, Virdata brings value across multiple data-driven markets. The Virdata platform was launched at CES Las Vegas in January 2014. The Virdata cloud-based platform architecture integrates state-of-the-art open source software components into a homogeneous, high-availability data-processing environment.

Virdata and Spark

The Virdata solution architecture comprises three areas: Messaging, Data Processing and Applications, all accessed through APIs. Its publish/subscribe-based messaging infrastructure contains a high-throughput distributed message broker, along with distributed complex event processing and bidirectional message routing components. Completing Virdata's "full stack" Internet of...

Application Spotlight: Nube Reifier

December 2, 2014

This post is guest-authored by our friends at Nube Technologies, whose Reifier platform is now "Certified on Spark."

About Nube Technologies

Nube Technologies builds business applications that improve decision making through better data. Nube's fuzzy-matching product, Reifier, helps companies get a holistic view of enterprise data. By linking and resolving entities across various sources, Reifier helps optimize the sales and marketing funnel, strengthens security and risk management, and improves the consolidation and reporting of business data. We help our customers build better, more effective models by ensuring that their underlying master data is accurate.

Why Spark

Data matching within a single source or across sources is a core problem faced by almost every enterprise, and we wanted to create a smart way to solve it. Solving data matching problems is made even more difficult given...