Hosted cloud

Fully managed Spark clusters available in just seconds with a few clicks.

Immediate answers

Built-in applications help you find answers within minutes of connecting to your data sources.

Spark from its creators

An open-source engine that combines blazing speed with sophisticated analytics in a single, easy-to-use system.

Latest blog posts

Samsung SDS uses Spark for prescriptive analytics at large scale

November 21, 2014

This is a guest blog post from our friends at Samsung SDS outlining their Spark use case.

Business Challenge

Samsung SDS is the business and IT solutions arm of Samsung Group. A global ICT service provider with over 17,000 employees worldwide and 6.7 billion USD in revenue, Samsung SDS tackles the challenges of some of the largest global enterprises in industries such as manufacturing, financial services, health care, and retail. In each of the areas Samsung focuses on, the ability to make timely decisions that maximize business value is critical. Prescriptive analytics methods have been used effectively to support decision making: they leverage probable future outcomes determined by predictive models and suggest the actions that provide the most business value. One of the main challenges in applying prescriptive analytics in these areas is the need to analyze a...

The Spark Certified Developer program

November 14, 2014

More and more companies are using Apache Spark, and many Spark-based pilots are now being deployed to production. On social media and at every big data conference or meetup, people describe new POCs, prototypes, and production deployments using Spark. Behind this momentum is a growing need for Spark developers: people who have demonstrated expertise in implementing Spark best practices, and who can help enterprises build increasingly complex and sophisticated solutions on top of their Spark deployments. At Databricks, we are contacted by many enterprises looking for Spark resources to help with their next data-driven initiative. So, beyond our efforts to train people on Spark directly or through partners around the world, we have teamed up with O’Reilly to offer the first industry standard for measuring and validating a developer’s expertise in Spark.

Benefits...

Application Spotlight: Bedrock

November 14, 2014

This post is guest authored by our friends at Zaloni, whose Bedrock platform is now “Certified on Spark.”

Bedrock’s Managed Data Pipeline now includes Spark

It was evident from all the buzz at the Strata + Hadoop World conference that Spark has moved beyond the early-adopter phase and established itself as an integral and permanent part of the Hadoop ecosystem. The rapid pace of adoption is impressive! Given Spark’s entrance into the mainstream Hadoop world, we are glad to announce that Bedrock is now officially Certified on Spark.

How does Spark enhance Bedrock?

Bedrock™ defines a Managed Data Pipeline as consisting of Ingest, Organize, and Prepare stages. Bedrock’s strength lies in how tightly it integrates the handling of data across these stages.

● Ingest: Bring data from various sources into Hadoop
● Organize:...

Spark officially sets a new record in large-scale sorting

November 5, 2014

A month ago, we shared with you our entry to the 2014 GraySort competition, a third-party benchmark measuring how fast a system can sort 100 TB of data (1 trillion records). Today, we are happy to announce that our entry has been reviewed by the benchmark committee and that we have officially won the Daytona GraySort contest! In case you missed our earlier blog post: using Spark on 206 EC2 machines, we sorted 100 TB of data on disk in 23 minutes. In comparison, the previous world record, set by Hadoop MapReduce, used 2100 machines and took 72 minutes. This means that Spark sorted the same data 3X faster using 10X fewer machines. All the sorting took place on disk (HDFS), without using Spark’s in-memory cache. This entry tied with a UCSD research team building high performance systems and...
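The “3X faster using 10X fewer machines” claim follows directly from the two records quoted above. As a quick sanity check, here is a minimal Python sketch whose only inputs are the machine counts and sort times given in the post. It computes the raw ratios before rounding:

```python
# Figures quoted in the post (2014 Daytona GraySort, 100 TB sorted on disk)
spark_machines, spark_minutes = 206, 23      # Spark entry on EC2
hadoop_machines, hadoop_minutes = 2100, 72   # previous Hadoop MapReduce record

# "X times faster": ratio of wall-clock sort times
speedup = hadoop_minutes / spark_minutes
# "X times fewer machines": ratio of cluster sizes
machine_ratio = hadoop_machines / spark_machines

print(f"{speedup:.2f}x faster")                # prints "3.13x faster"
print(f"{machine_ratio:.2f}x fewer machines")  # prints "10.19x fewer machines"
```

Both ratios round down to the 3X and 10X figures stated in the announcement, so the headline numbers are, if anything, slightly conservative.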