Spark Summit East 2017: Another Record-Setting Spark Summit
We’ve put together a short recap of the keynotes and highlights from Databricks’ speakers so that Apache Spark enthusiasts who could not attend the summit can share in the Patriots’ Super Bowl victory euphoria that suffused the attendees; the snowstorm outside did not dampen the high spirits inside. Day One: Voices of Databricks Speakers What to Expect...
Databricks and Apache Spark 2016 Year in Review
In 2016, Apache Spark released its second major version 2.0 and outgrew our wildest expectations: 4X growth in meetup members reaching 240,000 globally, and 2X growth in code contributors reaching 1000. In addition to contributing to the success of Spark, Databricks also had a phenomenal year. We have rolled out a large number of features...
Databricks Voices From Spark Summit EU 2016 Day 2
Update: The videos of the presentations are now available. Find them below. Spark Summit Keynotes Although the October overcast persisted over Brussels, inside the SQUARE’s convention center attendees lined up, with coffee in one hand and pastry in the other, to hear how other organizations employ Apache Spark for their use cases. Democratizing AI with...
Notebook Workflows: The Easiest Way to Implement Apache Spark Pipelines
Today we are excited to announce Notebook Workflows in Databricks. Notebook Workflows is a set of APIs that allow users to chain notebooks together using the standard control structures of the source programming language (Python, Scala, or R) to build production pipelines. This functionality makes Databricks the first and only product to support...
On-demand webinar available: Databricks’ Data Pipeline
Two weeks ago we held a live webinar – Databricks' Data Pipeline: Journey and Lessons Learned – to show how Databricks used Apache Spark to simplify our own log ETL pipeline. The webinar describes an architecture where you can develop your pipeline code in notebooks, create Jobs to productionize your notebooks, and utilize REST APIs...
Apache SparkR On-Demand Webinar and FAQ
Two months ago we held a live webinar – Enabling Exploratory Analysis of Large Data with Apache Spark and R – to demonstrate one of the most important use cases of SparkR: the exploratory analysis of very large data. The webinar shows how Spark’s features and capabilities, such as caching distributed data and integrated SQL...
New eBook Released: Lessons for Large-Scale Machine Learning Deployments on Apache Spark
We are excited to announce that the third eBook in our technical blog book series, Lessons for Large-Scale Machine Learning Deployments on Apache Spark, has been released today! You can download the eBook here. This eBook, the third of a series, picks up where the second book left off on the topic of advanced analytics,...
Edmunds.com Leverages Databricks to Improve Vehicle Data Quality and Customer Experience
We are happy to announce that Edmunds.com has deployed Databricks to simplify the management of their Apache Spark clusters and perform ad-hoc analysis to improve vehicle data integrity and the overall customer experience of their website. You can read the press release here. Edmunds.com, a leading car information and shopping network that serves nearly...
Another Record-Setting Spark Summit
The lure of San Francisco is indisputable, as is its position as the preeminent high-tech hub. Day one of Spark Summit 2016, the largest community event dedicated to Apache Spark, drew more than 2,500 Spark enthusiasts from 720+ companies. Such a draw is a strong testament to Apache Spark’s open source roots, its fast-growing...
Achieving End-to-end Security for Apache Spark with Databricks
Today we are excited to announce the completion of the first phase of the Databricks Enterprise Security (DBES) framework. We are proud to say that this makes Databricks the first and only company to provide comprehensive enterprise security on top of Apache Spark. Read the press release here. Hundreds of organizations have deployed Databricks to...