Apache Spark 2.0 Preview: Machine Learning Model Persistence (May 31, 2016 by Joseph Bradley in Engineering Blog): Consider these Machine Learning (ML) use cases: A data scientist produces an ML model and hands it over to an engineering team...
That’s a Wrap! Spark Live Draws Huge Audience in Los Angeles (May 27, 2016 by Wayne Chan in Company Blog): As we continue our road show across the United States, one observation has held true along our first two...
Just-in-Time Data Warehousing on Databricks: Change Data Capture and Schema On Read (May 26, 2016 by Wayne Chan in Company Blog): A few months ago, we held a live webinar – Just-in-Time Data Warehousing on Databricks: Change Data Capture and Schema On Read –...
Genome Sequencing in a Nutshell (May 24, 2016 by Deborah Siegel in Engineering Blog): This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...
Parallelizing Genome Variant Analysis (May 24, 2016 by Deborah Siegel in Engineering Blog): This is a guest post from Deborah Siegel from the Northwest Genome Center and the University of Washington with Denny Lee from Databricks...
Predicting Geographic Population using Genome Variants and K-Means (May 24, 2016 by Deborah Siegel in Engineering Blog): Spark Summit 2016 will be held in San Francisco on June 6–8. Check out the full agenda and get your ticket. This is...
Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop (May 23, 2016 by Sameer Agarwal, Davies Liu and Reynold Xin in Engineering Blog): When our team at Databricks planned our contributions to the upcoming Apache Spark 2.0 release, we set out with an ambitious goal by...
Spark Live Los Angeles is just around the corner (May 20, 2016 by Wayne Chan in Company Blog): A couple of weeks ago we announced Spark Live, an eight-city road show brought to you by Databricks in collaboration with premier sponsor...
Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles (May 19, 2016 by Tim Hunter, Hossein Falaki and Joseph Bradley in Solutions): Apache Spark is fast, but applications such as preliminary data exploration need to be even faster and are willing to sacrifice some...
Apache Spark MLlib: From Quick Start to Scikit-Learn (May 18, 2016 by Wayne Chan in Company Blog): A few months ago, we held a live webinar – Apache Spark MLlib: From Quick Start to Scikit-Learn – to give a quick...