Data Science and ML | Databricks Blog

Predicting Geographic Population using Genome Variants and K-Means

May 24, 2016 by Deborah Siegel in Engineering

Spark Summit 2016 will be held in San Francisco on June 6–8. Check out the full agenda and get your ticket. This is...

New Content in Databricks Community Edition

April 12, 2016 by Ion Stoica in Engineering

At the Spark Summit New York, we announced Databricks Community Edition (CE) beta. CE is a free version of the Databricks service...

The Unreasonable Effectiveness of Deep Learning on Apache Spark

March 31, 2016 by Miles Yucht and Reynold Xin in Engineering

Update: this post is an April Fools joke. It is not an actual project we're working on. For the past three years, our...

Auto-scaling scikit-learn with Apache Spark

February 8, 2016 by Tim Hunter and Joseph Bradley in Engineering

Data scientists often spend hours or days tuning models to get the highest accuracy. This tuning typically involves running a large number of...

Deep Learning with Apache Spark and TensorFlow

January 24, 2016 by Tim Hunter in Engineering

Neural networks have seen spectacular progress during the last few years and they are now the state of the art in image recognition...

MLlib Highlights in Apache Spark 1.6

January 20, 2016 by Joseph Bradley in Engineering

To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016. With the latest release, Apache Spark’s...

Generalized Linear Models in SparkR and R Formula Support in MLlib

October 5, 2015 by Eric Liang in Engineering

To get started with SparkR, download Apache Spark 1.5 or sign up for a 14-day free trial of Databricks today. Apache Spark...

Improved Frequent Pattern Mining in Apache Spark 1.5: Association Rules and Sequential Patterns

September 28, 2015 by Feynman Liang, Jiajin Zhang and Dandan Tu in Engineering

We would like to thank Jiajin Zhang and Dandan Tu from Huawei for contributing to this blog. To get started mining patterns from...

Large Scale Topic Modeling: Improvements to LDA on Apache Spark

September 22, 2015 by Feynman Liang, Yuhao Yang and Joseph Bradley in Engineering

This blog was written by Feynman Liang and Joseph Bradley from Databricks, and Yuhao Yang from Intel. To get started using LDA, download...

New Features in Machine Learning Pipelines in Apache Spark 1.4

July 29, 2015 by Joseph Bradley and Burak Yavuz in Engineering

Apache Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark...