Skip to main content
<
Page 3

MLlib Highlights in Apache Spark 1.6

January 20, 2016 by Joseph Bradley in
To learn more about Apache Spark, attend Spark Summit East in New York in Feb 2016 . With the latest release, Apache Spark’s...

Visualizing Machine Learning Models

To try the new visualization features mentioned in this blog, sign up for a 14-day free trial of Databricks today. You've built your...

Large Scale Topic Modeling: Improvements to LDA on Apache Spark

This blog was written by Feynman Liang and Joseph Bradley from Databricks, and Yuhao Yang from Intel. To get started using LDA, download...

New Features in Machine Learning Pipelines in Apache Spark 1.4

Apache Spark 1.2 introduced Machine Learning (ML) Pipelines to facilitate the creation, tuning, and inspection of practical ML workflows. Spark’s latest release, Spark...

Topic modeling with LDA: MLlib meets GraphX

March 25, 2015 by Joseph Bradley in
Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or...

Random Forests and Boosting in MLlib

January 21, 2015 by Joseph Bradley and Manish Amde in
This is a post written together with Manish Amde from Origami Logic. Apache Spark 1.2 introduces Random Forests and Gradient-Boosted Trees (GBTs) into...

ML Pipelines: A New High-Level API for MLlib

MLlib’s goal is to make practical machine learning (ML) scalable and easy. Besides new algorithms and performance improvements that we have seen in...

Scalable Decision Trees in MLlib

September 29, 2014 by Manish Amde and Joseph Bradley in
This is a post written together with one of our friends at Origami Logic. Origami Logic provides a Marketing Intelligence Platform that uses...