Statistical and Mathematical Functions with DataFrames in Apache Spark

We introduced DataFrames in Apache Spark 1.3 to make Apache Spark much easier to use. Inspired by data frames in R and Python...

New MLlib Algorithms in Apache Spark 1.3: FP-Growth and Power Iteration Clustering

This is a guest blog post from Huawei’s big data global team. Huawei, a Fortune Global 500 private company, has put together a...

Running Apache Spark GraphX algorithms on Library of Congress subject heading SKOS

April 14, 2015 by Bob DuCharme
This is a guest post from Bob DuCharme. Original article appeared in: http://www.snee.com/bobdc.blog/2015/04/running-spark-graphx-algorithm.html Well, one algorithm, but a very cool one. Last month...

Topic modeling with LDA: MLlib meets GraphX

March 25, 2015 by Joseph Bradley
Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or...

Announcing Apache Spark 1.3!

March 13, 2015 by Patrick Wendell
Today I’m excited to announce the general availability of Apache Spark 1.3! Apache Spark 1.3 introduces the widely anticipated DataFrame API, an evolution...

Random Forests and Boosting in MLlib

January 21, 2015 by Joseph Bradley and Manish Amde
This is a post written together with Manish Amde from Origami Logic. Apache Spark 1.2 introduces Random Forests and Gradient-Boosted Trees (GBTs) into...

ML Pipelines: A New High-Level API for MLlib

MLlib’s goal is to make practical machine learning (ML) scalable and easy. Besides new algorithms and performance improvements that we have seen in...

Efficient Similarity Algorithm Now in Apache Spark, Thanks to Twitter

October 20, 2014 by Reza Zadeh
Our friends at Twitter have contributed to MLlib, and this post uses material from Twitter’s description of its open-source contribution, with permission...

Scalable Decision Trees in MLlib

September 29, 2014 by Manish Amde and Joseph Bradley
This is a post written together with one of our friends at Origami Logic. Origami Logic provides a Marketing Intelligence Platform that uses...

Apache Spark 1.1: MLlib Performance Improvements

September 22, 2014 by Burak Yavuz
With an ever-growing community, Apache Spark has had its 1.1 release. MLlib has had its fair share of contributions and now supports...