Mining Ecommerce Graph Data with Apache Spark at Alibaba Taobao
This is a guest blog post from our friends at Alibaba Taobao. Alibaba Taobao operates one of the world’s largest e-commerce platforms. We collect hundreds of petabytes of data on this platform and use Apache Spark to analyze these enormous amounts of data. Alibaba Taobao probably runs some of the largest Spark jobs in the world. For example, some Spark jobs run for weeks to perform feature extraction on petabytes of image data. In this blog post, we share our
Scalable Collaborative Filtering with Apache Spark MLlib
Recommendation systems are among the most popular applications of machine learning. The idea is to predict whether a customer would like a certain item: a product, a movie, or a song. Scale is a key concern for recommendation systems, since computational complexity increases with the size of a company's customer base. In this blog post, we discuss how Apache Spark MLlib enables building recommendation models from billions of records in just a few lines of Pyt
Distributing the Singular Value Decomposition with Apache Spark
Guest post by Li Pu from Twitter and Reza Zadeh from Databricks on their recent contribution to Apache Spark's machine learning library. The...
The State of Apache Spark in 2014
This post originally appeared in insideBIGDATA and is reposted here with permission. With the second Spark Summit behind us, we wanted to take...
New Features in MLlib in Apache Spark 1.0
MLlib is an Apache Spark component focusing on machine learning. It became a standard component of Spark in version 0.8 (Sep 2013). The...
Shark, Spark SQL, Hive on Spark, and the future of SQL on Apache Spark
With the introduction of Spark SQL and the new Hive on Apache Spark effort (HIVE-7292), we get asked a lot about...
Exciting Performance Improvements on the Horizon for Spark SQL
Announcing Apache Spark 1.0
Today, we’re very proud to announce the release of Apache Spark 1.0. Apache Spark 1.0 is a major milestone for the Spark...
Making Apache Spark Easier to Use in Java with Java 8
One of Apache Spark’s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala...
Apache Spark 0.9.1 Released
We are happy to announce the availability of Apache Spark 0.9.1! This is a maintenance release with bug fixes, performance improvements, better...