Distributing the Singular Value Decomposition with Apache Spark
July 21, 2014 by Li Pu and Reza Zadeh in Engineering Blog
Guest post by Li Pu from Twitter and Reza Zadeh from Databricks on their recent contribution to Apache Spark's machine learning library. The...
The State of Apache Spark in 2014
July 18, 2014 by Matei Zaharia in Engineering Blog
This post originally appeared in insideBIGDATA and is reposted here with permission. With the second Spark Summit behind us, we wanted to take...
New Features in MLlib in Apache Spark 1.0
July 16, 2014 by Xiangrui Meng in Engineering Blog
MLlib is an Apache Spark component focusing on machine learning. It became a standard component of Spark in version 0.8 (Sep 2013). The...
Shark, Spark SQL, Hive on Spark, and the future of SQL on Apache Spark
July 1, 2014 by Reynold Xin in Engineering Blog
With the introduction of Spark SQL and the new Hive on Apache Spark effort (HIVE-7292), we get asked a lot about...
Exciting Performance Improvements on the Horizon for Spark SQL
June 2, 2014 by Michael Lumb and Zongheng Yang in Engineering Blog
Announcing Apache Spark 1.0
May 30, 2014 by Patrick Wendell in Engineering Blog
Today, we’re very proud to announce the release of Apache Spark 1.0. Apache Spark 1.0 is a major milestone for the Spark...
Making Apache Spark Easier to Use in Java with Java 8
April 14, 2014 by Prashant Sharma and Matei Zaharia in Engineering Blog
One of Apache Spark’s main goals is to make big data applications easier to write. Spark has always had concise APIs in Scala...
Apache Spark 0.9.1 Released
April 9, 2014 by Tathagata Das in Engineering Blog
We are happy to announce the availability of Apache Spark 0.9.1! This is a maintenance release with bug fixes, performance improvements, better...
Spark SQL: Manipulating Structured Data Using Apache Spark
March 26, 2014 by Michael Armbrust and Reynold Xin in Engineering Blog
Apache Spark: A Delight for Developers
March 20, 2014 by Jai Ranganathan and Matei Zaharia in Engineering Blog
This article was cross-posted in the Cloudera developer blog. Apache Spark is well known today for its performance benefits over MapReduce...