Statistics Functionality in Apache Spark 1.1

August 27, 2014 by Doris Xin, Burak Yavuz and Hossein Falaki
One of our philosophies in Apache Spark is to provide rich and friendly built-in libraries so that users can easily assemble data pipelines. With Spark, and MLlib in particular, quickly gaining traction among data scientists and machine learning practitioners, we’re observing a growing demand for data analysis support outside of model fitting. To address this need, we have started to add scalable implementations of common statistical functions to facilitate v