One of our philosophies in Apache Spark is to provide rich and friendly built-in libraries so that users can easily assemble data pipelines. With Spark, and MLlib in particular, quickly gaining traction among data scientists and machine learning practitioners, we’re observing a growing demand for data analysis support outside of model fitting. To address this need, we have started to add scalable implementations of common statistical functions to facilitate various components of a data pipeline.
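To give a flavor of what these statistical functions look like in MLlib, here is a minimal sketch using the `org.apache.spark.mllib.stat.Statistics` API to compute column summary statistics and a correlation matrix over an RDD of vectors. It assumes an existing `SparkContext` named `sc` and uses a tiny hand-made dataset purely for illustration.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// A small illustrative dataset; in practice this would be loaded from a real source.
val observations: RDD[Vector] = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0, 100.0),
  Vectors.dense(2.0, 20.0, 200.0),
  Vectors.dense(3.0, 30.0, 300.0)
))

// Column-wise summary statistics (mean, variance, nonzero counts, ...).
val summary = Statistics.colStats(observations)
println(summary.mean)
println(summary.variance)

// Pairwise Pearson correlation between columns, computed in a distributed fashion.
val corrMatrix = Statistics.corr(observations, "pearson")
println(corrMatrix)
```

Because these functions operate on RDDs, the same few lines scale from a laptop-sized sample to a full cluster-resident dataset without changes.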