New eBook Released: Lessons for Large-Scale Machine Learning Deployments on Apache Spark

Published: July 6, 2016

We are excited to announce that the third eBook in our technical blog book series, Lessons for Large-Scale Machine Learning Deployments on Apache Spark, has been released today!

You can download the eBook here.

This eBook picks up where the second book left off on the topic of advanced analytics and jumps straight into practical tips for performance tuning and powerful integrations with other machine learning tools, including the popular deep learning framework TensorFlow and the Python library scikit-learn. The second section of the book is devoted to addressing the roadblocks in developing machine learning algorithms on Apache Spark, from simple visualizations to modeling audiences with Apache Spark machine learning pipelines. Finally, the eBook showcases a selection of Spark machine learning use cases from ad tech, retail, financial services, and many other industries.

As with the past eBooks, we’ve augmented the blog posts with code examples in Databricks notebooks, which come free with the eBook download. A sample of these notebooks includes:

Distributed cross-validation when training a classifier using Apache Spark and scikit-learn

Download the eBook to get started on your next advanced analytics project today. To try out the code examples, sign up for Databricks and import the notebooks. If you have not read the previous eBooks in the series, check them out to get a solid foundation!
