We are excited to announce that the third eBook in our technical blog book series, Lessons for Large-Scale Machine Learning Deployments on Apache Spark, has been released today!
This eBook picks up where the second book left off on the topic of advanced analytics and jumps straight into practical tips for performance tuning and powerful integrations with other machine learning tools, including the popular deep learning framework TensorFlow and the Python library scikit-learn. The second section of the book is devoted to addressing the roadblocks in developing machine learning algorithms on Apache Spark, from simple visualizations to modeling audiences with Apache Spark machine learning pipelines. Finally, the eBook showcases a selection of Spark machine learning use cases from ad tech, retail, financial services, and many other industries.
As with the past eBooks, we’ve augmented the blogs with code examples in Databricks notebooks, which are included free with the eBook download. A sample of these notebooks includes:
- Distributed cross-validation when training a classifier using Apache Spark and scikit-learn
- On-Time Flight Performance with GraphFrames for Apache Spark
- Approximate Algorithms in Apache Spark: HyperLogLog and Quantiles
Download the eBook to get started on your next advanced analytics project today. To try out the code examples, sign up for Databricks and import the notebooks. If you have not read the previous eBooks in the series, check them out to get a solid foundation!