Abi is a Machine Learning and Data Science practitioner with experience building and deploying large-scale Machine Learning applications across industries including Healthcare, Finance, Telecommunications, and Insurance. Abi’s Data Science work spans Descriptive, Predictive, and Prescriptive Analytics as well as Natural Language Processing. He has extensive experience solving business problems using Data Analytics, Sentiment Analysis, Topic Modelling, Named Entity Recognition (NER), Opinion Mining, Data Mining, Time Series, Spatial Statistics, and Marketing Analytics.
May 27, 2021 05:00 PM PT
When it comes to large-scale data processing and Machine Learning, Apache Spark is one of the most battle-tested frameworks available for handling batch or streaming workloads. Its ease of use, built-in Machine Learning modules, and multi-language support make it a very attractive choice for data wonks. However, bootstrapping and getting off the ground can be difficult for most teams without a pre-provisioned Spark cluster offered as a managed service in the Cloud. While a managed service is an attractive way to get going, in the long run it can become a very expensive option if it is not well managed.
As an alternative, our team has been running Spark and all of our Machine Learning workloads and pipelines as containerized Docker packages on Kubernetes. This gives us an infrastructure-agnostic abstraction layer, which improves our operational efficiency and reduces our overall compute cost. Most importantly, we can target our Spark deployments at any major Cloud or On-prem infrastructure (with Kubernetes as the common denominator) by modifying just a few configuration settings.
In this talk, we will walk you through the process our team follows to run production deployments of our Machine Learning workloads and pipelines on Kubernetes, which lets us seamlessly port our implementation from a local Kubernetes setup on a laptop during development to an On-prem or Cloud Kubernetes environment.
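The abstract says that moving a Spark workload between a local Kubernetes cluster and an On-prem or Cloud one comes down to changing a few configurations. As a minimal sketch of that idea (not the team's actual tooling), the Python below builds a `spark-submit` command in Kubernetes cluster mode where only the master URL, container image, namespace, and executor count vary per environment. The environment names, API server URLs, image tags, namespaces, and the `local:///opt/app/train.py` application path are hypothetical placeholders; the `k8s://` master prefix and the `spark.kubernetes.container.image`, `spark.kubernetes.namespace`, and `spark.executor.instances` keys are standard Spark settings.

```python
from typing import Dict, List

# Hypothetical per-environment settings: only these values change between targets.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    # e.g. a local cluster (minikube, kind, ...) on a developer laptop
    "local": {
        "master": "k8s://https://127.0.0.1:6443",
        "image": "ml-pipeline:dev",
        "namespace": "spark-dev",
        "executors": "1",
    },
    # e.g. a managed or On-prem Kubernetes cluster
    "cloud": {
        "master": "k8s://https://k8s.example.com:443",
        "image": "registry.example.com/ml-pipeline:1.0.0",
        "namespace": "spark-prod",
        "executors": "10",
    },
}


def build_spark_submit(env: str, app_path: str, app_name: str = "ml-pipeline") -> List[str]:
    """Return a spark-submit argument list targeting the given environment."""
    cfg = ENVIRONMENTS[env]
    conf = {
        "spark.kubernetes.container.image": cfg["image"],
        "spark.kubernetes.namespace": cfg["namespace"],
        "spark.executor.instances": cfg["executors"],
    }
    cmd = [
        "spark-submit",
        "--master", cfg["master"],
        "--deploy-mode", "cluster",  # driver runs in a pod inside the cluster
        "--name", app_name,
    ]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    # A local:// path refers to a file already baked into the container image.
    cmd.append(app_path)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_spark_submit("local", "local:///opt/app/train.py")))
```

Running the script prints the command for the `local` target; passing `"cloud"` instead swaps only the per-environment values while the application path and the rest of the command stay the same. On a real cluster, the API server URL can be found with `kubectl cluster-info`, and the image must be available to the cluster (pushed to a reachable registry, or loaded into the local cluster's image cache).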