Solution Accelerator: Telco Customer Churn Predictor
Skip directly to the notebooks referenced throughout this post. When T-Mobile embraced the un-carrier label, they didn’t just kick off a marketing campaign; they fundamentally changed the dynamics in the US market for telecom. Previously, telecom had been a staid, utility-like industry with steady growth and subscribers locked into two-year contracts to cover a “free”...
Accelerating ML Experimentation in MLflow
This fall, I interned with the ML team, which is responsible for building the tools and services that make it easy to do machine learning on Databricks. During my internship, I implemented several ease-of-use features in MLflow, an open-source machine learning lifecycle management project, and made enhancements to the Reproduce Run capability on the Databricks...
Strategies for Modernizing Investment Data Platforms
The appetite for investment was at a historic high in 2020 for both individual and institutional investors. One study showed that “retail traders make up nearly 25% of the stock market following COVID-driven volatility”. Moreover, institutional investors have piled into cryptocurrency, with 36% invested, as outlined in Business Insider. As...
Combining Rules-based and AI Models to Combat Financial Fraud
The financial services industry (FSI) is rushing towards transformational change, delivering transactional features and facilitating payments through new digital channels to remain competitive. Unfortunately, the speed and convenience that these capabilities afford also benefit fraudsters. Fraud in financial services remains the number one threat to organizations’ bottom line given the record-high increase in overall...
Bayesian Modeling of the Temporal Dynamics of COVID-19 Using PyMC3
In this post, we look at how to use PyMC3 to infer the disease parameters for COVID-19. PyMC3 is a popular probabilistic programming framework that is used for Bayesian modeling. Two popular methods for this kind of inference are Markov Chain Monte Carlo (MCMC) and Variational Inference. The work here looks at using the currently...
Personalizing the Customer Experience with Recommendations
Go directly to the Recommendation notebooks referenced throughout this post. Retail made a giant leap forward in the adoption of e-commerce in 2020. E-commerce as a percentage of total retail saw multiple years of progress in one year. Meanwhile, COVID, lockdowns and economic uncertainty have completely disrupted how we engage and retain customers. Companies need...
MLflow Model Registry on Databricks Simplifies MLOps With CI/CD Features
MLflow helps organizations manage the ML lifecycle through the ability to track experiment metrics, parameters, and artifacts, as well as deploy models to batch or real-time serving systems. The MLflow Model Registry provides a central repository to manage the model deployment lifecycle, acting as the hub between experimentation and deployment. A critical part of MLOps,...
How to Train XGBoost With Spark
XGBoost is currently one of the most popular machine learning libraries, and distributed training is increasingly required to accommodate the rapidly increasing size of datasets. To utilize distributed training on a Spark cluster, the XGBoost4J-Spark package can be used in Scala pipelines but presents issues with Python pipelines. This article will go over...
MLflow 1.12 Features Extended PyTorch Integration
MLflow 1.12 features include extended PyTorch integration, SHAP model explainability, autologging MLflow entities for supported model flavors, and a number of UI and documentation improvements. The release is now available on PyPI, with the docs online; you can install it with pip install mlflow==1.12.0 as described in the MLflow quickstart guide. In this blog, we briefly...
Quickly Deploy, Test, and Manage ML Models as REST Endpoints with MLflow Model Serving on Databricks
MLflow Model Registry now provides turnkey model serving for dashboarding and real-time inference, including code snippets for tests, controls, and automation. MLflow Model Serving on Databricks provides a turnkey solution to host machine learning (ML) models as REST endpoints that are updated automatically, enabling data teams to own the end-to-end lifecycle of a real-time machine...