Ben Holmes - Databricks

Ben Holmes, USAble Mutual Insurance Company

Ben has been a leader in advanced analytics in the health insurance industry. His primary duties include developing the data science team, expanding capabilities and technologies, and facilitating training. He has expert knowledge in healthcare and has led initiatives to improve governmental programs (including risk and quality), customer service efficiency, and healthcare financial optimization. These activities include, among others, HEDIS data submission, provider compensation analysis, provider clinical gap scorecards and portal data extracts, and analysis for data warehouse implementations within the ACA and MA domains.


Comparing CatBoost with Existing Gradient Boosting Approaches Using Health Care Use Cases

Summit 2020

CatBoost is one of the newest innovations in the gradient boosting family of algorithms. It aims to overcome two critical issues commonly found in existing methods, namely 'prediction shift' and 'time to fit.' CatBoost also improves categorical encoding practice: by aligning the categorical features with the relative values of the target variable, it minimizes the information lost in the implicit conversion of categorical features into a vector. In this session, we show the results of benchmarking CatBoost against other popular gradient boosting methods (XGBoost and LightGBM) using healthcare data use cases. The comparisons cover the benefit of CatBoost's implicit categorical encoding, the avoidance of prediction shift, improvements in metric scoring across multiple approaches, and training speed on high-dimensional healthcare datasets. CatBoost is used with the distributed processing paradigm of Spark, exploiting domain nuances of the data. The data was prepared using the principles of the OMOP common data model in the healthcare domain. The results form a comparative study across regression, binary classification, and multiclass classification problems. The three datasets selected are the 2020 Affordable Care Act (ACA) individual premium rating, end-stage renal disease healthcare claims, and Net Promoter survey scores with supplemental patient-level characteristics. Attendees will learn to use CatBoost and understand the innovations it brings to the data science community, as well as its drawbacks in comparison to other gradient boosting methods. This approach can be extrapolated to use cases in other domains.
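To make the categorical encoding idea in the abstract concrete, here is a minimal sketch of an ordered target statistic: each row's category is replaced by a smoothed average of the targets seen in earlier rows only, so a row's own label never leaks into its encoding. This is a simplified illustration of the general idea, not CatBoost's actual implementation. The function name `ordered_target_stats`, the `weight` smoothing parameter, and the toy `plans` and `labels` data are assumptions made for this example and do not come from the session.

```python
from collections import defaultdict


def ordered_target_stats(categories, targets, prior, weight=1.0):
    """Encode each category using only the targets of earlier rows.

    Hypothetical helper for illustration; `prior` and `weight` smooth the
    estimate for categories with few (or no) earlier observations.
    """
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count of rows per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Use statistics accumulated so far, before seeing this row's target.
        encoded.append((sums[cat] + weight * prior) / (counts[cat] + weight))
        sums[cat] += y
        counts[cat] += 1
    return encoded


# Toy data (assumed): an insurance plan tier and a binary outcome.
plans = ["gold", "silver", "gold", "gold", "silver"]
labels = [1, 0, 1, 0, 1]
prior = sum(labels) / len(labels)  # global mean of the target

encoded = ordered_target_stats(plans, labels, prior)
print(encoded)
```

In practice the rows would be visited in a random permutation (or several), since the encoding depends on row order. The first occurrence of any category falls back to the prior.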