Sourav Mazumder - Databricks

Sourav Mazumder

Big Data Architect, Evangelist, and Technology Leader, IBM Analytics

Sourav Mazumder is a Data Scientist Thought Leader and Open Group Distinguished Data Scientist, responsible for technical thought leadership and strategy, technical vitality, and technical enablement in AI and Data Science for IBM’s clients and for IBM’s own staff. Sourav works with enterprise clients from AI strategy development through implementation and productionization, focusing particularly on first-of-a-kind projects. Sourav has authored multiple books, blogs, and articles on AI, Data Science, and Big Data. Sourav also speaks regularly at industry conferences such as the Open Data Science Conference, Spark Summit, IBM Think, and the Global AI Conference.


Comparing CatBoost with Existing Gradient Boosting Approaches Using Health Care Use Cases
Summit 2020

CatBoost is one of the newest innovations in the gradient boosting family of algorithms. It aims to overcome two critical issues commonly found in existing methods, namely 'prediction shift' and 'time to fit.' CatBoost also improves categorical encoding practice: by aligning the categorical features with the relative values of the target variable, it minimizes the information lost in the implicit conversion of categorical features into a vector. In this session, we show the results of benchmarking CatBoost against other popular gradient boosting methods (XGBoost and LGBM) using healthcare data use cases. The comparisons cover the benefit of CatBoost's implicit categorical encoding, the avoidance of prediction shift, improvements in metric scores across multiple approaches, and training speed on healthcare and high-dimensional datasets. CatBoost is used with the distributed processing paradigm of Spark, exploiting domain nuances of the data. The data was prepared using the principles of the OMOP common data model in the healthcare domain. The results present a comparative study across regression, binary classification, and multiclass classification problems. The three datasets selected are the 2020 Affordable Care Act (ACA) individual premium rating, end-stage renal disease healthcare claims, and Net Promoter survey scores with supplemental patient-level characteristics. Attendees will learn to use CatBoost and understand the innovations it brings to the data science community, as well as any drawbacks in comparison to other gradient boosting methods. This approach can be extrapolated to use cases in other domains.
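The idea of aligning a categorical feature with the target while avoiding prediction shift can be illustrated with an ordered target statistic: each row is encoded using only the target values of earlier rows in the same category, so a row never sees its own label. The following is a minimal sketch under stated assumptions, not CatBoost's actual implementation. The function name `ordered_target_encode` and the parameters `prior` and `prior_weight` are hypothetical, and the rows are processed in the order given rather than in a random permutation.

```python
from typing import Dict, Hashable, List, Sequence


def ordered_target_encode(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float = 0.5,         # assumed default value for unseen categories
    prior_weight: float = 1.0,  # assumed smoothing strength
) -> List[float]:
    """Encode each row from the targets of EARLIER rows in the same category."""
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    encoded: List[float] = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # Smoothed running mean of the targets seen so far for this category.
        encoded.append((s + prior * prior_weight) / (n + prior_weight))
        # Update the running statistics only after encoding the current row,
        # so the row's own target never leaks into its encoding.
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded


if __name__ == "__main__":
    cats = ["A", "B", "A", "A", "B"]
    ys = [1, 0, 0, 1, 1]
    print(ordered_target_encode(cats, ys))
```

In this sketch the first occurrence of every category receives the prior, and later occurrences drift toward the category's observed target mean as more rows are seen.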


Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine
Summit 2018

Gaining insight into the personality of people you are less familiar with, whether in the workplace, on social media, or in your real-life social circle, is always an interesting idea. It can help businesses understand the psychology of their customers, employees, and partners, which in turn can help create successful partnerships and loyal customers. A Recommendation Engine that provides insight into a customer's personality can be very effective for maintaining a loyal customer base, because new products and services can be suggested in line with the customer's needs and behavioral patterns. However, creating such an engine and keeping it up to date with the changing behavioral aspects of human nature can be a daunting task. In this session, we'll discuss how the Watson Personality Insights API, in conjunction with Spark, can be used to create and maintain such a Recommendation Engine for Personality Insight for the customer. We will demonstrate the steps through a use case in which Spark Streaming continuously collects snippets of written content from various streaming data sources; Spark DataFrameReader reads static data from static data sources; the Watson Personality Insights API derives personality ratings for three popular personality models (Big Five, Needs, and Values) from snippets of a target person's written communication; and finally Spark's distributed processing engine calls the Watson Personality Insights API in parallel for thousands of text snippets and collates the results. Attendees will learn how insights about a target person's personality can be derived from snippets of their written communication using the Watson Personality Insights API and Spark. They will also learn how a Recommendation Engine for Personality Insight can be created and maintained in an automated fashion.

Session hashtag: #AI9SAIS