Wizard Driven AI Anomaly Detection with Databricks in Azure

May 26, 2021 05:00 PM (PT)


Fraud is prevalent in every industry and growing at an increasing rate as the volume of transactions increases with automation. The National Healthcare Anti-Fraud Association estimates $350B of fraudulent spending. Forbes estimates $25B of spending by US banks on anti-money laundering compliance. At the same time as fraud and anomaly detection use cases are booming, the skills gap of expert data scientists available to perform fraud detection is widening.

The Kavi Global team will present a cloud native, wizard-driven AI anomaly detection solution, enabling Citizen Data Scientists to easily create anomaly detection models to automatically flag Collective, Contextual, and Point anomalies, at the transaction level, as well as collusion between actors. Unsupervised methods (Distribution, Clustering, Association, Sequencing, Historical Occurrence, Custom Rules) and supervised (Random Forest, Neural Network) models are executed in Apache Spark on Databricks.

An innovative aggregation framework converts fraud scores and their probabilities into a meaningful, actionable, prioritized list of suspicious (statistical outlier) and potentially fraudulent transactions to be investigated from a business point of view. The AI anomaly detection models improve over time using human-in-the-loop feedback to label data for supervised modeling.

Finally, the Kavi team overviews the anomaly lifecycle: from statistical outlier to validated business fraud, for reclaims and business process changes, to long-term prevention strategies using proactive audits upstream at the time of estimate to prevent revenue leakage. Two client success stories will be presented across the Pharmaceutical Rx and Transportation industries.

In this session watch:
Naomi Kaduwela, Director, Kavi Global
Rajesh Inbasekaran, CTO, Kavi Global

 

Transcript

Naomi Kaduwela: Hi, everyone. Thanks for tuning into our session, Wizard Driven AI Anomaly Detection. I’m Naomi and today my co-presenter is Rajesh, and we’re here from Kavi Global. Kavi Global is a data and analytics services, software, and solutions company, and we support clients across industries.
In my role as head of Kavi Labs, I lead the innovation and incubation arm of Kavi Global. I have over a decade of data science and product management expertise, leveraging design thinking to deliver value and create innovative blue ocean strategies. At Kavi Labs, we often partner with enterprises, research organizations, and academic institutes to quickly prove out the business benefit and feasibility of innovative AI solutions and applications, and today you’re going to see one of those innovative AI anomaly detection solutions.
My co-presenter Rajesh is the CTO of Kavi Global, as well as one of the founding members of the Kavi Global team. Rajesh has over two decades of experience in advanced analytics. He’s designed countless smart applications with embedded AI to enable decision support in operational workflows, like you’ll see today, and he holds several patents on innovative SaaS and architectural creations, including one on today’s solution.
So we have a truly exciting session for you today. First, I’ll take you through an overview of the fraud prevention opportunity that we see and the need for AI audits. Then I’ll dive into the key persona of this solution, or as Gartner calls it, the citizen data scientist, and I’ll overview how, in our solution approach, we design for these citizen data scientists. I will then discuss the anomaly lifecycle, the personas involved in it, and the human-in-the-loop feedback that we use for continuous improvement of the modeling. Finally, I’ll cover some deployment options for how enterprises can consume this solution and share some of the success stories.
Then I’ll turn it over to Rajesh and he’ll dive into the technical details of the solution, on how it works, and share with you a really modern architectural framework that’s cloud native and serverless and then he’ll dive into the details of the Databricks integration.
The takeaway here is that the fraud prevention opportunity is huge. You can see the numbers, $350 billion in healthcare, fraud, waste and abuse, $25 billion in banking industries, and $40 billion in insurance. Really anywhere where you have these payment transactions, there’s potential for fraud and the opportunity to prevent it.
So, why AI audits, or automating this audit process? Well, it’s simply not possible for a human to process the large volume of data and extract all the complex patterns that indicate contextual and collective anomalies. They can’t do this with the naked eye in a way that’s reliable and scalable. Moreover, when humans are doing all these repetitive tasks, we can, of course, produce excessive false positives, accidentally classifying as fraud things that may not really be, and in other cases overlook the false negatives, where we’ve missed actual occurrences of fraud.
So, these tasks that are highly repetitive and need to continuously adapt over time are well suited for AI. In this way, humans can of course maximize their potential and focus more on higher value-add, unique value creation opportunities that actually require their human ingenuity.
So in this solution, you’ll see how we can use automated audits to flag two important things. The first is, of course, individual transactions, for example, individual invoice items that might be deviating from the expected norm. But what’s also important to look at is the collusion between actors, being able to do actor-to-actor flagging. This is important because we know fraud doesn’t occur in isolation, and we need to catch this collusion between actors. In the healthcare example, this might be between patients and pharmacies and doctors.
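To make the actor-to-actor idea concrete, here is a minimal PySpark sketch of how transaction data could be aggregated into actor pairs and screened for unusually heavy pairings. This is not the Mantis implementation; the table path, column names, and threshold are assumptions for illustration only.

```python
# Illustrative sketch only: Mantis internals are not shown in the talk, and the
# column names (doctor_id, pharmacy_id, claim_amount) and path are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
claims = spark.read.parquet("/mnt/data/claims")  # hypothetical input

# Aggregate transactions by actor pair to surface unusually strong links.
pair_stats = (
    claims.groupBy("doctor_id", "pharmacy_id")
          .agg(F.count("*").alias("n_claims"),
               F.sum("claim_amount").alias("total_amount"))
)

# Flag pairs whose claim volume is an extreme outlier relative to all pairs.
p99 = pair_stats.approxQuantile("n_claims", [0.99], 0.01)[0]
suspect_pairs = pair_stats.filter(F.col("n_claims") > p99)
suspect_pairs.orderBy(F.desc("total_amount")).show()
```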
Now, a little bit about the key persona of this type of solution, wizard driven AI anomaly detection built on Databricks and Azure: it is what Gartner calls the citizen data scientist. Now, they might not be as technical as your traditional data scientist, but they bring strong domain expertise, and this is really important.
So when designing for them, of course, we have to simplify and abstract all the coding and technical details via this no-code, wizard driven interface, which you’re going to see shortly. But in general, this is actually the future of all AI solutions, and we see the data and analytics industry moving this way because it accelerates time to value and maximizes human potential: instead of spending time debugging syntax errors, we can actually focus on solving the business problem.
This is all of course enabled by the technology abstraction that we see in this space, just as it happened some time ago with website development becoming drag and drop. And this is what’s enabling the shift in resources within enterprises, moving toward citizen data scientists who are deep in their domain expertise and don’t have to continuously evolve a technical skillset.
So when we were designing this approach, there were a few key things we had to consider. The first is that there are many types of anomalies. In this case an anomaly is an outlier, something different from the norm, which statistically represents some type of potential fraud, and we’re calling these anomaly signatures. The next thing we had to consider is that there are also many methods, in this case machine learning models, that we can use to detect these anomaly signatures, these outliers, or this fraud.
Furthermore, we want to have a holistic business view, and rather than just looking at one anomaly signature or one machine learning model at a time, we want to combine it all together. So why pick just one machine learning model or one anomaly signature when you can go after them all? The challenge is of course going to be when you try to aggregate that into a meaningful score to enable that decision support, but this is actually where we have a patented aggregation framework, and we’ll talk more about this in a bit.
Finally, the last thing we wanted to consider is that we should enable and support the whole anomaly lifecycle across all the personas, the business and the technical, because that’s when you can really measure and improve the overall effectiveness and efficiency of your solution.
So with all that solution approach in mind, now you can see the five key ways that we’ve been able to design for citizen data scientists. Let’s discuss the features.
First, of course, wizard driven, no code, no programming required, and we’re guiding them all along the way. Also, we’re giving them access to a whole portfolio of algorithms. Everything that’s available there in Spark, we’re leveraging and pulling in, and we can leverage the machine learning models as well as quickly evaluate them, looking at visualizations that help us differentiate which parameter set, or set of hyperparameters, might be best as we go through this model experimentation phase in a no-code, wizard driven way.
So this is much faster and much more efficient than the traditional way of having to go and build one machine learning model at a time, and then try to compare which one is better and then struggle with how to combine them and which one to choose, which brings us right into the next point of business benefit.
Now, if we’re not able to translate all of the output of the machine learning models in a holistic and meaningful way, in a way that is quantifiable to the business, sizing the opportunity, it’s very difficult to get the buy-in, the usage, the adoption, and the sustainability of these models. So it’s very important, especially for citizen data scientists, to be able to quickly roll all of that up and present their findings as a business benefit.
And then finally, as we mentioned, supporting the whole lifecycle: from the data scientist, who has the task of identifying anomalies, building those models, combining them, quantifying the benefit, and surfacing those transactions to the business, to the business user who confirms them as business fraud and then takes that forward to get the reclaim or prevent the fraud. You can see how this entire anomaly lifecycle is embedded in this tool. So now let me dive in and I’ll show you each of these points as a visual via the solution.
So in this first one, we can see an example of what we mean by wizard driven, no code. In the top panel, you’re always being guided: where are you in the workflow? There’s a step-by-step process. And as well, if you note, to the right, as you work within the tool, populating different fields, there’s help text on the side. It’s basically educating you as you go. Even videos can be embedded in here. This is completely disrupting the traditional methods of training, where they give you user manuals or expect you to read something before you use the tool. Here, the tool is intuitive enough, and good design principles are of course important in this to drive a good user experience and make sure it’s intuitive, but also guiding the citizen data scientists in the workflow and educating them as they go.
Next, we have access to a portfolio of supervised and unsupervised models. Here, unsupervised modeling is really the key because we need to detect things that we’re not already aware of. We need to catch the new fraud, and by using unsupervised models, we can do so with less labor because these models don’t require the manual labeling that humans would otherwise have to provide to identify and confirm fraud.
That said, though unsupervised modeling is great to give us a head start, we do have the supervised models as well, because of course we want the models to improve over time, and that is done by having that human-in-the-loop providing feedback, so we can confirm not just what’s statistically anomalous, or looking like an outlier or fraud, but what is business-confirmed fraud. And that’s where the supervised models will, of course, help. Next, we can see how we can dive into some of the model evaluation visualizations.
So here, you might be running multiple parameter sets for a particular machine learning model that you’re using to solve some anomaly signature. After setting some parameters, you can see an output such as this. First, a table detailing how many groups are included or excluded based on your criteria, for example, if you’re using a distribution-based model, based on the particular extreme outlier percentiles you might’ve set.
Then you can use the tree map and investigate, of the segments that you’ve now created, which ones have high variance versus low variance. Then you might be interested to investigate those high variance segments. Here you can see they’re colored dark, and you can do a deeper dive into what exactly is going on in that segment. And here you can look at the next chart, which gives you the distribution of the spread of values within that segment. So here you can see some of the visualizations that are automatically generated as you go through and play around with your hyperparameter settings.
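As a rough illustration of the distribution-based method just described, the following PySpark sketch computes per-segment extreme-outlier percentiles and flags transactions outside them. The segment and amount columns, input path, and the 1st/99th percentile choices are assumptions, not the product’s actual logic.

```python
# Minimal sketch of a distribution-based anomaly check; column names, the input
# path, and the 1st/99th percentile thresholds are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
txns = spark.read.parquet("/mnt/data/transactions")  # hypothetical input

# Per-segment lower/upper bounds at the chosen extreme-outlier percentiles,
# plus a spread measure that could feed the variance treemap view.
bounds = txns.groupBy("segment").agg(
    F.expr("percentile_approx(amount, 0.01)").alias("p01"),
    F.expr("percentile_approx(amount, 0.99)").alias("p99"),
    F.stddev("amount").alias("amount_stddev"),
)

flagged = (
    txns.join(bounds, "segment")
        .withColumn(
            "is_outlier",
            (F.col("amount") < F.col("p01")) | (F.col("amount") > F.col("p99")),
        )
)
flagged.filter("is_outlier").show()
```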
Finally, the grand finale: the business benefits, the most important from the business point of view. Here, you can see an equipment leasing company example, and this is a summary of the scored 2019 data aggregated into the business benefits. And here on the KPIs, you can see the walk, the anomaly walk, which starts from the data in the starting data set for scoring, so the number of records and the dollar amount, and then the next KPIs are basically the output of the modeling: the suspected records for investigation, the suspected fraudulent records, as well as their probable savings.
Below, you can see the breakdown by each method, and here you’ll note, again, we don’t just simply sum these up to get the overall savings, because there could be cases of double counting. So again, that’s where we have to use that joint probability framework and aggregate so that we can get a realistic estimate in that probable savings figure.
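The patented aggregation framework itself is not detailed in the talk, but a toy example shows why per-method savings cannot simply be summed. Assuming, purely for illustration, that per-method fraud probabilities for a transaction are independent, the combined probability is one minus the product of the complements:

```python
# Toy illustration of the double-counting issue, not the patented framework.
# Assumes per-method fraud probabilities for a transaction are independent.
def combined_fraud_probability(method_probs):
    """P(flagged by at least one method) = 1 - prod(1 - p_i)."""
    p_none = 1.0
    for p in method_probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# A transaction flagged by two methods at 0.6 and 0.5 is not "110% fraudulent";
# under independence the combined probability is 1 - (0.4 * 0.5) = 0.8.
print(combined_fraud_probability([0.6, 0.5]))  # 0.8

# Probable savings roll-up: weight each transaction amount by its combined
# probability instead of summing per-method savings.
transactions = [(1000.0, [0.6, 0.5]), (250.0, [0.2])]  # (amount, per-method probs)
probable_savings = sum(amt * combined_fraud_probability(probs) for amt, probs in transactions)
print(probable_savings)  # 1000*0.8 + 250*0.2 = 850.0
```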
Now, the thing is, not all statistical outliers are actually business-confirmed fraud. So that’s what you’re seeing in this visual here: how we can start from raw data, predicting those anomalies, which will [inaudible] at the possible fraud, but then we need that intervention, that business SME, to confirm the fraud and trigger the actual recovery or preventative action. And that’s what you can see here: the entire lifecycle that’s contained within the solution and all the key personas, basically the citizen data scientist, who you can see on the right, who would be handling model building and anomaly detection, and then handing that over to the business SME for the validation, the recovery process, or any preventative measures, and then sharing their feedback, either post-invoicing or at the time of estimate, so that the model can learn and get better over time.
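To show how SME feedback could feed the supervised side that the abstract mentions (Random Forest, Neural Network), here is a hedged Spark ML sketch that trains on business-confirmed labels. The feature and label column names and paths are hypothetical.

```python
# Sketch of human-in-the-loop supervised training; feature/label column names
# and paths are assumptions. The talk names Random Forest as one supervised method.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.getOrCreate()

# Transactions joined with SME feedback, where confirmed_fraud is 1 or 0.
labeled = spark.read.parquet("/mnt/data/labeled_transactions")  # hypothetical

assembler = VectorAssembler(
    inputCols=["amount", "quantity", "days_since_last_claim"],  # hypothetical features
    outputCol="features",
)

rf = RandomForestClassifier(labelCol="confirmed_fraud", featuresCol="features", numTrees=100)
model = rf.fit(assembler.transform(labeled))

# Score a new batch with the business-feedback-trained model.
new_txns = spark.read.parquet("/mnt/data/new_transactions")  # hypothetical
scored = model.transform(assembler.transform(new_txns))
scored.select("amount", "probability", "prediction").show()
```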
And coming to that, that is how we can look at the enterprise deployment or consumption options. So the first option is, of course, to implement this automated AI audit at the time of estimate. In this way, you can connect it via an API to do real-time scoring and get that insight at the time of estimate, before you have actually gone to payment and money has been exchanged, so you can actually prevent that recurring revenue leakage.
However, if you’re doing it post payment, for example, you could run a batch of invoices that have already been paid, you can also identify the reclaim and trigger that process. But again, it’s going to take more time, effort, and resources to get that reclaim. So again, it’s always great to be able to implement the preventative measure in real time at the time of estimate.
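As a sketch of the "at time of estimate" option, the call below posts an estimate to a hypothetical scoring endpoint. The URL, payload fields, and response keys are illustrative assumptions, not a published Mantis API.

```python
# Hypothetical real-time scoring call at the time of estimate. The endpoint,
# payload fields, and response keys are illustrative assumptions.
import requests

estimate = {
    "invoice_id": "INV-1001",
    "actors": {"doctor_id": "D-42", "pharmacy_id": "P-7"},
    "line_items": [{"code": "RX123", "amount": 480.0}],
}

resp = requests.post(
    "https://scoring.example.com/api/score-estimate",  # hypothetical endpoint
    json=estimate,
    timeout=10,
)
resp.raise_for_status()
result = resp.json()

if result.get("suspected_fraud"):
    # Hold the estimate for review before any payment goes out.
    print("Flagged for audit:", result.get("anomaly_signatures"))
```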
So you can see we’re looking at a typical enterprise data platform conceptual architecture here, from ingestion via multiple source systems into the data services layer, which handles all the transformation, governance, quality, modeling, and democratizing the data for self-service, and then of course the digital solutions layer, where you have all the KPIs, metrics, and modeling. Note that this tool assumes the data is prepped via the data services layer; here we’ve done that using our tool Advana, which is also no-code and wizard driven on the data engineering side.
And now we can dive into a few of the wins with clients using this. The key takeaway here is that the ROI is high and the payback time is short. You can see both clients had millions of dollars of opportunity identified that they were able to go after, and it was very quick and easy to trigger either reclaims or process changes to stop the recurring revenue leakage, since they had the data. So with that, I’ll turn it over to Rajesh and he’ll overview how the solution works.

Rajesh Inbasekaran: Thanks, Naomi. So let me start with a high-level technical view of the solution. So what does Mantis do? You bring in your historical transactions; in the use case Naomi mentioned, it was like four to five years. You can bring in historical fraud, but it’s optional, you don’t have to; otherwise we’ll just use the unsupervised models. So the solution trains on the historical transactions.
You can also bring in some dimensions if you have them, like patient data, doctor data, and so on. With the historical transactions, it builds various anomaly models. And then when you bring in new transactions, it starts scoring and flags any of the transactions or groups of transactions, for things like sequencing and recency effects. It can flag fraudulent actors, and also the interaction between actors; as I said, combinations of actors, which could be similar actor types or different actor types, as Naomi mentioned, like doctors and patients, patients and pharmacies, and so on.
So it’s basically a model building and model scoring application, specifically focused on anomaly detection, and it’s targeted toward citizen data scientists, so it’s wizard driven and no-code.
So let’s start with how it’s architected. When we started architecting this, we wanted to make sure it’s cloud native and serverless: cloud native because we want to piggyback on all the security, scalability, and other options which the cloud provider brings, and serverless because this is mostly done in batch and we want to increase the capacity as an [inaudible]. So as I explained earlier, I’m going to touch upon the various aspects of the solution and the components we used. We have a front end web application, which is hosted on App Services, Azure App Services, serverless.
We piggyback on Azure Active Directory for authentication and authorization purposes. We use API Management for securing the Azure Functions, which basically provide the services of the application; it is the one glue which connects the various other components of the solution. We use Azure DevOps as the metadata repository: what are the opportunities, what is the project, what are the parameters. Everything gets checked in and checked out of this code repository, so it comes with all the features that a code repository brings.
We use Azure Databricks for doing all the analytics workload, building models, scoring, et cetera, and the cluster gets started on demand and shuts down when the usage stops. We use Azure Storage to store the historical transactions, new transactions, as well as any outputs from the scoring and so on. We use the Service Bus for notifying the web application via asynchronous calls, because whenever you run scoring and other things, it takes a lot of time.
And we use the Service Bus to maintain the submissions to Databricks and track their status. And the SQL database comes in for more operational scenarios. Especially, as Naomi mentioned, we manage the anomaly through its lifecycle, so there’s user involvement in terms of capturing whether it is confirmed fraud or a false positive. So the user writes back that data; those writes go to the SQL database, which gets integrated into the storage on a batch basis. So all these components are straight from the cloud, and so it’s secured and it’s scalable as well.
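A minimal sketch of the asynchronous notification pattern just described, using the azure-servicebus (v7) SDK; the queue name, connection string, and message fields are assumptions.

```python
# Sketch of notifying the web application about a long-running scoring run via
# Azure Service Bus (azure-servicebus v7 SDK). Queue name, connection string,
# and message fields are assumptions.
import json
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-connection-string>"  # placeholder

def notify_scoring_status(run_id: int, status: str) -> None:
    """Publish a Databricks run status so the web app can update asynchronously."""
    payload = json.dumps({"run_id": run_id, "status": status})
    with ServiceBusClient.from_connection_string(CONN_STR) as client:
        with client.get_queue_sender(queue_name="scoring-status") as sender:
            sender.send_messages(ServiceBusMessage(payload))

notify_scoring_status(run_id=12345, status="RUNNING")
```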
So now I’m going to talk about the Databricks integration. As I said, all the analytics workload in this solution is being done by Databricks. So how do we integrate?
So we integrate using the Databricks 2.0 API; we use the Jobs API. We have a bunch of parameterized jobs already defined and stored in the Databricks system, and we use the Python task as the template to run these jobs. Since, as I said, they’re parameterized jobs, we pass the parameters to the job using the python_params setting. So we pass whatever parameters we want the jobs to know through that setting, calling the REST API to run it so that it triggers the job, and we track it based on the job ID.
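A hedged sketch of the batch path described above: triggering a pre-defined, parameterized job through the Databricks Jobs 2.0 run-now endpoint and polling its status. The workspace URL, token, job ID, and parameter values are placeholders.

```python
# Sketch of triggering a parameterized Databricks job via the Jobs 2.0 API.
# Workspace URL, token, job ID, and parameter values are placeholders.
import requests

HOST = "https://<workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# run-now hands python_params to the job's Python task.
run = requests.post(
    f"{HOST}/api/2.0/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 42, "python_params": ["--opportunity", "rx-audit", "--mode", "score"]},
).json()

# Track the submission by its run ID until it finishes.
state = requests.get(
    f"{HOST}/api/2.0/jobs/runs/get",
    headers=HEADERS,
    params={"run_id": run["run_id"]},
).json()["state"]
print(state["life_cycle_state"])
```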
We also have certain interactive requirements. For example, whenever we want to pull up a report, we don’t want to wait a long time; we want the result to be returned immediately to the front end. So we use a notebook task for that reason, and we pass parameters using notebook_params. We trigger the notebook, it runs, does the analysis, and gives back the results immediately, which get shown in the front end application. So we tightly integrate Databricks for both the batch as well as the interactive requirements.
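And a sketch of the interactive path: running a notebook-task job with notebook_params and reading the value the notebook returns. Again, the job ID, parameters, and output shape are placeholders, and it assumes the notebook ends with dbutils.notebook.exit(<result string>).

```python
# Sketch of the interactive path: trigger a notebook-task job with
# notebook_params, wait for it to finish, then read its exit value.
# Job ID, parameters, and the notebook's exit payload are placeholders.
import time
import requests

HOST = "https://<workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

run_id = requests.post(
    f"{HOST}/api/2.0/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 43, "notebook_params": {"report": "segment-variance"}},
).json()["run_id"]

# Poll until the run terminates, then fetch the notebook's returned result.
while True:
    out = requests.get(
        f"{HOST}/api/2.0/jobs/runs/get-output",
        headers=HEADERS,
        params={"run_id": run_id},
    ).json()
    if out["metadata"]["state"]["life_cycle_state"] in ("TERMINATED", "INTERNAL_ERROR", "SKIPPED"):
        break
    time.sleep(2)

print(out.get("notebook_output", {}).get("result"))
```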
So yeah, if you have any questions, please feel free to ask. Thanks again, Naomi.

Naomi Kaduwela: Thank you all. Please remember to share your feedback and feel free to reach out to us on LinkedIn and we’ll see you in the Q and A session.

Naomi Kaduwela

Naomi is the Head of Kavi Labs, the innovation and incubation arm of Kavi Global, a consulting services, software, and solutions company. She is an extrem...

Rajesh Inbasekaran

Rajesh is the CTO of Kavi Global, a consulting services, software and solutions company. Rajesh has two decades of experience in advanced analytics and information technology, specializing in data eng...