AI is becoming omnipresent and is influencing the payments industry in a big way. At VISA, AI-driven products are changing the way we do our business. Merchants are one of the core entities in any payments network. Millions of merchants are observed to be added to the payments ecosystem every month. Some of these are indeed new businesses, but a significant fraction are merchants that have created a new identity with changed attributes. For Visa, it is essential and highly beneficial to have oversight of how a merchant changes in our network. Being able to do so on a continuous basis lends itself to several use cases, such as risk mitigation and loyalty programs.
At VISA, we’re using AI, big data tools and our suite of internal products to detect merchant changes. Our AI model currently leverages the scale and depth of VISA data along with a suite of AI techniques to track a merchant with very high accuracy. Accuracy and timeliness are of utmost importance because not knowing the merchant and its whereabouts can lead to incorrect merchant offers and delays in merchant queries. In this talk, we will share details about our AI model that looks at merchant patterns over regular intervals. We will discuss the specialized data engineering used and several aspects of the model architecture, which include traditional machine learning and consumer behavior pattern based approaches, continuing on to unsupervised learning techniques using near-duplicate detection algorithms like Locality Sensitive Hashing from Spark ML.
– Good morning, everyone. Welcome to Spark AI Summit 2020. The layout is a little different this year: the summit is virtual, but we hope everybody’s still having a great time at the summit. Before we begin, let’s have a brief round of introductions. My name is Anurag Tangri and I work for Visa. I also have my colleague Jianhua Huang joining me today, who will introduce himself shortly. At Visa we both work for a team called Visa Research.
Visa Research conducts fundamental and applied research into a number of use cases to build new market opportunities and business opportunities for Visa. Visa Research focuses on the areas of AI and security. We have a pretty unique group at Visa in terms of the work we do. We are not always constrained by product deliverables and timelines. As we explore a broad array of technologies and use cases, some of that might never result in products while others have a very meaningful impact for Visa. Today we wanna welcome you to our talk on using AI to support proliferating merchant changes. Specifically we wanna talk about the changes to merchants in the dynamic payment ecosystem and how we are using AI to detect those changes accurately and in time. So with that, let’s dive right in.
So here’s the agenda for today’s talk.
So we’re gonna talk about introduction to Visa and we’ll have VisaNet basics.
So VisaNet is our payment network, and then we go to our topic for today’s talk, which is Merchant Entity Recognition, in short M-E-R. So we’ll talk about how merchants change in the payment ecosystem and how we are using AI to detect the merchant changes. Specifically we have two modeling approaches for the problem. The first one is called feature similarity, based on similarity between merchant profiles. And if the merchant profile changes dramatically, then we choose the second approach, which is called consumer behavior. And it’s a complementary approach for the problem. So I will be covering the first approach and my colleague, Jianhua, will cover the second approach. And then we’ll look at some key observations followed by Q and A.
So here’s a quick disclaimer. So all statistics, research and recommendations provided today are for informational purposes. Visa does not assume any responsibility or liability that might arise from using this information.
So with that, let’s look at who Visa is. We are a global payments technology company that enables digital payments. We work with a wide variety of financial institutions, governments and consumers to enable financial inclusion. Visa is one of the most common forms of payment available worldwide. Let’s also talk about who we are not. So we are not credit card issuers. We are not a bank or a lender. So that’s the most common misconception that people have. And then we are not exposed to consumer credit risks. So with that, we wanted to show you how a transaction flows through our payment network. Before we do that, let’s take a look at the four parties involved here. So on the right you see the issuer and the cardholder. An issuer is a bank that issues cards to people like you and me. And on the left, you see the merchant and the acquirer. A merchant here could be a grocery store or a restaurant or a clothing store that we visit on a regular basis. And merchants in turn have their own bank too, which is called the acquirer. So with that background, let’s see the journey of a transaction. So it all begins with the card: you are at the merchant location, about to check out. At that point, you will swipe or tap your card. The merchant will then send that request to the acquirer, and the acquirer will send it to Visa. So at that point, the transaction has entered VisaNet, which is a network built and maintained by Visa. Visa will further forward that request to the issuer, and the issuer will look at the cardholder’s details and provide a response in terms of approval or denial. And the response flows back the same way, from issuer to VisaNet to acquirer and finally to the merchant. And at that point you have an approval for the request, and all of this happens while you are waiting at the merchant’s terminal. So we see that VisaNet always makes sure that the transaction is processed securely and promptly. And here we see some statistics about our VisaNet.
So we have 3.5 billion Visa cards and 7.8 billion people in the world, so that’s roughly a one-to-two ratio, and VisaNet processes 180 billion transactions every year. So that gives you an idea of the scale and the massiveness of the data set that we have. So this itself is a big challenge as well as an opportunity for data scientists like us to work on this data day in and day out. And with that, we wanna talk about our topic for today, which is Merchant Entity Recognition. In short, we’ll call it M-E-R.
So here we wanted to demonstrate the merchant-acquirer relationship over time. So at time T zero we see a merchant coming into our system with the name ABC and an acquirer bank A, and it gets an ID assigned to it, which is MER ID one. At time T one, the merchant might decide to change its acquirer bank. And at that time it might have a new name and a new acquirer bank, and it gets a new ID assigned in our system. Further we see at time T three, the merchant might decide to go back to the older acquirer, which is acquirer bank A in this case, and might again have a different name. So this shows how a merchant keeps changing its identity in our system. And then we don’t have a direct relationship with the merchant. So a lot of this data we are getting is through our VisaNet data, and we are trying to build this relationship.
So that brings us to the problem statement for today: merchants are constantly moving between acquirers in our system, and they are showing up as new merchants. So how do we link those new merchants to existing merchant entities that have just changed their identity in the system? So if we are able to do that, we get a lot of benefits from this linking. Some of them are listed here on the screen. The first one is the efficient management of merchant offers: we can over-qualify or under-qualify the offers based on how the merchant has changed. And then we can have coherent merchant information over time for our risk management systems. And it also helps in preventing the bad players from using the system. And then we can also group merchants based on their time on file. Time on file starts when we see the merchant for the first time, and with it you can create their peer groups.
So with that, we want to talk about the first approach, which is the merchant feature similarity approach. So here you have the process flow. So we’ll talk first about the merchant ID creation. We look at our authorization data when we create our MER IDs. And at the end of it, we have a merchant ID database where we are creating these IDs. So essentially we create the merchant IDs and then enrich them. And here, then, you see the merchant tracking. So here we start with the list of disappeared merchants, which you see in the middle as day N, and we calculate that list by starting at day minus N, which in this case is seven days prior, and looking at the merchants’ activity. And at day N, we are able to see that out of those active merchants, these many merchants have disappeared. And then you see that at that point, day N, our linking process gets started. And every day we try to do this linking. And at the end of it, we are able to say how many of these merchants are linked and how many are unlinked. And the reason we do a daily linking for a number of days is because a lot of times we see that when merchants come back in our system, they might take time to bring the volume back to the new acquirer. So this window concept really helps us in that case. And on the right you see what all is happening in our merchant tracking system. So the first thing we are using is something called the LSH algorithm. And we’re gonna talk about that in the next slides. And we are using the Apache Spark MLlib implementation for that. And then we also look at our merchant location. So once the LSH algorithm returns candidate merchants, we look at the merchant location. And then we look at the volume shift, and then we also look at their name similarity and many more.
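The tracking window described above can be sketched roughly as follows. This is a minimal Python illustration, not Visa’s implementation: the function names, the `activity` mapping and the `try_link` callback are all hypothetical, and the rule used here for "disappeared" (active during the seven days before day N, no transactions on day N) is an assumption, since the talk does not give the exact rule.

```python
from datetime import date, timedelta

def find_disappeared(activity, day_n, lookback_days=7):
    """Return merchants active in the lookback window but absent on day_n.

    activity maps a merchant ID to the set of dates on which it transacted.
    Assumption: "disappeared" means seen in the window, not seen on day_n.
    """
    window_start = day_n - timedelta(days=lookback_days)
    disappeared = set()
    for merchant_id, days in activity.items():
        was_active = any(window_start <= d < day_n for d in days)
        if was_active and day_n not in days:
            disappeared.add(merchant_id)
    return disappeared

def link_over_window(disappeared, try_link, day_n, window_days=7):
    """Retry linking once per day for window_days days after day_n.

    try_link(merchant_id, day) is a hypothetical callback that returns the
    matched new merchant ID, or None when no match is found on that day.
    """
    linked, pending = {}, set(disappeared)
    for offset in range(1, window_days + 1):
        day = day_n + timedelta(days=offset)
        for merchant_id in sorted(pending):
            match = try_link(merchant_id, day)
            if match is not None:
                linked[merchant_id] = (match, day)
        pending -= set(linked)  # stop retrying merchants that are now linked
    return linked, pending
```

Whatever is still in `pending` after the last day of the window corresponds to the "unlinked" merchants in the process flow.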
So with that, let’s take a brief look at how the LSH algorithm works. So LSH stands for Locality Sensitive Hashing. So this algorithm is based on the principle that if two points are close to each other in a high-dimensional space, then they’re close to each other in a lower-dimensional space also. For example, here you see two red spheres in figure A, which are very close to each other. So if you look at those points from another dimension, as in figure B, they are still close to each other, and the same holds true for these two green cubes, which are very far apart from each other. And further, when we project these points into a low-dimensional space, if they are close to each other, they will be close to each other in the lower-dimensional space also. And LSH further does something called random projections: projecting from the high-dimensional to the lower-dimensional space at random, several times, really adds to the probability of getting a match. So with that LSH introduction, we wanted to show what our merchant profile vector looks like. So here you see on the left the merchant name, which are three coffee shops in three different geographies. And then we take our transaction count and we try to build a transaction pattern from this transaction count. So we look at the transactions and split them by hours of the day, so zero to third hour, four to sixth hour. And as you can see, you get a very unique pattern for the UK transactions; New York will come further in the day, their transactions will be concentrated in different times of the day, and the same for San Francisco. So this gives us a very unique way of localizing the merchants in a geography. Further we add some amount-based features, so here you see the transaction amount. And then we have average ticket size, which is the amount that consumers are spending at that location. And then we have some percentiles for amounts, and further, we also add country-based counts. So here you can see that the UK has country code 826.
So the count ends up there. And for the US we have country code 840. So the New York City and San Francisco counts end up there. The next set of features we add is channel-based counts. So here you can see Billpay, Ecom and MOTO. Ecom stands for e-commerce and online transactions, and MOTO is mail order/telephone order. And then we have face to face. So in this case, all these transactions happen at the merchant location, so the counts end up in the face-to-face category. And towards the far right, you see something called MCC count. MCC stands for Merchant Category Code. So here we have different merchants like restaurants, clothing stores or groceries. They all have their own category codes. And here we are just encoding those MCC counts. So now when you look at this entire profile, you can see that we can represent merchant profiles for UK merchants in a very specific way using their transaction pattern, their amounts, their country codes, channels, and MCC, which really helps us in localizing the results from the algorithm.
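The profile vector described above can be illustrated with a small Python sketch. Everything named here is hypothetical: the `build_profile` function, the transaction dictionary keys, the three-hour bucket width and the choice of median as the only percentile are assumptions made for the example, and the real vector in the talk has many more fields.

```python
from collections import Counter
from statistics import mean, median

# Assumed channel vocabulary, taken from the channels named in the talk.
CHANNELS = ["billpay", "ecom", "moto", "face_to_face"]

def build_profile(transactions, country_codes, mcc_codes):
    """Flatten one merchant's transactions into a fixed-length numeric vector.

    transactions: non-empty list of dicts with the (hypothetical) keys
    "hour" (0-23), "amount", "country", "channel" and "mcc".
    country_codes / mcc_codes: fixed vocabularies that define column order.
    """
    # Transaction pattern: counts per 3-hour bucket (bucket width is assumed).
    hour_counts = [0] * 8
    for t in transactions:
        hour_counts[t["hour"] // 3] += 1

    # Amount features: total, average ticket size, and one percentile (median).
    amounts = [t["amount"] for t in transactions]
    amount_features = [sum(amounts), mean(amounts), median(amounts)]

    by_country = Counter(t["country"] for t in transactions)
    by_channel = Counter(t["channel"] for t in transactions)
    by_mcc = Counter(t["mcc"] for t in transactions)

    return (
        hour_counts
        + amount_features
        + [by_country[c] for c in country_codes]
        + [by_channel[c] for c in CHANNELS]
        + [by_mcc[m] for m in mcc_codes]
    )
```

Keeping the vocabularies fixed means every merchant maps to a vector of the same length and column order, which is what lets two profiles be compared position by position.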
So with that, now let’s look at how the LSH algorithm works in our case. So here on the left, you will see the merchant profile vectors that we just looked at. And these are the vectors of the new merchants that appeared in our system. So you can see that merchants M2, M3 and M4 have very similar merchant profiles. So when they run through the algorithm, the model will assign them hash values which are very close to each other and they will end up in one bucket. Similarly, merchants M1 and M5 have very similar merchant profiles, and they will end up with similar hash values in a different bucket. And now when we bring in our missing merchant, which is at the top, called merchant Q, you can see that its merchant profile looks very similar to merchant M1 and merchant M5. And in this case, the LSH algorithm will assign it to the same bucket. And then on top of that, we use our further matching techniques like merchant location and merchant name, which help us in getting a very high probability match of the missing merchant to the new merchant. So this is how the LSH algorithm is able to work very well in our scenario.
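The bucketing idea above can be shown with a small, self-contained Python sketch. It uses sign random projections (one hash bit per random hyperplane), which is only one LSH family; the talk says the Spark MLlib implementation is used but does not say which of its LSH estimators (`BucketedRandomProjectionLSH` or `MinHashLSH`) or which parameters, so nothing here should be read as the production setup. All function names are hypothetical.

```python
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )

def build_buckets(profiles, hyperplanes):
    """Group merchant IDs by hash key; similar vectors tend to share a key."""
    buckets = {}
    for merchant_id, vector in profiles.items():
        buckets.setdefault(lsh_key(vector, hyperplanes), []).append(merchant_id)
    return buckets

def candidates(query_vector, buckets, hyperplanes):
    """Return the merchants that landed in the same bucket as the query."""
    return buckets.get(lsh_key(query_vector, hyperplanes), [])
```

Only the merchants returned by `candidates` need the more expensive follow-up checks (location, volume shift, name similarity). In practice several independent hash tables are usually combined so that a near neighbor missed by one table is caught by another.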
So now let’s look at some performance analysis. So we look at the time complexity for a collection of n merchants. The time complexity for linear comparison would be of the order of n squared. For example, if you have 1 million merchants, you might end up doing 10 to the power twelve comparisons, which is a lot and time consuming, and we have scalability issues. Versus when we use our Apache Spark implementation of LSH, it can be done in sub-linear time. And similarly, when we look at space complexity, if we try to store our entire merchant vector, which as you remember was a thousand-plus fields, and bring it in memory, it’s gonna be a very big memory overhead during runtime. Versus when we use the LSH model, we’re only storing the hash values, which are the low-dimensional projections of those merchant vectors, in memory, which really helps us with the space complexity issue. So we can see that by using the Spark MLlib implementation of the LSH algorithm, we can run this process at big scale and in sub-linear time.
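The arithmetic behind those numbers is easy to check. The short Python sketch below reproduces the 10 to the power twelve figure and adds a rough memory estimate; the 8 bytes per field is an assumption made for illustration, since the talk only says the vector has a thousand-plus fields.

```python
def ordered_comparisons(n):
    """Every merchant against every merchant: the n-squared figure."""
    return n * n

def distinct_pairs(n):
    """Unordered pairs only; half the work, but still quadratic in n."""
    return n * (n - 1) // 2

def full_vector_bytes(n, fields=1000, bytes_per_field=8):
    """Memory to hold every full profile (8-byte fields are an assumption)."""
    return n * fields * bytes_per_field

n = 1_000_000
print(ordered_comparisons(n))  # 1000000000000
print(distinct_pairs(n))       # 499999500000
print(full_vector_bytes(n))    # 8000000000
```

Even counting each pair only once, a million merchants means roughly 5 x 10^11 comparisons, which is why bucketing candidates first matters.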
So now we wanna talk about some model results.
So here we have a major clothing store in California. So we started monitoring the merchant on 11th of January, 2020, and on 17th of January we found the merchant to have disappeared. So at that point, our linking model starts. And on the next day, we are able to find that the missing merchant has a new acquirer, and our model is able to successfully link it.
So here we have another example, which is a popular fast food chain in Florida. So again, we started monitoring the merchant on 8th of January, 2019, and on the 14th we found that the merchant had disappeared. Our linking kicks in; the first day we did not find it, but the next day we were able to find the match and we were able to link the missing merchant. Another example here is an e-commerce merchant based out of Georgia. So here on 12th of January, 2019, we started monitoring the merchant, and on the 18th we found the merchant to have disappeared. So then our linking process kicked in on 19th of January. And you can see it continued for the next number of days. In our case, we are using a window of seven days. And at the end of the seventh day, on the 25th, we are able to link the merchant. And this ties back to our original description, where we saw in our process flow that when a merchant has a new acquirer, it might take time to bring its volume back to the new acquirer, and this is where this window concept really helps us in linking those kinds of merchants. And then we also wanted to show a case where our model is not able to find the merchant.
So this is a gas station in New York. So we started monitoring on 9th of January, and on the 15th we found that the merchant had disappeared. And again, we do the linking process day by day. And at the end of the seventh day, we still could not find the merchant. And when we looked at it, this merchant was permanently closed. So this was the right outcome for our model, to not find it, because the merchant is closed. So now we want to just take a quick look at the statistics from this model.
So here we had a model input of 120,000 US merchants, which we tracked for a week. And here you see the disappear date in the first column. And the second column is the number of missing merchants. So every day we found those many merchants to be missing. And then in orange, you see how linking kicked in every day, and how many merchants we were able to link every day. And at the end of it, we can say how many of the missing merchants were linked and how many were not linked, which were not found by our model. And then you see another category called came back. And this is again the same category where we see that merchants keep on moving their volumes from one acquirer to another. And sometimes for some reason they will move them, and then we see them coming back after some time. So combined, you see the total linked and came back percentages. So we can see at the end, we are able to achieve about 80% linkage from this model, which is a very good result from this particular model. And with that I want to invite my colleague Jianhua to cover the next, complementary approach for this model. So over to you, Jianhua.
– Yes, thanks Anurag. So this is Jianhua, and I’m also a data scientist in Visa Research. My part in this presentation is to talk about the second approach, which is based on consumer behavior. So just to refresh you on the first approach, I’ll talk about the feature similarity approach briefly.
So with this basic feature similarity approach, we can check whether two merchants are the same by comparing the features. If these features are very similar to each other, we can say these two merchants are in fact one merchant. This approach works well if the features only change slightly. However, the features can change dramatically, for example with an ownership change for the merchant, and as a result the name that the merchant has, the acquirer bank, all of those things can change. So in this case, the feature similarity approach may not work well. So in order to overcome this problem, we designed a new approach which is based on consumer behavior: instead of tracking the merchant features, we track the customers that are strongly connected to the merchant.
Then based on the behavior of these strongly connected consumers, we can decide whether the merchant is closed or its ID has changed. This approach is based on three important assumptions. The first assumption is that the merchant ID change has no impact on the customers’ shopping behavior, because the customers don’t even know what ID is saved in the Visa system. The second assumption is that consistent customers will keep going to the same merchant. And the last assumption is that diverse customers will switch to different merchants when a merchant is closed. In order for this approach to work, we need to define consistent and diverse customers precisely. So let’s have a look at the definition of a consistent customer. So here consistent is used to describe the behavior of individuals. Here we define consistent customers as individuals who visited a merchant repeatedly and frequently. So in this slide, we see six different scenarios, showing different levels of consistency. In the first scenario, the customer visits the store every day, which is the most consistent. And in contrast, in the last scenario, the customer visited the merchant randomly, which means that we cannot predict whether that customer will come back again. So if we can identify some consistent customers, we can check their behavior and see whether the merchant is closed or not. Next let’s look at the definition of diverse customers. So diverse here is used to describe the behavior of a group of customers.
So in this table, we see three different customers, and they all went to different stores except for merchant T. So in this case, if merchant T is closed, these diverse customers will probably replace T with different merchants, for example merchants X, Y and Z. And in the other scenario, if the merchant ID is changed, then because these customers are consistent customers, they will still keep going back to the same merchant regardless of the ID having been changed from T to N. So with diverse customers we can differentiate the two scenarios: when merchant T is closed or when the merchant ID is changed.
So now we are clear about the definitions of consistent and diverse customers. Next, let me use an example to show you how we can identify such customers. Let’s assume that merchant T disappeared in the Visa system on day N. Then we can extract the historical data for merchant T over a period leading up to day N. Then we can go through the visitors in that data to extract the consistent customers. For example, customer one, who visits merchant T every day, customer two, who visits every other day, and customer three, who visits every Monday. From these we can further extract our diverse customers.
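One simple way to turn "repeatedly and frequently" into a rule is to look at how regular the gaps between visits are. The Python sketch below does that; it is only an illustration, and the function name and both thresholds (`min_visits`, `max_gap_spread`) are hypothetical, since the talk does not state how consistency is scored.

```python
def is_consistent(visit_days, min_visits=4, max_gap_spread=1):
    """Flag a customer as consistent if visits are frequent and evenly spaced.

    visit_days: day offsets (integers) on which the customer visited the
    target merchant during the historical period.
    Assumed rule: at least min_visits distinct days, and the longest and
    shortest gap between consecutive visits differ by at most max_gap_spread.
    """
    days = sorted(set(visit_days))
    if len(days) < min_visits:
        return False
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return max(gaps) - min(gaps) <= max_gap_spread
```

Under this rule a daily visitor, an every-other-day visitor and an every-Monday visitor over four weeks all qualify, while a customer with a few scattered visits does not.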
So in order to identify the diverse customers, we use the Jaccard index, which is simply the size of the intersection of the stores both C1 and C2 visited, divided by the size of the union of the stores C1 and C2 visited. For example, C1 visited the red store and the green store, and C2 visited the green and blue stores. So the intersection would be the green store and the union would be the red, green, and blue stores. As a result, the Jaccard index for C1 and C2 would be one over three. So using the Jaccard index, we can compute the similarity between all pairs of customers, and that gives us the table on the left-hand side. So as you can see in this table, the diagonal value is always one, because that is the Jaccard index between a customer and itself. So if we look at the values for the other pairs, we can see the Jaccard index between C1 and C6 is very high, which means that these two customers had very similar shopping habits in the past, so they may be from the same family. In this case, we want to remove either C1 or C6. Similarly, the Jaccard index between C2 and C5 is also very high. So in this case, we want to remove either C2 or C5. So at the end, we will only keep the diverse customers, who used to visit different merchants.
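The Jaccard computation and the "remove one of each similar pair" step can be sketched in Python as follows. The `jaccard` function is the standard definition; `keep_diverse`, its greedy order and its 0.5 threshold are hypothetical choices made for the example, and the store sets are assumed to exclude the target merchant T, which every selected customer visits.

```python
def jaccard(stores_a, stores_b):
    """Size of the intersection divided by size of the union of two store sets."""
    a, b = set(stores_a), set(stores_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def keep_diverse(stores_by_customer, threshold=0.5):
    """Greedily keep customers whose store sets differ from everyone kept so far.

    A customer is dropped when its Jaccard index with any already-kept
    customer exceeds threshold (an assumed cut-off).
    """
    kept = []
    for customer in sorted(stores_by_customer):
        stores = stores_by_customer[customer]
        if all(jaccard(stores, stores_by_customer[k]) <= threshold for k in kept):
            kept.append(customer)
    return kept

# The example from the talk: one shared store out of three visited overall.
print(jaccard({"red", "green"}, {"green", "blue"}))  # 0.3333333333333333
```

Because the filter is greedy, which member of a similar pair survives depends on iteration order; here the alphabetically first customer is kept.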
So with these consistent and diverse customers, we can now check their behavior after merchant T disappears in the Visa system. Here we see two different scenarios. The first scenario is that merchant T is closed, and the second scenario is that the merchant ID has been changed from T to N. As we discussed in earlier slides, if the merchant is closed, then because the customers are diverse, they will replace merchant T with different merchants. And if we look at the daily visits for C1, C2 and C3, you can see they preferred different stores. And if the merchant ID is changed, then because the customers are consistent, they will keep visiting the same merchant without even knowing that the merchant ID has been changed from T to N. So as a result, in the Visa system we will see lots of transactions from these consistent and diverse customers appearing under the new merchant ID.
So based on these two scenarios, we can further design a method to identify whether the merchant is closed or its ID is changed.
Here we see the two scenarios again. The data here presents the historical daily visits to the target merchant T by 100 selected consistent and diverse customers. And as we can see, the daily visits are close to 100 on most of the days, and the range is about 89 to 100. And here we present the daily customer visits to different merchants after merchant T disappeared. So in the first scenario, if merchant T is closed, we can see the customers will move to different merchants, for example A, F, X and C. None of the daily visits to these merchants fall within the historical range, and most of these merchants are probably old merchants. And in contrast, if the merchant ID has been changed from T to N, the customers will keep visiting that same merchant, and then a new merchant ID N with similar daily visits, within the historical range, would be detected. So using this method, we can see that the two scenarios have totally different daily visit patterns, and based on that we can detect whether the merchant is closed or its ID changed. And this method can in fact be further extended to merchant recommendations. So with just a small modification, we can keep checking the loyal customers of merchant T, and on some day, when a new merchant N appears, some of those loyal customers will probably switch to the new merchant N. And as time passes, more and more loyal customers will switch to N, so at the end most of the loyal customers won’t visit T again and they keep visiting N frequently. So in this case, we can say the new merchant N is probably better than the old merchant T. So now we can recommend the new merchant N to other customers.
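The historical-range check described above can be written down as a small Python sketch. It is a simplified illustration under stated assumptions: the function name, the `known_merchants` argument and the rule that every post-disappearance day must fall inside the historical range are all hypothetical, and the talk shares no implementation details.

```python
def classify_disappearance(historical_daily_visits, visits_after,
                           known_merchants=frozenset()):
    """Decide whether a disappeared merchant closed or changed its ID.

    historical_daily_visits: daily visit counts to merchant T by the selected
        consistent and diverse customers before T disappeared.
    visits_after: maps each merchant ID to the daily visit counts made by
        those same customers after T disappeared.
    known_merchants: IDs that already existed before T disappeared; these
        are skipped because an ID change shows up as a new ID (assumption).
    """
    low, high = min(historical_daily_visits), max(historical_daily_visits)
    for merchant_id in sorted(visits_after):
        if merchant_id in known_merchants:
            continue
        daily = visits_after[merchant_id]
        # A new ID whose daily visits all sit in T's historical range
        # looks like the same merchant under a different ID.
        if daily and all(low <= v <= high for v in daily):
            return ("id_changed", merchant_id)
    return ("closed", None)
```

With a historical range of about 89 to 100 visits per day, customers scattering across several merchants at 15 to 30 visits each yields `("closed", None)`, while a new ID receiving 92 to 96 visits per day yields `("id_changed", "N")`.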
This second approach is still at its early stage, and due to confidentiality we don’t want to cover too much about the results. And with that we can conclude this presentation. So first, we would like to highlight that these two approaches are highly complementary to each other. If the features changed dramatically, we would probably like to use the consumer behavior approach. If the consumer volume and the data availability are limited, then in this case we would probably want to use the feature similarity approach. Further, we would like to say that both modeling approaches can be extended to other fields. For example, the feature similarity approach can be extended to spelling correction, and the consumer behavior approach can be extended to merchant recommendations. And we would also like to stress that, in fact, the consumer behavior based approach can be used for any kind of entity tracking, as long as we are aware of the network between the target entity and its connected entities. For example, we can track whether a new credit card is linked to an old credit card because the old credit card has expired, or whether a phone number is linked to an old phone number because the phone number holder just moved to a new place, or whether a new account can be linked to an old account, and so on. And with that, I think this is all for our presentation on merchant entity tracking.
Anurag Tangri is a Lead Data Scientist at Visa. He was a Big Data Engineer for almost a decade but developed a great interest in the AI domain along the way and ended up joining Visa Research to continue his passion for AI. He loves applying AI techniques to his day-to-day job to create innovative products using VISA data. In his current role, he collaborates closely with Business and Product teams at VISA to create data and AI-powered products. Prior to VISA, he worked at companies like Yahoo and Groupon, and he has 7+ years of experience in the Payments domain.
Jianhua joined VISA as a Staff Data Scientist in 2018. He has rich experience developing end-to-end AI solutions for fraud detection using Spark and other tools. Before joining VISA, he was a research scientist at University of Phoenix (Apollo Education Group). Jianhua got his PhD degree from Arizona State University, and his MS degree (focused on Machine Learning) from Georgia Tech.