Recommender-Based Transformers

May 28, 2021 11:05 AM (PT)

Recommender systems for e-business have been expanding during the past years. At the same time, the COVID pandemic showed the importance of efficient Supply Chains. Global actors must master the whole chain: selling a product online, producing a product, storing a product in an optimized warehouse, and delivering the product on time. The consequences, as we have seen, can be disastrous for global actors who do not deliver on time. Managing huge amounts of data, constraints, and micro-decisions in a large supply chain has now become impossible without artificial intelligence. Artificial intelligence itself had to progress to predict sequences of actions and events. Artificial intelligence was not meeting the challenge up to the arrival of the Transformer model first designed by Google in 2017.

The old, obsolete, 1980s-era architecture of Recurrent Neural Networks (RNNs), including LSTMs, was simply not producing good results anymore. In less than two years, transformer models wiped RNNs off the map and even outperformed human baselines for many tasks.

This presentation goes to the core of recommender-based Transformers applied to the supply chain. In a world of complexity, only AI-driven recommenders will be able to learn the behavior and constraints of a market. The presentation will begin with the supply chain paradigm major corporations are facing. Then, a recommender-based transformer will show how AI can predict hundreds of micro-decisions both managers and users make. Finally, the presentation shows how the world has evolved into a new micro-decision real-time era with AI-driven recommenders.

In this session watch:
Denis Rothman, Artificial Intelligence Specialist, Denis Rothman

 

Transcript

Speaker 1: So hi everybody. I’d first like to thank AI Summit for inviting me and giving me the opportunity to share my ideas. And today we’re going to focus on Recommender-Based Transformers. So by the end of the presentation, you’ll know what I’m talking about and you’ll be well-versed in the subject. So the first thing you want to look at is machines taking humans beyond their limits. Okay. What does that mean? Let’s take an example in terms of transportation. Let’s go back to when we didn’t even have cars, we had to walk from one place to another to give someone a message. Let’s say we were living somewhere in Africa, and we had to go from one little place to another, to tell someone something. Then after, we had drums, so we could do it that way, but then drums wouldn’t go that far.
So after, we had boats and all that stuff, and then let’s fast forward. At one point we had horses, and, you know, the Romans used horses for messages, to carry information. And then we have cars, because I’m really fast forwarding… cars, satellites. So you can see, transportation is a key factor in information. The web is only a transportation system. The video you’re looking at came through transporting packets of information. So we’re talking about transportation and optimal transport. You don’t want your messages to take hours to get to you, you want optimal transport. So the key concept you want to keep in mind in this presentation is optimal transport, because it’s both the horizontal concept I just told you about, and it’s vertical in artificial intelligence. It gives you a whole new vision.
So, where we’re going here now: I’ll just say here that I went to Sorbonne University, I registered a word-to-vector patent. I’ve registered a cognitive chatbot patent. I worked for corporations all my life. You can see all that in my bio. So I’ve been in AI for decades, and for corporate AI systems. So let’s move on. Let’s go back to what I was saying. And of course I optimized for these corporations. So we’re going to be using three concepts here. Artificial intelligence, but we also have to explain it, and transformers, the subject of our little presentation here. What’s going on with transformers? Why are we speaking about transformers? What are they? So now everyone imagines that e-commerce is some kind of miracle thing, where you just click on a button online and the next day someone comes in with a delivery truck and gives you something.
Well, that’s not e-commerce. E-commerce is the whole flow that goes from the customer order, from you, all the way down, all the way through all the little problems you have. And when you do that, when you’re in Amazon, for example, you want to try to group customers. So let’s say we’re talking about clothing. You can see right here that you have to group the orders. Then you have to cut your fabric into little pieces, and then you have to sew them together. You can’t just give little pieces of fabric to people, you have to sew them together to make a T-shirt. Now you might be saying, what do I care about T-shirts in the first place? I know what a T-shirt is, and I know that when I buy a T-shirt, someone’s going to manufacture it and give it to me.
So why are we speaking about all this? What’s the purpose of this conversation? Well, the purpose of this conversation is to say, take COVID, right? COVID is just a small example. Of course, it’s a big problem, but it’s a small example. You can see that finding the vaccine is very different from producing the vaccine and then getting the vaccine somewhere, then storing the vaccine, then distributing it to people. And you can see that this whole thing is called the supply chain, and artificial intelligence has to be there all over the place. Otherwise, it’s pretty bumpy. If you look at what you’re looking at today, you’re saying, how did the governments make so many mistakes with the masks, with the products to wash our hands? They didn’t make any mistakes, that’s supply chain every day. That’s what’s happening behind the scenes when you buy something.
You buy something, and everyone’s running around behind the scenes to get it to you. And it takes six months to a year to synchronize all that. For Amazon, it took several years to synchronize all that. So this is what I’m talking about and that’s where AI comes in. And that’s why, when I’m saying to group customer orders, well, it seems so simple. But how are you going to group them when they come from all over the world? And where are you going to manufacture them? Are you going to manufacture them in real time? Amazon registered a patent to manufacture them in real time themselves, so they wouldn’t have to wait for production. So one of the keys to all of this is transformers. Now what do the transformers do? They’re going to take all the information we’re talking about and they’re going to predict sequences.
So we’re going to look at that in three steps. So let’s keep it there. Just keep it there. Remember: transformers, sequences. Now, when you look at what’s going on here on the market, you’re saying, well, why is he speaking about transformers? What’s going on? Here is the SuperGLUE leaderboard. This is the reference for all of these natural language processing algorithms. And you can see here, this is the human baseline. What is the human baseline? Well, it’s a lot of tests. It’s a lot of tests done by a lot of average humans, I would say even a little above average humans, and look what’s going on. We already have two teams beating them, beating human baselines. That means that we have exceeded the capacity of humans to do something in natural language processing, which has extended to sequences. Now that’s both scary and reassuring.
Because think of that a minute. Facebook has maybe two billion members, has to control eight billion posts a day, and you have the Senate, Congress, everyone asking them questions. Why do you do this? Why do you do that? And you have Zuckerberg there saying, how do you think I can control eight billion posts, messages a day? I can’t. Twitter can’t do it. No one can do it. The machines have exceeded our capacity as humans to master them. So we’re going to need these algorithms if we want to go further. Now imagine a supply chain. Imagine all these vaccines we’re talking about, all the orders Amazon gets in one day, millions. So you need a lot of intelligent software, so let’s be happy that it’s getting there. But there still is a problem. Now here’s where we have to think a bit about what’s going on here.
Now, what are you looking at? Okay. What you’re looking at is both interesting and it can scare a lot of people. This is you. This is what’s going on when you’re on YouTube, or Netflix, or Spotify, or Alibaba, or Amazon. What we’re doing is we’re observing all of your behaviors that take this into account. We’re not interested in your private data anymore. Everyone’s scared: he’s going to take my name, he’s going to take my address. We don’t even care. We don’t even know your gender. We don’t care about your gender. We can find all that with your clicks. So what we do, let’s take Amazon since we’re in the supply chain thing, and let’s take YouTube and Amazon. So you’re sitting there and you’re clicking. You’re clicking like crazy, especially in lockdown. So you’re there, and video after video, after video, after video, what’s happening? Well, we’re recording all your behavior.
We’re looking at what you’re clicking on, in which order you click, how long you spent on a sequence, and where you went from there. And don’t think we can’t follow you, because when you see all these little cookies on your site, a lot of that’s owned by Facebook and Amazon and other companies. So we know where you’re coming from and where you’re going to. So we’re getting this image, and not of you in particular, don’t worry. It’s not you personally that lives at that address. We’re just looking at patterns of behaviors. This kind of person does this, that kind of person does that. There’s nothing personal. So there’s nothing to worry about. No one’s worried about you personally. And honestly, no one cares who we are anyway, or if we’re dead or alive. What I’m saying here is we’re looking at trends.
So step one, we’re giving all this data for free, okay? That’s what you have to realize. Do you really want that? You’re giving all that data for free to huge corporations, that then are learning, using all that data to make extremely powerful AI tools. And then at that point, they’re going to sell it back to us. They’re going to get better. You’re going to get better results on YouTube, on Google search, on Amazon. So this is why I wrote this in red here. You can be the tool of AI or use AI as a tool to your benefit. So it’s going to depend. You want to go to cloud platforms and pay as you go, or you want to think about this presentation and say, hey, why don’t I earn some money like they do? Why do I have to just be giving them free data?
So what I’m saying here is basically we’re raw data, and we’re making these supply chains extremely intelligent. They know where the profit is in this country, so they might be storing up things here. They know where to deliver. So, think about it. Now, once we have that, well, then we’re going to move on and we’re going to take some time here to understand what’s going on in terms of transport. You remember the beginning of the presentation? I said, you’re not going to understand anything if you don’t understand optimal transport distances. Distance is the key to everything, and it’s been the key for 20,000 years. And it’ll be the key in the next 10,000 years, unless we’re not humans anymore, because we always reason in terms of distances. So we’re going to represent the distance with a little D and I’m going to say, well, the distance between X and Y is this little line here.
And then we’re going to say, well, I’m going to couple X and Y with some parameters. So before we make this complex, because we’re diving, we’re going into deep water now, how does this go? Let’s say you want to go on vacation, so distance is from your place to another place. Okay. So you say, well, I have the distance. What’s this slide for? Now, I have A and B. What are A and B? Well, these are all the parameters. Like A of X is where I’m going from. Do I have the money to go where I want, to that other point Y? Can I take a plane? Do I have the money to pay for that ticket? But when I get to Y, there are other parameters. How expensive is the hotel or the house I want to rent?
Whoa, let me see. This distance is maybe further than I thought, because now you get to a conceptual distance. For example: I thought that we were close to our goal. Think of that sentence. We were close to our goal in our department. But now our goal has gone further, so it’s conceptual. So I’m saying now I’m farther from my vacation, because I just found out I can pay for the plane ticket to go to Hawaii, but I don’t have enough money to go to the restaurant, even if the hotel is cheap. So these are just examples. Because going to Y, in fact, is cheap. So let’s say you wanted to go to Paris from Las Vegas, which is a location we’re talking about right now. So you want to go from Las Vegas to Paris.
You’ll find a good price for a plane ticket, but believe me, when you get to Paris, those hotels are going to be expensive. So the distance has immediately expanded. So let’s take some other examples. Okay? You can see that a cat is closer to a dog than it is to a building. Right? If you’re doing a little IQ test, you’re going to say, what’s closer? A cat to a dog or a cat to a building? So then you can say woman, man, child are closer to one another than to house, building, office, or you can say blue, yellow, and green in this part are closer to one another than to a truck. And sometimes all this might change. But what I’m saying is everything in AI just adds up to finding these little A’s and B’s to figure out if it’s close or not close.
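To make the cat, dog, and building comparison concrete, here is a minimal sketch in Python. The feature values are invented for illustration (they are not from the talk or from any trained model): each concept is described by four hand-picked numbers, and "closeness" is the ordinary Euclidean distance between those descriptions.

```python
import math

# Hypothetical hand-made features: [is_animal, is_alive, is_man_made, relative_size]
FEATURES = {
    "cat": [1.0, 1.0, 0.0, 0.1],
    "dog": [1.0, 1.0, 0.0, 0.2],
    "building": [0.0, 0.0, 1.0, 1.0],
}

def distance(a, b):
    """Euclidean distance between the feature vectors of two concepts."""
    return math.dist(FEATURES[a], FEATURES[b])

cat_dog = distance("cat", "dog")
cat_building = distance("cat", "building")

# The concept with the smaller distance is the one "closer" to cat.
closer = "dog" if cat_dog < cat_building else "building"
print(f"cat-dog: {cat_dog:.2f}, cat-building: {cat_building:.2f}, closer: {closer}")
```

In a real system the numbers would be learned from data rather than typed in by hand, which is one way to read the remark about finding the little A's and B's.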
Suppose I want to change concepts, I want to find a blue truck. So then blue might get very, very close to truck. So what we’re trying to figure out is how to get the right things together. Now let’s say green is something you like, and red is something you don’t want to see. So on YouTube, we want to figure out if this video is close to you or it’s not close to you. And in the supply chain, we want to figure out if you want to buy this T-shirt, or you want to buy that T-shirt, or you want to buy a Chicago Bulls T-shirt, or you want to buy a Paris Saint-Germain soccer team T-shirt. So we’re trying to figure out what you want and how close you are to these different things. That’s what it’s all about. And transformers, they’re just looking at your sequence.
And they say, oh, he went from a football site to a baseball site. That’s strange. And then he went to a soccer site. This guy likes sports. So the next advertisement we’re going to put in his video is going to be about sports. Then he’s going to click on it, he’s going to go to Amazon. He’s going to buy a T-shirt and then we’re going to have to manufacture it, put it in a truck or a boat and get it to him. Okay. Sequences, distances. So let’s move on. So now you have optimal transport. I’m not going to kill you with theory, so don’t worry. I mean, this is okay. I’m just giving you little ideas to look at. So optimal transport, look where it dates back to: the 18th century. In the 18th century you had Louis the 16th, in France.
And he’s saying to Gaspard Monge, he’s saying, listen, why does it take so long to go from where we manufacture to get it to the people? Why does it take so long? You think things have changed, all these hundreds of years after? Read the New York Times, read the newspapers in Germany, in India. Where are the vaccines? Why is it taking so long for these vaccines to get from there to there? What is going on? It’s optimal transport. The problem hasn’t changed. And every time there are new problems, it’s complex. Now you have Kantorovich, who said, well, that’s nice, but we’re going to have to make a model of this, a mathematical model, so that we can find mathematical solutions. Okay. So then you get back to what we see here. A of X, B of Y, I’m going from X to Y and I have all these parameters.
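For readers who want to see what such a mathematical model looks like, here is the standard discrete form of Kantorovich's transport problem. The slide itself is not reproduced in this transcript, so matching the talk's A and B to the supply and demand terms below is an assumption; the talk describes them more loosely as the parameters at each end of the trip.

```latex
\min_{\pi \ge 0} \; \sum_{i}\sum_{j} \pi_{ij}\, d(x_i, y_j)
\quad \text{subject to} \quad
\sum_{j} \pi_{ij} = a(x_i) \;\; \text{for all } i,
\qquad
\sum_{i} \pi_{ij} = b(y_j) \;\; \text{for all } j
```

Here $\pi_{ij}$ is the quantity moved from source $x_i$ to destination $y_j$, $d(x_i, y_j)$ is the cost (the distance) of moving one unit along that route, $a(x_i)$ is what is available at $x_i$, and $b(y_j)$ is what is required at $y_j$. The problem only has a solution when total supply equals total demand.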
Okay. I’m going to skip some of the theory. I don’t want to scare you today, but you have Cédric Villani, who got the Fields Medal in mathematics on this problem because he knew that this was the heart of artificial intelligence. That’s just to tell you how important it is to keep in mind that everything is related to distance. Okay? And you’re going to see that now. So let me tell you, right here, we have this distance, right? But let’s keep it simple. When we go from one place to another, we’re spending energy, right? There’s a lot of energy going on. We often say Boltzmann because we use Boltzmann networks. Boltzmann is just calculating the energy in the system. But look what’s going on.
As things go along in your supply chain, the energy is building up. You’re spending more and more gas, spending more and more time, more and more resources. In fact, all that energy increases. So what you have to keep in mind, and it’s a big mistake people make in artificial intelligence, is that entropy never goes down, entropy’s always going up. Because the more you go through the process, the more energy you spend. And when we’re calculating in artificial intelligence, we’re just seeing if the levels of energy in entropy-driven systems are the same, but entropy keeps increasing. Look what happens when you keep using big computers. You’re going to be consuming flops, okay? And a lot of CPU and GPU. And in fact, you’re going to be heating your machine up. It’s going to cost a lot of money and time. So AI is not an unconstrained, abstract, closed world.
Entropy keeps increasing and it’s costing you a lot of money in terms of people, human resources, machines, data. It’s incredibly expensive. So you want to be careful. So, now let’s get to our little transformers here, and you see, I call it the donkey syndrome. Now, if you take the concept I gave to you, we’re going to be able to fast forward through the rest of this presentation. I call it the donkey thing. If you take recurrent neural networks, what do they do? They keep piling up information. They’re very costly. They look at a sentence like “the cat”, and then they keep adding information, information, information to remember all the relations between the previous words. So you just go crazy trying it. It just piles up and piles up and explodes. And that’s why they’re out of fashion. They’re obsolete.
I don’t know if you’ve been aware of this, but RNNs are obsolete, and LSTMs are even worse. It’s too expensive, if we go back to the energy spent. So how do we solve this problem? Well, we solve it. Now, you can see that the longer the system was going, the more it was building up. Now look at this nice transformer. You don’t see anything building up. You don’t see anything building up because it’s a nice V8 engine. In this example, it will take the full sentence. It will cut it into eight parts, and run them all at the same time. So now you can use eight GPUs at the same time, and then you can add up the information at the end. So I’m just summing it up. Of course you can read it in my book or articles, you’ll see how it works. But basically, that’s it.
We’re taking the whole sentence and we’re analyzing it eight times in eight different ways, like a V8 engine. And we’re putting it together. So it makes it unlimited in terms of lengths of sequences, not really unlimited, but very big. And it makes it very industrial. That makes it a modern fast car. So just think of a V8 engine, instead of having this old donkey piling up all the time. Okay? So now you have your transformer, analyzing sequences.
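The V8 picture corresponds to what the Transformer literature calls multi-head attention. The sketch below is a simplified illustration in plain Python, not the speaker's code: it gives each token a 16-number representation, splits that representation into eight slices, and lets each "head" look at the whole sentence through its own slice before the results are concatenated. A real transformer also applies learned projection matrices per head; here plain slicing stands in for them, and all values are random placeholders.

```python
import math
import random

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x):
    """Scaled dot-product self-attention for one head.

    x is a list of token vectors. Queries, keys and values are the raw
    vectors themselves (a simplification: no learned projections).
    """
    dim = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(dim)])
    return out

def multi_head_attention(x, num_heads=8):
    """Run num_heads independent heads over the full sequence and concatenate."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "representation must split evenly across heads"
    head_dim = d_model // num_heads
    heads = []
    for h in range(num_heads):
        # Each head sees every token, but only its own slice of each vector.
        sliced = [tok[h * head_dim:(h + 1) * head_dim] for tok in x]
        heads.append(attention_head(sliced))  # heads do not depend on each other
    return [[v for head in heads for v in head[t]] for t in range(len(x))]

random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(5)]  # 5 tokens
result = multi_head_attention(tokens, num_heads=8)
print(len(result), "tokens,", len(result[0]), "values each")
```

Because no head needs another head's output, the eight calls could run on eight devices at once, which is the point of the engine analogy. The loop here runs them one after another only to keep the sketch short.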
So now we’re going to dive down into this little model of a transformer that looks difficult, but is in fact extremely simple to understand now that you have the tools. We have these sequences. Remember all these sequences of everything you’re doing? Or everything in a supply chain? Producing this, transporting, distributing, or your behaviors.
These are your sequences, but here on the left, there’s this nice little thing where you can add anything you want. People’s tastes, how long they spend on videos. How long did they click on the web before purchasing something? You just put it all together, and you have these fantastic V8 sequences that you can analyze. So now we’re going to dive into some little code that you’re going to understand, in my opinion, pretty easily, if you followed the distance thing and the sequence thing up to now, okay? So, we’re going to go into a three-step thing. We’re going to look at sequences, we’re going to look at how to train them, and then we’ll look at the final results. Okay? So let’s go down here into step one.
Okay. I’m just going to dive into some code here. It’s going to be very easy to understand. Here is this little array where I’m telling it what it can do and can’t do. I’m saying you can’t go there, you can go there, you can go there, you can go there. It’s just giving rules, basically like in a labyrinth. Let’s say you’re in Manhattan. Well, you can’t go through the walls. So it’s saying you can go there, there, but you can’t go there. You can go straight down an avenue, but you can’t turn to the left through a wall with your car, you’re going to have to go around it. So basically it’s telling you where to go, but it can tell you conceptually.
So if we look at a result, you’re going to see what it can do. Imagine you can go from B to D to C. You can go from A to E to D to C. It’s generating all the logical possibilities. Just imagine you’re in Manhattan, and it just imagines all the kinds of roads you can take logically. And you can’t go through a one-way street in the opposite direction. It’s giving you all the good, right sequences.
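Step one can be sketched in a few lines of Python. The array below is a hypothetical stand-in for the one shown in the session (the actual values are not in the transcript); it was chosen so that the two sequences quoted above, B to D to C and A to E to D to C, come out of it. A 1 means the move is allowed, a 0 means there is a wall.

```python
NODES = "ABCDEF"

# Hypothetical rules: ALLOWED[row][col] == 1 means "from row you may go to col".
ALLOWED = [
    # A  B  C  D  E  F
    [0, 0, 0, 0, 1, 0],  # from A
    [0, 0, 0, 1, 0, 1],  # from B
    [0, 0, 0, 1, 0, 0],  # from C
    [0, 1, 1, 0, 1, 0],  # from D
    [1, 0, 0, 1, 0, 0],  # from E
    [0, 1, 0, 0, 0, 0],  # from F
]

def sequences_to(goal, start, path=None):
    """Return every rule-respecting sequence from start to goal, never revisiting a node."""
    path = (path or []) + [start]
    if start == goal:
        return ["".join(path)]
    found = []
    row = ALLOWED[NODES.index(start)]
    for node, ok in zip(NODES, row):
        if ok and node not in path:
            found.extend(sequences_to(goal, node, path))
    return found

# All logical sequences that end at C, keyed by starting point.
all_sequences = {start: sequences_to("C", start) for start in NODES if start != "C"}
for start, seqs in all_sequences.items():
    print(start, seqs)
```

The letters carry no meaning of their own, so the same enumeration works whether they stand for street corners, clicks on a site, or stages in production.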
And it can do it with your behavior. It can say, A, you clicked. You clicked on a site. E, you looked around at the description. D, you looked at the price. C, you purchased it. F could be it’s being manufactured. It’s being conditioned. So you can replace it by any kind of sequence you want. So what is the advantage of this? The advantage of this is we’re going to go to step two and I’ll show you the slide after. Let’s stick to the code while we’re in it. What do we do with those sequences? We’re going to bring them to the transformer.
We’re going to bring them to the transformer. And what is the transformer going to do? It’s going to pick up all these logical sequences and it’s going to do something fantastic. It’s going to learn them. It’s going to learn to reason like us, and I used what we call a RoBERTa model of a transformer, but it’s basically using all these sequences. Here are the heads of this V8 engine I was talking about. This one’s a bit bigger, in fact. Okay. So then what is this for? What’s happening here? See, I’m giving it a sequence. It’s learned the sequences like we mentioned, learned them all by heart, and it’s trying to give another sequence behind it. If I say AEDC, where will it go? It can go to BDC. That means maybe I click on a site, I look at the description, look at the price,
I purchase it. Now it goes into manufacturing. Now it goes into conditioning and transport. So now it can predict huge amounts of decision-making sequences that it learned all by itself, by just giving it the basic rules. So now if we go back to our little PowerPoint here, okay. We can go back to where we were here and we can see that the sequences we generated there for this example were, well, the order comes in, then the fabric is cut into the little sizes, and then it has to go on a conveyor belt. And then what happens is you have to give it to the fastest sewing department, so you can get it out as fast as possible. Okay. And then we showed in this little step how these sequences went into the transformer and it learned them. So now the system is very, very intelligent.
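RoBERTa-style models learn by hiding part of a sequence and being asked to fill it back in. The sketch below shows only that data-preparation idea in plain Python; it does not train anything and is not the code used in the session. It takes one generated sequence and hides each step in turn using `<mask>`, the mask token RoBERTa uses. Actual RoBERTa pretraining masks a random subset of tokens rather than each position exactly once, so treat this as a simplification.

```python
MASK = "<mask>"

def masked_examples(sequence):
    """Build (masked_input, hidden_step) pairs, hiding one step at a time."""
    steps = list(sequence)
    examples = []
    for i, target in enumerate(steps):
        masked = steps[:i] + [MASK] + steps[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

# "AEDC" is one of the sequences mentioned in the talk.
examples = masked_examples("AEDC")
for masked_input, target in examples:
    print(masked_input, "->", target)
```

Hiding the last step is the case closest to prediction: given click, description, price, the model is asked what comes next.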
So now we can move on to step three, and look what’s happening. We’re getting to our final results. So what we have here is a program, and I admit this one’s not an easy one. This one’s pretty long. It’s in my book, Artificial Intelligence By Example. You have some in Hands-On Explainable AI, and some in Transformers. You just have to figure it out. So what it does here, I’ll just show you the example. Let’s dive into the example. There’s a convolutional neural network with a webcam, and it’s looking at the conveyor belt. And what are we looking at? We’re looking at all of these cut packs of pieces. And we want to decide which sewing station we want to give them to, so it goes faster.
So right here, it says, well, if I look at that image, what’s the fastest way to do it? I’m looking at all these sewing stations. I think E is the best one. So right here, you see the sewing stations. And you can see that E now is loaded. So what we’re trying to do is load balancing between these sewing stations. But you can imagine anything in the world where you want to balance the load: servers, warehouses. But here it’s the sewing stations. And then it says, okay, let’s take another look at what’s going on on the conveyor belt. And then it will go on like that. Now here, it’s saying, well, now I’m looking at this one here. Where should I put it? Well, we can put it on E again.
Let’s go ahead. And then on and on, this goes on 24 hours a day in corporations, like Amazon. And they’re doing this for every single product, right? See, now it says let’s go to A. A is the best one. So A could be the best warehouse, the best sewing station, the best website, whatever you want, but you’re in the world of recommenders. Now, if we go back to where we were, well, then now we reach the time where I summarize this whole thing.
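The recommendation step described here amounts to load balancing. The sketch below uses a deliberately simple rule, "send the next batch to the least-loaded station", as a stand-in for whatever the program in the session actually computes. The station names A to F follow the talk; the starting loads and batch sizes are made-up numbers, and in the real system the size of each batch would come from the convolutional network looking at the conveyor belt.

```python
def recommend_station(loads, workload):
    """Pick the least-loaded station (ties broken alphabetically) and book the work on it."""
    station = min(sorted(loads), key=loads.get)
    loads[station] += workload
    return station

# Hypothetical current workload per sewing station, in arbitrary units.
loads = {"A": 4, "B": 6, "C": 5, "D": 7, "E": 1, "F": 6}

# Hypothetical sizes of the next three batches seen on the conveyor belt.
batches = [2, 3, 2]

recommendations = [recommend_station(loads, w) for w in batches]
print(recommendations)
print(loads)
```

With these particular numbers the recommendations come out as E, then E again, then A, which mirrors the order seen in the demo. Swapping the dictionary keys for servers or warehouses changes nothing in the logic.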
Now, what am I saying? I’m saying 20,000 years ago, we were trying to optimize distances and information. What I’m saying is 300 years ago, someone, Louis the 16th, came up and said, we can’t go on like that, with manufacturing and delivering to the people taking that long. 300 years after, we’re sitting here and saying, why are the vaccines not in the warehouses? And what is the solution? The solution is just what I’ve been showing you. It’s not deployed enough right now. Digital highways have to go much faster. So the future is what I just showed you. You have to take these sequences, put them into transformers, and then recommend the fastest way, balancing the loads, to get the service or the product from its initial place to us. So that way you’ve optimized your transportation: optimal transport. So now I’m available for questions, and thanks for listening to the presentation.

Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris Diderot University, designing one of the very first word2matrix patented embedding and vectorizing systems. He began his career authoring on...