EPISODE 5

ML Applications

Good machine learning starts with high quality data. Irina Malkova shares her experience managing and ensuring high-fidelity data, developing custom metrics to satisfy business needs, and discusses how to improve internal decision making processes.

Irina Malkova
Irina Malkova is the VP of Data Science Applications at Salesforce, and joins us to share the good, the bad, and the ugly, behind building ML applications. With her extensive background in driving business value with ML, she discusses the challenges one faces with security, scale, data quality, and the process to define effective metrics.

Video Transcript

The Beans, Pre-Brewing

Denny Lee (00:07):
Welcome to Data Brew by Databricks with Denny and Brooke. This series allows us to explore various topics in the data and AI community. Whether we’re talking about data engineering or data science, we’ll interview subject matter experts to dive deeper into these topics. And while we’re at it, please enjoy your morning brew. My name is Denny Lee, and I’m a developer advocate here at Databricks.

Brooke Wenig (00:29):
Hello, everyone. My name is Brooke Wenig, and I’m the other co-host of Data Brew. I’m the machine learning practice lead at Databricks. And today I have the pleasure of introducing Irina Malkova, VP of Data Science Applications at Salesforce, to Data Brew. Irina, how about we kick it off with a quick introduction of yourself?

Irina Malkova (00:45):
Sure. Hi, everyone. Hi, Denny. Hi, Brooke. Thank you so much, guys, for having me. So, like Brooke said, I lead the team called Data Science Applications. It’s a really cool little team at Salesforce, whose goal is to develop ML apps for our internal stakeholders to help them make better decisions. So stakeholders would be product managers, or customer success, or sales people, and the decisions have to do with making our customers successful. Yeah, happy to be here. Excited.

Brooke Wenig (01:12):
I love the customer obsession. Before we jump into some of the use cases that you’re working on at Salesforce, how about we take a step back and talk about how did you get into this field of machine learning?

Irina Malkova (01:22):
Awesome. Thank you so much for this question. It brings me all the way back down the memory lane. So, I feel like I got into machine learning because what I really, really, really care about is really great decision-making. So I started my career as a consultant at McKinsey back in Russia in 2008, a long time ago, and it was all about helping executives make better decisions. It’s just that the tools that we needed to help them make those better decisions, at that point, were 2×2 matrices, because there was… Decisions were not totally complicated, and there was not a dramatic amount of data that was needed to truly help them. And it was not until I got into tech, after coming to the United States to get my MBA at Stanford in 2011. So not until after I got to the United States and started working in tech did I actually start leveraging statistical methods and machine learning to truly help with decision-making.

Irina Malkova (02:22):
And then I started off working at a couple of startups, and my very, very first job was at a startup that made a really cool calendar, called Tempo. It has since been acquired by Salesforce. I was helping them set up their product analytics and help their PMs make better decisions about how their users behaved. And eventually I ended up at Salesforce, and there’s absolutely nothing more complex than decision-making in enterprise, in my humble opinion, because our customers are all dramatically complex. There are many, many, many different use cases. The way they use Salesforce is dramatically complex. Every customer would see something different on their screen, just because of how customizable the tool is. And the way we sell is really complex. There’s so many different roles. And because of all of that complexity, human minds can’t actually process all of this data to make the right decision. So it’s very, very ripe for machine learning. So that’s how I’m here. And it’s definitely been very, very exciting. And like I said, my team is developing apps that are supporting that decision making for internal stakeholders.

Denny Lee (03:35):
Oh, this is awesome. And we can definitely hear from your excitement here, all the cool things you’re trying to do. So then let’s just dive right into it. Can you tell us a little bit, maybe, about some of the use cases that your team actually gets to work on, to help with that decision-making?

Irina Malkova (03:49):
Yeah, absolutely. So the enterprise use cases are relatively standard in terms of what enterprise companies need, which is actually one of the really cool things about this job, because some of the things that we have built for our internal stakeholders eventually made it into the Salesforce product. Because that’s something… If our internal stakeholders need it, chances are our customers need it too, just because in enterprise, these use cases are pretty repeatable. So one example would be all of the different use cases around helping salespeople sell better. You can only imagine the complexity of selling something as massive as an enterprise platform like Salesforce, right? There’s many, many tools. There is Salesforce for sales and for service people and for marketing, et cetera, et cetera. So it’s really, really hard for a typical sales person to know which company to go after and what to offer and how much money they think they’re going to make.

Irina Malkova (04:49):
So there’s all these use cases around forecasting the revenue and suggesting the next best product, et cetera. So one of the tools in my portfolio is called Einstein Guidance. It was a really, really accurate forecasting tool that would just go into the very, very depths of the hierarchy of our sales leaders and products, et cetera, et cetera, and give you a very precise number of how much you’re going to make. And Marc actually talks about this tool a lot. He calls it Einstein Attending My Sales Meetings and then calling out sales leaders who say that this is not the number by saying that it is the number and data told you so. So that has been a really exciting product. And actually that has been implemented into a product for our customers. It’s called Einstein Forecasting. So that’s one example.

Irina Malkova (05:43):
The other set of products that my team has is around helping our customer success organization help our customers deploy the product better. So one of our big flagship products is something called net adoption score, which is basically our attempt to let our success organization know that something is wrong with how a customer is adopting the product and hence the customer potentially is at risk of not getting the value of what it is that they’re paying for. So at the heart of it is basically a clustering tool that categorizes all customers into groups in terms of what kind of adoption challenge they’re facing, and then surfaces that information to the customer success people, often before customers themselves even know that they have that challenge. So our customer success people can take action.

Brooke Wenig (06:34):
So when you’re taking these internal products that you’re developing and trying to productionize them and create a product that your end customers can consume, what are some of the challenges that you run into?

Irina Malkova (06:44):
Yeah, so we were recently putting together a presentation about some of the history of how we build those products for our internal customers. And it started with a meme. It went like this: the first part of the meme was expectation of how to build machine learning products in production, train the model, deploy the model, and then Tony Stark’s standing in the field with all the money flying around. I don’t know if you guys have ever seen that picture.

Denny Lee (07:14):
Yes, definitely.

Irina Malkova (07:18):
And then the reality of deploying a production model is: train the model, deploy the model, and then many, many, many steps of how things are not working. So, yeah, there’s definitely lots of challenges that have to do with deploying production models. Even if we’re only talking about internal stakeholders like ours, it usually takes us about two weeks to come up with a good, decent prototype of something that we want to build that is complete shock and awe and stakeholders love it. And then it can take us up to another year to come up with a production product that truly can reflect the requirements and truly serve many, many users at scale. So lots of challenges there. A big one is definitely data variability. I would imagine every machine learning practitioner knows about this. For Salesforce, that’s especially hard because most of our machine learning products have to operate at the company level.

Irina Malkova (08:19):
So for example, if we’re helping our customer success people help customers adopt better, we usually are not talking about one product. We’re talking about the whole suite of products of Salesforce that customers bought. Well, in fact, those products usually sit on different tech stacks. So there are completely different instrumentation frameworks and a completely different infrastructure of how these metrics are collected and aggregated, and different people responsible for doing that. So for us to build a machine learning model that looks at all this data, we need to have a very robust governance process and very robust data quality checks on the data that is coming in. And very, very robust metric discovery, a feature store, and an understanding of, basically, what data is coming in, because there’s just too many metrics and too many producers to basically hope and pray that the data is not broken. It definitely has to be an institutionalized process for the ML app to work. So that would be one challenge.

Irina Malkova (09:23):
There’s definitely lots of other ones. I would say another challenge that I found extremely interesting is the challenge of setting the mathematical problem that would address the business need, and in enterprise that often has to do with the fact that there is rarely a target metric that is easily measurable that we can use to build something useful. So for example, if we’re looking for good adoption or, for example, a smoothly operating Salesforce instance, there is rarely one number, like monthly active users or lack of errors, that explains the good outcome. Well, there very often needs to be very dedicated, almost like BI work or analytics work on first creating that metric that can serve as a target. Because there are many different types of ways people use Salesforce and many different tech stacks and many different types of, I don’t know, errors the customer can experience.

Irina Malkova (10:33):
So creating that target metric is very, very important, again, obviously because otherwise, how are we going to build the model? So we collaborate very closely with our analytics team that specializes in thinking deeply about the philosophy of how to measure different things and creating that data for us.

Brooke Wenig (10:52):
So I know Denny’s going to want to ask you more about defining these metrics, but I want to go and dive a little bit into the data quality. How do you ensure that you have high data quality? Are there any tools that you use like Great Expectations or do you do everything homegrown, in-house to evaluate your data quality and ensure everything’s high fidelity?

Irina Malkova (11:10):
Yeah, that’s a fantastic question. So actually, before Salesforce, I used to work at a data governance company called Alation that makes a data catalog. That was a fantastic ride. And that was actually how I joined Salesforce. I was working on the sales deal with them, as part of the Alation team. And I loved the team so much that I was like, “Okay, can’t leave you guys, I got to stay.” Yeah, so there’s multiple components to it. So Alation is our current data catalog solution. So it basically allows us to, at the very least, know what do we even have? And there are multiple levels to knowing what are those assets that we have? So there are hard assets like datasets and reports. And then there are higher level conceptual assets, like business metrics, which Alation also allows us to record.

Irina Malkova (12:07):
And then data quality is a separate track. Alation allows us to know what we have, but it doesn’t really tell us whether it’s good or bad. So in terms of data quality, I’m a little bit out of my depth here, but we do have an amazing information management team and they introduced this concept of a data contract, where you basically, as a data producer and as a data consumer, handshake on the contract that says like, “Hey, here’s what the data is, and how it’s supposed to look, and the null values, and the potential SLA on when the data is coming through, et cetera, et cetera.” And then around that data contract, there are services that basically help to validate it. So many, many smart people, many, many data contracts. There’s a people and a product element to that as well. There always has to be a data producer who can investigate a data issue very quickly if it is escalated to them. So again, with all of that, this is how we operate quality.
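
Editor's sketch: the data contract idea described here (agreed shape, null-value expectations, a delivery SLA, and a named producer to escalate to) could look roughly like the following in Python. This is only an illustration under stated assumptions. The class names, field names, thresholds, and the example dataset are all hypothetical and are not taken from Salesforce's actual services.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ColumnRule:
    """Expectation for one column (hypothetical structure)."""
    name: str
    dtype: type
    max_null_fraction: float = 0.0


@dataclass(frozen=True)
class DataContract:
    """What producer and consumer agreed on (hypothetical structure)."""
    dataset: str
    producer: str             # who to escalate to when a check fails
    columns: tuple            # tuple of ColumnRule
    freshness_sla: timedelta  # maximum allowed age of the latest delivery


def validate(contract, rows, delivered_at, now):
    """Return a list of violation messages; an empty list means the batch passes."""
    violations = []
    if now - delivered_at > contract.freshness_sla:
        violations.append(f"{contract.dataset}: delivery is older than the agreed SLA")
    for rule in contract.columns:
        values = [row.get(rule.name) for row in rows]
        nulls = sum(v is None for v in values)
        # An empty batch is treated as fully null so it cannot pass silently.
        null_fraction = nulls / len(values) if values else 1.0
        if null_fraction > rule.max_null_fraction:
            violations.append(
                f"{rule.name}: null fraction {null_fraction:.2f} "
                f"exceeds {rule.max_null_fraction:.2f}"
            )
        if any(v is not None and not isinstance(v, rule.dtype) for v in values):
            violations.append(f"{rule.name}: expected type {rule.dtype.__name__}")
    return violations


# Made-up example data, for illustration only.
contract = DataContract(
    dataset="crm_daily_usage",
    producer="crm-instrumentation-team",
    columns=(
        ColumnRule("account_id", str),
        ColumnRule("active_users", int, max_null_fraction=0.1),
    ),
    freshness_sla=timedelta(hours=24),
)
now = datetime(2021, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"account_id": "a1", "active_users": 10},
    {"account_id": "a2", "active_users": None},
]
violations = validate(contract, rows, delivered_at=now - timedelta(hours=30), now=now)
for message in violations:
    print(f"escalate to {contract.producer}: {message}")
```

Keeping the producer's name inside the contract mirrors the point made above that a failed check needs a person who can investigate it quickly.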

Denny Lee (13:13):
Wow. That’s really cool. It’s… Well, we’re not going to dive too much into it. That’s sort of cool that you’re calling out this concept of a data contract to ensure data reliability, because it goes back to the usual thing: unless you’ve got decent enough data, your machine learning isn’t going to tell you much, is it?

Irina Malkova (13:30):
Yeah. [crosstalk 00:13:33] takes 90% of the time.

Denny Lee (13:34):
Exactly, exactly. Okay. So, but hey, I do want to definitely focus on the metrics part. You mentioned that net adoption score, that score that you created, right? How often do you need to change that score or do you change the metric itself or its definitions? Do you have to do that? Or have you been able to solidify it? And for that matter, maybe even a little historical context of how you came to decide that you needed to create this in the first place.

Irina Malkova (14:04):
Awesome question. Thanks again for taking me down the memory lane.

Denny Lee (14:08):
Oh, sorry about that.

Irina Malkova (14:10):
Oh no, that’s serious. That’s one of the fun memories. So we used to do product… So I’ll start with how we created net adoption score because I think it’s pretty informative. So we used to do product analytics and product data science in the same way as many consumer companies would, where we had pretty strong expertise within each individual pillar. So for example, within the CRM product or within the marketing product, all the metrics existed within each product. And the metrics overall. So for example, for CRM, CRM is a product that is sold using user licenses. So it stands to reason that the most important metric is active usage, monthly and daily, because if somebody bought a license, it stands to reason that they probably want them to use it. And everything trickles down from that.

Irina Malkova (15:02):
For Marketing Cloud, a very, very different set of metrics, because we don’t sell Marketing Cloud per user. It’s more a big bundle of emails and text messages, et cetera, that you get to send. And then companies consume against that. And it doesn’t really matter how many users they have using the actual platform, the marketing people, right? It’s more about how many emails they sent, et cetera. So net adoption score started when our new, then chief product officer Bret Taylor, who since then became COO, he was a really, really cool person. He was the founder of Quip that Salesforce acquired. So he came in and he was like, “Can you guys show me the picture of adoption across all of the products?” And we started putting together a giant table, and then we realized that it’s absolutely unreasonable because you can’t really take action on this. How are you going to compare number of emails and texts to number of active users to, I don’t know, number of dollars that our commerce clients spent on the website?

Irina Malkova (16:05):
So because of that, we needed a metric that elevates away from raw metrics and basically abstracts away to the place where we’re able to just say, this is good adoption, this is bad adoption. And this is how it looks for this and that product. So in terms of… how the adoption score works, it definitely relies on all those metrics because that’s the heart of it, right? So we need to understand, in detail, adoption by product and with all of its technical specifics, to be able to make inferences about whether it’s good or bad. So metrics are still there. But the trick was to, again, work with our analytics team to philosophically align on the metrics that are fundamental, so important that they’re not going to change a lot. So most of those metrics have to do with fundamental things like business model and the absolutely crucial functionality, and a little bit less of our feature of the day, most important functionality of this release and things like that. We still use those signals, but they’re not at the very, very heart of the model itself.
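
Editor's sketch: one simple way to picture "abstracting away" from raw metrics is to rank each customer against other customers of the same product, on that product's own metric, so that emails sent and active users land on one comparable scale. The Python below is an illustration only. Percentile ranking, the 0.25 and 0.75 cut-offs, and all names and numbers are assumptions for the example; the interview does not say this is how net adoption score is computed.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def percentile_rank(sorted_values, x):
    """Share of peers below x, counting ties as half (midrank), in (0, 1)."""
    lo = bisect_left(sorted_values, x)
    hi = bisect_right(sorted_values, x)
    return (lo + 0.5 * (hi - lo)) / len(sorted_values)


def adoption_labels(records, low=0.25, high=0.75):
    """records: list of (customer, product, raw_metric_value) tuples.

    Returns {(customer, product): (rank, label)} where the rank is computed
    only against customers of the same product.
    """
    by_product = defaultdict(list)
    for _, product, value in records:
        by_product[product].append(value)
    for values in by_product.values():
        values.sort()

    result = {}
    for customer, product, value in records:
        rank = percentile_rank(by_product[product], value)
        if rank < low:
            label = "low"
        elif rank > high:
            label = "high"
        else:
            label = "typical"
        result[(customer, product)] = (rank, label)
    return result


# Made-up numbers: monthly active users for "crm", emails sent for "marketing".
records = [
    ("acme", "crm", 120),
    ("globex", "crm", 15),
    ("initech", "crm", 60),
    ("umbrella", "crm", 300),
    ("acme", "marketing", 2_000_000),
    ("globex", "marketing", 50_000),
    ("initech", "marketing", 900_000),
    ("umbrella", "marketing", 10_000),
]
labels = adoption_labels(records)
for key in sorted(labels):
    print(key, labels[key])
```

The raw units never meet: each product is ranked on its own scale, and only the resulting labels are compared across products.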

Irina Malkova (17:12):
And I would say that the biggest work that we do in terms of adding metrics and refreshing metrics in net adoption score has to do with inching towards not just measuring how customers adopt, but how they are getting value out of Salesforce, which is not always the same thing. Because one is necessary, but not sufficient. So we know that if customers don’t use the product at all, then they’re not getting value, but then if they do use the product, they might not get the value either, right? It could be that they’re using the product in a very inefficient way.

Irina Malkova (17:51):
So we’re just trying to constantly up-level the way we think about adoption to get closer and closer to value. So the current effort has to do with looking not just at the feature adoption, but at the adoption of the entire job. So customers use Salesforce to complete a certain job. So if they, for example, use a specific feature, it doesn’t really mean that they complete the entire job, because this feature probably needs to be used in conjunction with other features. So we are, right now, switching from metrics that have to do with feature adoption to metrics that have to do with job adoption. That would be one example of something we do.

Denny Lee (18:30):
Got it. It actually almost reminds me of this concept of path analysis, like on the web, where, in other words, did they actually click on add to shopping cart or actually purchase from the shopping cart, and the 15 steps that they did, as opposed to just simply saying, “Ooh, they looked at the product.” That doesn’t necessarily mean anything. So, right. So then I guess what I’m curious about is that then do you feel that… It’s not like this process ever ends, right? This seems to be always every month or every year, or however often, this process, even though maybe the net adoption score itself doesn’t change. It seems like everything around it does. Is that accurate? Or… I’m just curious.

Irina Malkova (19:13):
I would say it improves, that’s my hope. It doesn’t change just because we got bored with the old metric, for sure. But yeah, and I don’t think even the jobs and path analysis would be the end of the road. What we’re really, really, really trying to help customers with is to truly get more successful with Salesforce. So if you bought Salesforce to sell better, are you actually selling better? Are you converting more deals? Are you closing larger deals? Are you doing it faster with fewer sales people, et cetera?

Irina Malkova (19:43):
So the Holy Grail is to know that. But again, those things are very hard to measure, partly because that’s proprietary customer data and we don’t see that. So we’re just trying to get closer and closer to that notion. I would say that every iteration of new metrics is probably where we’re talking multiple releases. So it’s a year or more than a year. So it’s not like the scientists are shocked by a new data set every day. But yeah. So, the hope is to get to that ultimate Holy Grail of knowing are we helping the customer or not.

Brooke Wenig (20:19):
Fantastic. And I want to switch gears a little bit into the decision making process. Because I know that you’re very much focused on internal decision making and building apps for internal stakeholders. What would you say are some of the key differences when building machine learning solutions for internal stakeholders versus external ones? Are there additional things that you need to consider or things that you don’t need to consider? I would just love to hear your thoughts on this.

Irina Malkova (20:41):
That is a fantastic question. In general, there is a big difference between the internal and external facing products in terms of what they are for. So the absolute biggest requirement for external facing ML products is the ability to accommodate many, many different use cases. So we want it to be generalizable. So for example, for Einstein Forecasting, the tool helps you forecast whoever you are. You can be a company that has seasonality. You could be a company that has a lot of giant deals that just disrupt the whole sales flow. You could be a company that is very little, you can be a company that’s really large. It’s supposed to work for you. And that is a really, really complex thing to solve.

Irina Malkova (21:34):
So when our external facing Einstein team takes some kind of concept, they usually completely rebuild the product with that in mind first. I would say that is the biggest difference between internal facing and external facing. But even within internal facing, I think another difference is the type of stakeholders that we work with. So very often our stakeholders are very senior people who are interested in explainability much more than the recommendation. Which is probably not the case with external facing, because our typical user is your general sales person who just wants to be told what to do, like go sell this opportunity or focus on this today, try this product, et cetera.

Irina Malkova (22:20):
Our stakeholders often are architects that want to understand in depth why we assigned a certain category of risk rather than just being told, “Hey, go speak to this customer.” So explainability has been a gigantic business requirement for us, alongside accuracy. Accuracy is also extremely important, more important than it would be for external products. So, yeah. And it has really informed the design of how we build products. Sometimes there is an entire separate model just for explainability, just because it is such a gigantic requirement.

Brooke Wenig (22:58):
So actually on the topic of requirements gathering, how different is it for working with internal stakeholders, and you might have many people involved with the requirements definition versus working with customers where there’s both internal and external stakeholders? How would you say the processes differ between the two?

Irina Malkova (23:16):
Frankly, I haven’t worked with our external customers. I did a little bit of that when I was at Alation because I led the customer success function there and we worked with our PMs to help them build requirements for our external customers. So the process there had to do with just being really diligent about aggregating every older request that comes through customer support and customer success, and just really making sure that, when the customer said something really valuable to you while they’re complaining about something not working, you do not forget to write it down and give it to the product manager. So that was the way you do it with external customers.

Irina Malkova (23:57):
With internal customers it’s awesome. It’s nice because they’re right there with you in the same company and you can co-create products together. So there’s a lot of people who are extremely technical and extremely excited about our mission, who are just willing to go through Python notebooks and CSVs, and they don’t need a nice UI to collaborate with us. Very often, a senior customer success colleague would be like, “I’m a rocket scientist by education. Please, please work with me. I really want to work with you guys.” So there’s a lot of excitement about data science in the company. So we definitely have no shortage of awesome people who want to work with us.

Denny Lee (24:40):
That’s awesome. Actually, this naturally leads to the question then how do you, especially as a manager, right. How do you keep up with these latest advancements? You’ve got your rocket scientist here, you’ve got your data analysts there, but then how do you keep on top of all the advancements in machine learning and deep learning, so that way you can still be helpful to everybody?

Irina Malkova (25:01):
Yeah, also thank you for this question. I personally believe that even as a manager, I got to do some things with my hands. Otherwise I’m going to get the wrong one. And I know it’s a pretty unpopular opinion. When I was a young analyst on many, many past jobs ago, lots of managers told me that you should not be doing things as a manager yourself, but I just feel like this industry is changing so rapidly that if you’re not being at least a little bit hands-on you get completely out of touch, very, very, very quickly. So my team’s culture is kind of like that. So my boss, for example, spent his Thanksgiving vacation making his drone recognize cat from dog. It was really nice.

Irina Malkova (25:54):
I, up until pretty recently, also did some prototypes myself. My background is a little bit more over to product management rather than data science. So I still do a lot of little productive tasks, like build little prototypes and just play with things. So I just generally believe that being hands-on is really important. I definitely don’t have enough time to be hands-on to try all these different methodologies, so I heavily rely on the team to post interesting links. And we have a lot of internal meetings where people just summarize stuff that they’re learning. But again, I really believe that if I stopped doing things with my hands completely, I will not be able to parse through all the smart things that they’re saying anymore. And I will become really irrelevant.

Denny Lee (26:41):
I actually completely understand your feeling. I had a past boss that basically was talking about that because he was actually like at the CXO level. So he basically said, “Yeah, the problem with people like us is that they’ve put the handcuffs on, so we’re not even allowed to touch even the keyboard anymore.” Right. And so I actually understand. I’m sort of like you, from that standpoint, I’ve been fighting that as long as I possibly can to prevent myself from not being able to write code. So keep the good fight. Exactly. Break those handcuffs. Okay. But actually, in all seriousness, I did have a question in terms of, especially because coming from your perspectives, do you have any recommendations on how you use machine learning to drive business value? I mean, that’s what you’re doing all the time. So for anybody who’s listening to this vidcast podcast, do you have any recommendations on how other people can do the same thing that you’re doing right now?

Irina Malkova (27:35):
Yeah. One thing that was extremely important for us is to get disciplined about measuring the outcomes of the decisions that our machine learning apps are driving. And then be extremely cognizant of how that compares to the cost of maintaining those applications. And the trap that I often see teams fall into is they A, ignore the cost completely and B, they think that if their tool delivers some incremental value compared to absolutely nothing or an Excel spreadsheet, then that’s great. And very often they will build a number of applications and don’t really rationalize the portfolio ever. And in the end they’re stuck maintaining a whole bunch of apps that are not really even all that useful. So we really are pushing to be disciplined about getting the feedback from stakeholders and then constantly trying to connect the outcomes back to the contribution from our tools.

Irina Malkova (28:36):
It’s not easy at all because we’re in the decision making business. And we’re just only one input to making a decision. In the end, the human is the one responsible for taking action and it’s pretty hard to attribute the success to the tool specifically. But nevertheless, I do think it’s really, really important to work to keep connecting our output with our ultimate customer success. And again, the notion of being very careful about the ROI of the app and going through your portfolio, seeing what doesn’t really create value anymore, what should we do with it, knowing exactly how much maintenance takes and being pretty disciplined about your project management, right. And knowing how much time you are spending on KTLO and how expensive it is for you to maintain those production apps. I think all of that is extremely important to maintain a healthy portfolio of machine learning apps.

Brooke Wenig (29:29):
Yeah. Discipline is definitely a super important trait for any machine learning organization or team. I’m just curious, how do you decide the trade-off between spending your development time on building a new app versus maintaining an existing one? How much time does it take to maintain something once you’ve already deployed it?

Irina Malkova (29:48):
That’s a fantastic question. Too much. I’m sure we all would rather be building new features. That’s for sure. Yeah. This year we’re even creating, finally, a productivity operations org within our team that can actually focus on decreasing the cost of maintenance. So the apps that I mentioned, Einstein Guidance and adoption score, they’re just some of the larger ones. We have a whole bunch of smaller apps, in total probably a dozen or maybe a dozen and a half. So at this point there needs to be standardization in order for us to reduce the maintenance costs. So we’re looking to invest in better DQ tools and better deployment tools and monitoring tools to drive that cost down. Right now, we still have a lot of time to develop new features, but the data scientists are not happy. There’s definitely way more maintenance than there was five years ago. So definitely something that we want to keep from degrading.

Brooke Wenig (30:59):
Well, definitely let us know once you figure out what those magical tools are, we definitely want them too.

Irina Malkova (31:05):
I can only imagine.

Brooke Wenig (31:07):
And I guess that makes sense that you have more maintenance now than you did five years ago, because you probably have more use cases now than you did five years ago.

Irina Malkova (31:14):
Yeah, that’s exactly the case. And also somebody showed me this chart about how technical debt always grows, and then you have to cut it dramatically or it’s just going to choke you completely. So I think we’re right there. We’re at the point where we need to cut our technical debt because we had this amazing few years of expansion and creating new apps. There is always this trade-off between deploying the app very, very quickly versus truly, deeply thinking how it belongs in the architecture of other apps and doing it in a beautiful and efficient way.

Irina Malkova (31:48):
So, for example, right now we have quite a number of feature pipelines that overlap. So we do the same feature engineering work multiple times across different models, and there is no reason to do that. It’s also not very beautifully wired in terms of where the feature is engineered. So it’s not always engineered at the very beginning, sometimes it’s re-engineered multiple times in different models. So, lots of things that could have been better. So, yeah, this is definitely a focus for our team for the next couple of years.
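
Editor's sketch: the overlap described here, where the same feature is engineered separately inside several models, is commonly addressed by computing each feature once, up front, and letting every model read the shared result. The minimal Python below illustrates that idea only. The `FeatureRegistry` class, the feature name, and the sample rows are hypothetical, and a real feature store does far more (versioning, storage, backfills).

```python
class FeatureRegistry:
    """Compute each named feature once per batch and share it across models."""

    def __init__(self):
        self._builders = {}
        self._cache = {}
        self.compute_calls = {}  # feature name -> how often its builder actually ran

    def register(self, name, builder):
        self._builders[name] = builder
        self.compute_calls[name] = 0

    def get(self, name, raw_rows):
        # The cache is keyed by feature name only, so this assumes every caller
        # passes the same raw batch until reset() is called.
        if name not in self._cache:
            self._cache[name] = self._builders[name](raw_rows)
            self.compute_calls[name] += 1
        return self._cache[name]

    def reset(self):
        """Call when a new batch of raw data arrives."""
        self._cache.clear()


def logins_per_license(rows):
    """Hypothetical feature: logins divided by purchased licenses, per account."""
    return {r["account_id"]: r["logins"] / r["licenses"] for r in rows}


registry = FeatureRegistry()
registry.register("logins_per_license", logins_per_license)

raw_rows = [
    {"account_id": "a1", "logins": 50, "licenses": 100},
    {"account_id": "a2", "logins": 90, "licenses": 100},
]

# Two different models ask for the same feature; it is engineered only once.
forecasting_inputs = registry.get("logins_per_license", raw_rows)
adoption_inputs = registry.get("logins_per_license", raw_rows)
print(registry.compute_calls)
```

Both models receive the same object, so a fix to the feature definition lands in one place instead of several pipelines.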

Brooke Wenig (32:20):
Well, I just want to say thank you so much for joining us today on Data Brew and sharing all of your thoughts about how you can use machine learning to drive business value, how you can align among all of your stakeholders and how you can successfully develop ML applications. So thank you again, Irina for joining us today.

Irina Malkova (32:32):
Thank you, Brooke and Denny. So nice to meet you.