Champions of Data + AI

Data leaders powering data-driven innovation

EPISODE 17

Solving Business Problems With Data

Data teams can sometimes get too focused on feature engineering and model performance without fully understanding the use case context and its business impact. In this episode, Sanjeeva Fernando, Senior Vice President at Optum, explores ways data and technology leaders can keep their data teams focused on delivering business value by working backward and deconstructing the use cases.

Sanjeeva Fernando
Senior Vice President, AI Products and Platforms, Optum Labs
As Senior Vice President of Artificial Intelligence (AI) Products and Platforms at Optum, Sanjeeva Fernando is responsible for the design and development of leading-edge AI models and analytics for the enterprise, as well as the platforms needed to accelerate the development and deployment of these solutions.
Previously, Fernando led the OptumLabs Center for Applied Data Science (CADS). The CADS team applied breakthroughs in AI and machine learning to solve complex health care challenges for UnitedHealth Group (UHG) by developing and deploying new software product concepts.
Prior to joining Optum in 2014, Fernando worked at Nokia, where he created their first data science team. Before that, Fernando spent nine years at Nokia in various corporate roles with Nokia’s Multimedia Division, Nokia Research Center and Nokia Ventures. Fernando was also a co-founder and VP of Engineering at Vettro, a venture-backed mobile software company. Fernando began his career in consulting with Viant and Accenture.
Fernando is a graduate of Trinity College with a bachelor’s degree in computer science. He lives in the Boston area with his wife and their three boys. In his free time, Fernando enjoys coaching his sons in basketball and baseball.


Chris D’Agostino:
Welcome to the Champions of Data and AI. I’m your host, Chris D’Agostino. Why is it important for data teams to focus on use cases and less on the models? That’s the core of today’s discussion. Data teams can sometimes get too focused on feature engineering and model performance without fully understanding the context of the use case and its business impact. In this episode, I’m joined by Sanji Fernando, Senior Vice President at Optum, to explore ways data and technology leaders can keep their data teams focused on delivering business value by working backward and decomposing the use cases. Let’s get started. So welcome, Sanji, to Champions of Data and AI.

Sanji Fernando:
Thanks, it’s great to be here.

Chris D’Agostino:
Yeah. So when we last talked, we were talking about what life was like sort of this past year and some of the challenges of working from home and having kids and wanting to make sure that they remain productive and safe. Can you share a story about how you might have had to intercept or interject yourself in that mode?

Sanji Fernando:
Yeah. No, I think we’ve all gone through lots of change, and I was just reminded, when we spoke, of years ago, maybe five or six years ago, when we weren’t really working from home as much. We got stuck at home because of a big snowstorm, and my wife and I were both working, trying to jump from call to call like so many people probably did in the last year, and trying to figure out how to keep the kids busy. So we finally just said, “Hey, just go on out in the snow, grab a sled, we’ll bundle you up.” My wife and I had finally gotten to the point where we just had to be on some conference calls, and we let them loose.

Sanji Fernando:
Unfortunately they’re boys, and they’re my boys, so that’s the problem. Somehow they got the idea that they were going to sled off the neighbor’s garage; there was so much snow they were able to get themselves onto the top of the garage and shoot themselves off. After a couple hours, things seemed to be going well, we were doing our calls, catching up on work. And then we got this urgent knocking at the door. My son shows up, there’s blood everywhere, he’s torn up his knee. And my poor wife is trying to figure out how to pull herself out of a presentation, like, “I cannot present anymore because my son just sledded himself off a garage.” So that was pretty funny. Funny now, at least: we were all very worried about him, but a couple stitches later, we could all laugh about it.

Chris D’Agostino:
Yeah. And that reminds me of when I was a kid, sledding one time on a two-person sled with a buddy of mine. He was in the rear seat of the sled and had the controls for steering, and we were going down this massive hill. I felt like we were losing control, and I was yelling, “We’re going to crash.” He’s like, “No, we’re fine.” And at the last minute I just jumped off the sled, and he slammed into a tree and broke his leg. So he’s never… So Calvin Wilson, if you’re out there somewhere and you hear this, forgive me for letting you break your leg instead of mine.

Chris D’Agostino:
But this is probably a good example of maybe talking about data science and talking about teams and your role as a leader in making sure that they’ve got all the information they can to build models properly and understand maybe the full business use case around that. So talk a little bit about how you’ve been able to take your leadership skills from a mentoring standpoint and just help the teams understand sort of the full breadth of use case design when it comes to modeling.

Sanji Fernando:
Yeah. There’s a lot that goes into it, and we try our best. We have some amazing folks who work here at UnitedHealth Group, really well trained, and they really are the experts on understanding how to apply these amazing methods to understand data, to extract meaning and insight. But what I always coach folks to do is really understand the end business problem and work their way back from, quote unquote, the customer or the user. Oftentimes in machine learning and artificial intelligence, really well-credentialed folks come from more academic settings, or often require lots of academic training to be successful in this space. And it’s easy to fall in love with the data we have, to extrapolate and drive inference, but you’ve got to really understand what the problem you’re trying to solve is. Say I had all the data to solve a problem like disease prediction, which we don’t do yet. How does that actually work in the real world?

Sanji Fernando:
How does that get to a physician? How does it get to a nurse practitioner? How does it get to even a patient, to understand where their disease might be taking them? And so, yes, I could train a model on everything that happened to a person, but the decision-making for them to course correct, to make a decision about their health, may not have the benefit of all that data. You might be at the front end of the problem, or you might only be able to present something partial. And so your metrics, your performance, your area under the receiver operating curve or precision-recall, might look great based on the data you trained on, but did you think about what data is actually available at the point in time when the inference is needed?
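Fernando’s point about train-time versus inference-time data can be sketched with a toy example. This is purely illustrative synthetic data, not anything from Optum: a feature that is only recorded retrospectively (here called `late`, an invented name) makes the model look strong at training time, while a model restricted to the signal actually available when the inference is needed scores noticeably lower.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# "late" is highly predictive but only recorded after the care decision is made;
# "early" is the weaker signal that actually exists at inference time
late = rng.normal(size=n)
early = 0.3 * late + rng.normal(size=n)
y = (late + 0.5 * rng.normal(size=n) > 0).astype(int)

X = np.column_stack([early, late])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Model evaluated with every retrospective column available
full = LogisticRegression().fit(X_tr, y_tr)
auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])

# Model restricted to what is available when the inference is needed
early_only = LogisticRegression().fit(X_tr[:, :1], y_tr)
auc_early = roc_auc_score(y_te, early_only.predict_proba(X_te[:, :1])[:, 1])

print(f"AUC with all retrospective data:  {auc_full:.2f}")
print(f"AUC with inference-time data only: {auc_early:.2f}")
```

The gap between the two numbers is the point: the first AUC describes the retrospective dataset, the second describes what a clinician or patient would actually experience.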

Sanji Fernando:
And so we spend a lot of time talking about that, a lot of time understanding that. And to be honest, coming from healthcare, sometimes it’s hard to figure out who the customer is, because you and I as patients should always be put first, but there are also providers who need to be able to serve us really well, and there are payers who are trying to make sure that we get the highest-quality care at an affordable price. So sometimes it can be a little hard to navigate and understand how to optimize for all of those important needs.

Chris D’Agostino:
Yeah. And so it seems you’ve determined that just looking at the data itself, the data that’s been collected, isn’t the full picture. Did you have any kind of breakthrough moment there? We oftentimes refer to a 360-degree view of a customer, and as you said, in a healthcare environment there are more parties involved: you’ve got the patient, you’ve got the provider. Did you have some kind of breakthrough moment where you said, you know what, we’re looking at only a subset of the data, and while we might see good model performance using that data, we’re missing the bigger picture here?

Sanji Fernando:
Yeah. We’ve worked with safely de-identified data for years in healthcare, and there are important controls and governance, defined by HIPAA, that allow us to create these de-identified data assets. It’s a great place for us to learn and test new methods. But sometimes we fall in love with those methods and say, “Well, look how well this is performing, we should write a paper about it, this looks really fantastic.” And in the course of that, we also asked ourselves, all right, this is great, now how do we get this into the hands of people who could use it day to day?

Sanji Fernando:
And all of a sudden we realized our models don’t work as well, because the data changed: the availability of the data changed, the information density changed, the specificity changed. It was a good wake-up call. Theory versus practice is probably at the heart of it. Really understanding what business problem you’re solving, in that context, for that person, is so important. So we learned from that, and now we really try to start from that business context. Let’s start from the problem first rather than the theory first.

Chris D’Agostino:
And is there a process that you and your team go through in terms of decomposition? You’ve got this macro use case, if you will, and you want to understand the key touch points along the timeline: a patient presenting with some kind of symptoms, then the assessment by the physician and what he or she determines, then you weave in additional tests, and then you create a treatment plan. Do you go through an exercise of breaking things down along that kind of continuum?

Sanji Fernando:
Absolutely, and we try to use some constructs to help us get to those problems. In the past, we’ve leveraged things like derivatives of something called the Business Model Canvas, which has nothing to do with data and nothing to do with machine learning, but it helps us hone in on what piece of the puzzle, what problem, we want to solve. The problems can be really large in our industry, and I’m sure in others too. So decomposing, understanding, and having a well-articulated problem statement is incredibly important to us. We’ve really liked the canvas because it forces us to ask the question: what is the business problem that we’re solving? But there are other constructs as well. Understanding where we want to go is also important; one of the first steps might be to test and validate certain aspects of the solution.

Sanji Fernando:
But you also want that north star: where do we want to go? What kind of business transformation do we want to achieve? We use press release/FAQ documents today to help articulate things like that. So all these constructs help us get to the right problem. And if we do that really well, then trying to solve it with data and methods like machine learning becomes so much easier, because we know we’re working on the right problem domain and understand the limitations and constraints of the data and the processing that have to be taken into consideration.

Chris D’Agostino:
So when we talk about use case definition, and about companies that are trying to move along this data and AI maturity curve to get to more prescriptive analytics and drive the behaviors the organization wants, one thing we hear a lot about in industry is data adjacency. If you have use case A with a set of data, then you try to find other use cases that would leverage that same set of data, because you’ve already done the data wrangling, you’ve figured out where the data comes in from the source systems, you’ve done the data curation steps, and you’ve tried to make it more consumable and usable for model training. Do you take that approach as well in terms of finding companion use cases, and is that part of the process of teeing these things up?

Sanji Fernando:
Yeah, absolutely. It may not be plug and play; if it is plug and play, that’s great, we’ll take the win. But we are seeing a lot of benefit from common representations of language and information that might have been trained for one use case that we can transfer to another use case. And that’s been incredibly important, not only in amortizing our investment in that asset, the representation of the data and the language, across use cases; we also started reusing the data pipelines and the trained models. And all of a sudden, what we’ve started to observe is that both solutions get a little bit better, because we’re presenting more information to the model over time. Or, as material changes happen in how the business process works, both benefit from that commonality across the pipelines.
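A minimal sketch of the kind of reuse Fernando describes, using invented toy corpora and a shared scikit-learn text representation standing in for the common language representation (the actual Optum pipelines and models are not described in the episode):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpora for two different downstream use cases
claims_notes = ["knee injury after fall", "routine annual checkup",
                "fracture of left arm", "wellness visit no issues"]
claims_labels = [1, 0, 1, 0]      # use case A: injury vs. routine

triage_notes = ["severe chest pain", "mild seasonal allergies",
                "acute shortness of breath", "follow-up for allergies"]
triage_labels = [1, 0, 1, 0]      # use case B: urgent vs. non-urgent

# One shared representation of language, fit once across both use cases...
shared = TfidfVectorizer().fit(claims_notes + triage_notes)

# ...then a lightweight, task-specific model per use case built on top of it
model_a = LogisticRegression().fit(shared.transform(claims_notes), claims_labels)
model_b = LogisticRegression().fit(shared.transform(triage_notes), triage_labels)

print(model_b.predict(shared.transform(["chest pain and shortness of breath"])))
```

The design choice this illustrates: the expensive, shared asset (the fitted representation and the pipeline feeding it) is built once and both use cases benefit as it improves, while each use case keeps its own small task-specific model.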

Chris D’Agostino:
So Sanji, one of the things we hear about in data science is the big 80/20 rule, where data scientists spend 80% of their time doing data wrangling and only 20% doing algorithm development. Is that something you’re seeing in your organization as well? Is that a challenge for you and your team? And if so, how much do the data architecture and the platforms your teams are building assist in that, or not?

Sanji Fernando:
Yeah, it’s a great question. We’ve been on this machine learning journey for a few years now, but when we began, it was challenging, in part because the systems that generate the data we need to learn from were never designed for that kind of use case. Those legacy systems perform very well and are optimized for what they do today, like processing claims, but they weren’t contemplated to support machine learning inference. So it’s been a challenge, and I know you guys work with lots of companies. We’re always trying to learn and hear how others marry and balance these different system objectives, because it’s hard. It’d be great to hear what others do.

Chris D’Agostino:
Yeah. So we do talk with a lot of companies across a lot of different verticals, and I think there are a couple of themes. The first is: for data that sits inside these source systems, what’s the best way to extract it and make it more usable for cross-system, cross-business-unit analytics? Some companies have applied analytics at the system of record, but oftentimes that’s not enough; they can’t do the style of analytics they want, so they need to pull the data out. There’s a big movement, as I’m sure you’ve heard, around data mesh, which is really about 50% principles and policy around creating a producer-consumer model and making sure the producers are responsible for the data and the quality of that data.

Chris D’Agostino:
And then to the degree that the source system can actually serve up data products, that’s great. But many of the systems today aren’t really designed to do that. So organizations are pulling data into something like, say, our lakehouse architecture, where you can consolidate data holdings from across a bunch of different systems. Then you’ve got one place to go to provide access to the data, the wrangling piece is done, and the data science community can focus their energy on the feature engineering from there.

Sanji Fernando:
Yeah. That makes sense. I think we’re headed there too. With four years of experience in data wrangling, we recognize that automating those steps, the way you describe it, is so important, not just to reduce time and effort but also to ensure we have consistency and lineage and governance. And then maybe the next thing we’ll face into, as everyone else will: once you have great inference, do you need to, or can you, push that back into the source system effectively and efficiently? That part is almost easy, but then how does the source system change? How do we rethink the work steps that system does, to maybe rethink the whole process? I think that’s where the next few years are going to go for all of us.

Chris D’Agostino:
Yeah. It’s exciting. All right, so let’s maybe close out the technical discussion of the use cases. I was intrigued when we talked last time about classification models and the area under the curve, which is a measure of how well the model performs at classifying something as a cat or a dog, or a cat versus not a cat, which is probably a better example. In healthcare, you’re likely trying to classify patients as either having a condition that needs treatment or not having that condition.

Chris D’Agostino:
And I was struck by one of the things you said. Ideally you want to get as close to 100% of the area under that curve as possible, but you’ve pointed out that even if you’re at 99%, you need to be thoughtful about the model and its performance, because it may not take into account the full timeline and all the data; you mentioned that earlier in the talk today. Can you give us an example, or highlight why a data science team would need to be careful about getting too confident that a model that performs well is the end-all-be-all model?

Sanji Fernando:
Yeah. I think what we’re learning is that there’s a constant dynamic in the utility and use of models today, whether they’re deterministic rules, more probabilistic machine learning, or even other methods. We’re starting to use this idea of inner- and outer-loop metrics. An inner-loop metric might be that AUC: how well does this inference perform? But an outer-loop metric might reflect the total business impact. I’ll use a very simplistic and made-up example. I might be able to predict really well that someone is at risk of a disease, and I might use that disease prediction to say, “Hey, that person should get this type of care.” But then look at the outer-loop metric, or notice what happens if I ignore it, and ask, “Did they actually get the care?” or “What got in the way of their care?”

Sanji Fernando:
Then you get into so much complexity in healthcare. Did they have access to that care? Did they have the transportation for that care? So many outer-loop metrics need to be part of the equation for the data science team, not simply the inner-loop metrics that might reflect the area under the curve. Ultimately that’s how we get to impact. Understanding the relation between these two sets of metrics is going to be use case specific, based on the example you have, but we all own the problem. We can’t just be focused on our little world and say, “Well, look, my AUC works great.” Did we actually move the needle? Did we actually impact someone’s health? Did we actually change an outcome? We all own that problem. That’s how we’re talking about it, and how I talk about it with our teams.
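The inner-/outer-loop distinction can be sketched with synthetic numbers (nothing here is Optum data; the risk threshold and the 40% care follow-through rate are invented for illustration). The inner-loop metric scores the inference itself; the outer-loop metric asks whether the people the model flagged actually received care.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
risk_score = rng.uniform(size=n)                        # model's predicted risk
has_disease = (rng.uniform(size=n) < risk_score).astype(int)

# Inner loop: how well does the inference itself perform?
inner_auc = roc_auc_score(has_disease, risk_score)

# Outer loop: of the high-risk people we flagged, how many actually received
# care? Here access barriers mean only a fraction follow through (invented 40%).
flagged = risk_score > 0.7
received_care = flagged & (rng.uniform(size=n) < 0.4)
outer_rate = received_care.sum() / flagged.sum()

print(f"inner-loop AUC:                      {inner_auc:.2f}")
print(f"outer-loop care rate among flagged:  {outer_rate:.0%}")
```

A strong inner-loop number alongside a weak outer-loop number is exactly the gap Fernando describes: the model works, but the outcome hasn’t moved.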

Chris D’Agostino:
Yeah. So it sounds like, tying back to what you said earlier, it’s really that use case definition, the decomposition, understanding that continuum. My guess is you’re bringing in other stakeholders from within the organization so that you do have that full view of what the full patient care would look like. How do you predict that the outcomes are going to be accurate, taking into account, like you said, transportation and access to the testing they would need to determine whether they’ve got that particular disease, or whether they’ve got the right treatment plan?

Sanji Fernando:
That’s right. And it’d be great to hear what you guys have experienced. The conversation about how to make an impact with machine learning doesn’t start and stop with the data scientists; we need everyone to be part of the conversation. Non-technical stakeholders are so important for us to achieve business success. But we work with really dense concepts, so it’d be interesting to hear how others face into that: how do you understand what an AUC is if you’re not a data scientist?

Chris D’Agostino:
Yeah. I think for us, our objective and the vision for the company is really to provide a data platform where all the different personas can interact with the data in a meaningful way and leverage their skills, without needing to take non-technical people and make them software engineers. So we’re doing things around low-code, no-code style model development and data science. We’ve got a BI engine that’s built into the platform. And we were just able to break the world record processing the TPC-DS enterprise data warehouse benchmark.

Chris D’Agostino:
And so we’re really adding a bunch of features that bring in more personas, beyond the data science, machine learning and data engineering functions, to the business analysts and data analysts: to help him or her understand, of all the data moving through this environment and all the workloads running against it, what are the business outcomes that you can actually visualize and see through the platform? So we think of it as a team sport, and our goal is to try and provide that end-to-end platform.

Sanji Fernando:
Yeah, that makes sense, because I think you hit the nail on the head. It is a team sport now, and if you’re a data scientist thinking about your model optimization using a metric like AUC, that’s your responsibility, that’s your part of the role on the team. But you are part of the team, and we all have to work together to hit the business outcome.

Chris D’Agostino:
Sounds good. All right, Sanji, I want to close out with a question we always ask leaders: what advice would you give to people aspiring to be in your role? You’ve got a team that’s doing really meaningful work. What advice would you give people who are trying to build a career in data science, and in data and chief data officer-style pursuits?

Sanji Fernando:
Yeah. I think methods and ideas are constantly coming up through the industry. I’ve been very interested in systems thinking lately: a recognition that the complexity in any industry marries not simply the digital and the software, but what the market is doing, and how those interact. Healthcare is a great place for systems thinking because you have so many different, not competing, but interrelated relationships between patient, provider, payer, the federal government, the state governments, the life sciences. So when people ask me for advice, and they don’t always ask, I encourage our data scientists, and anyone who wants to get further in machine learning and AI, to also think about the business systems, the business complexity in the industry. Take that systems thinking mindset, bring everything you’ve learned about machine learning, but then understand what’s trying to be accomplished, so you can apply it in ways no one else has really thought about. I think that’s how we’re going to get real transformation out of these new methods.