Panel Discussion: Improving Health Outcomes with Data + AI

To drive better outcomes and reduce the cost of care, healthcare and life sciences organizations need to deliver the right interventions to the right patients through the right vehicle at the right time. To achieve this, health organizations need to blend and analyze diverse sets of data across large populations, including electronic health records, healthcare claims, SDoH/demographics data, and precision medicine technologies like genomic sequences. Integrating these diverse data sources under a common and reproducible framework is a key challenge healthcare and life sciences companies face in their journey towards powering data-driven outcomes. In this session, we explore the opportunities for optimization across the whole healthcare value chain through the unification of data and AI. Attendees will learn best practices for building data-driven organizations and hear real-world stories of how advanced analytics is improving patient outcomes.


  • Iyibo Jack, Director Engineering, Milliman MedInsight
  • Arek Kaczmarek, Exec Director, Data Engineering, Providence St. Joseph Health

Speaker: Frank Nothaft


– [Narrator] Hi everyone, thanks for joining us for the Healthcare and Life Sciences Breakout. I’m handing over to Frank for our first session today. Take it away, Frank.

– Hello everybody, I’m Frank Nothaft, technical director for Healthcare and Life Sciences at Databricks. And I’m very glad to have this opportunity to talk to you today about some of the work that we’re seeing in the healthcare industry that can really leverage data across the whole healthcare ecosystem to improve patient outcomes. In my role at Databricks, I’ve been fortunate to lead up our practice across both the product center as well as interactions with customers, to help them leverage data that comes in many different forms, whether it’s coming from an EHR, coming from a genome sequencer, coming from an imaging platform or coming through a digital app that a patient is using on their device. When we look across all the various segments of the space today, we run Healthcare and Life Sciences as a combined team here. And we see this just awesome ability to share data across all of these different segments that make up our healthcare value chain, to ultimately improve care for the patient at the center. When we think about some of the projects that we’ve been able to engage in and have been really thrilled to collaborate on, we see a number of examples of how data crosses these segments.
One of the projects that we’ve been excited to participate in in the past is a project that we did jointly with Human Longevity, who had implemented a process for taking whole genome sequencing plus whole body MRI data from an Alzheimer’s cohort, including a lot of deeply phenotyped and research-consented data that was collected through a government consortium. This dataset blended genomic information that gave us insight into a patient’s disease risk, looking at some of the genes that are well known to carry up to 25% lifetime disease risk in these patients. They were then able to turn to the images and, using AI techniques, pull out the size of various different brain organs and use that to build a model classifier that would actually predict disease onset eight years before symptoms started to appear. So that’s a really powerful example of data coming from the public sector, where it’s been collected through a research consortium, and being used to inform a diagnostic test being developed by a precision medicine company. We also see opportunity inside of organizations to leverage data across multiple scales. We look at CVS Health. CVS Health has had this awesome ability to use data that shows how patients are consuming prescriptions, how the prescriptions themselves are getting reimbursed through their PBM or through their insurer, as well as ultimately how the patient is actually picking up the prescription, in the store or through delivery, and how the message that the prescription is available is being delivered to them.
They were able to take data at the scale of tens of millions of patients, tens of millions of scripts, and use that with the power of machine learning to ultimately come up with personalized outreach methods that would actually drive increased treatment adoption. When they ran this machine learning system across both the interactive data plus the claims data, they were able to get almost a 2% improvement in treatment adherence, ultimately driving lower cost of care and better outcomes for these patients. And finally, when we think about the long process of getting a new drug developed and available out on the market, we’ve been tremendously excited to see projects like the ones that we’ve worked on in collaboration with the Regeneron Genetics Center, where they’ve gone ahead and partnered with organizations, whether it is a large population health initiative like the UK Biobank or partnerships with hospital sites, like some of the work they’ve done with Geisinger. They built out a data infrastructure that is able to take this data, whether it’s coming from an EHR, from patient surveys, or from bio-specimens. They’ve then been able to sequence that inside the RGC, and then do large-scale statistical studies to identify genes that drive disease progression that will ultimately, hopefully, become the next generation of therapeutics, lead to better outcomes, and ultimately reduce the impact of disease for many patients. So, when we look at these sectors, there’s just tons of opportunity to collaborate across all of them, and we’ve been very privileged in our position as a data platform to collaborate very broadly across this. In the time since we started our healthcare and life sciences team in 2017, we’ve grown tremendously.
And we’re pleased to say that we work with eight of the top 10 pharmaceutical companies globally, four of the top six health insurers in the U.S., as well as numerous international health care agencies, such as the NHS, CMS and Healthdirect in Australia, and multiple top 10 hospital systems. It’s been really heartening to us to see this adoption of healthcare machine learning and data intensive technologies across the healthcare space, and we’re privileged to do anything that we can to help advance the reach of these technologies. When we look, though, at how people are making decisions today and what are some of the challenges that they’re facing, we’ve been able to pull out three big pillars that we think are going to drive data strategy for healthcare companies that want to become really data-driven. First, we look at these mounds of data. Many healthcare organizations have huge amounts of data: if you look at the average hospital system, it has about eight petabytes of data. A big challenge is organizing this data so that teams inside of the organization can use it. In the biomedical research community, we talk a lot about FAIR principles. So, how do we make data findable, accessible and interoperable? And we see this great opportunity for the FAIR principles to be adopted across the whole healthcare ecosystem, whether it’s in the life sciences, whether it’s in the healthcare space, whether it’s by other organizations that come in to analyze the data. And ultimately we think that with the right application of open source technologies, we can make this data not just FAIR, but reproducible and interoperable through the power of the open source ecosystem. And we can make it very high quality so that people can use this data very rapidly and ultimately drive improvements.
The other big thing that we see is a great opportunity to mine this data at large scale, both to identify new areas where machine learning can improve care and ultimately to give us a start down the path to doing machine learning and AI. Some of the things that we’ve been very interested to see are projects where people have taken large population health data sets, mined them at very large scale, and identified healthcare trends that they could develop interventions for. Some of the time, this is some of the work that we see on the life sciences side, like the projects that we’ve talked about at Regeneron and at Human Longevity, but we’re seeing a very large adoption of the data-driven way to drive health outcomes in the public health space. We were very excited to work in the past with CMS on a project where they mined very large scale, state-level claims data, and they were ultimately able to identify a variety of different factors that would lead to poor maternal and fetal health outcomes. And with the power of data, they were able to start thinking about, how do we design interventions to make these outcomes better? What else should we be aware of in our data? What we’re also really excited for is the power of open source technologies to make these workflows easier for people to adopt in a context where there’s a human in the loop. One of the things that we found really interesting is that, in many aspects of the data intensive ecosystem outside of healthcare, a prediction will be served up without a human in the loop. You know, an ad will be bought, a webpage will be served, an A/B test will be run. But in healthcare, oftentimes there is a human in the loop. This may be a claims adjudicator, it may be a clinician. It may be a hospital administrator who has to decide where they’re going to reallocate staff to.
And we think that with the power of open source technologies, we can actually center the human in the loop and give them more power over the decision that is ultimately made. It’s not an AI algorithm making a decision, but it’s an AI algorithm that is serving up a decision, and serving up information about the context in which it made that recommendation, to a human who can then use their professional judgment to apply that. So we see a lot of power in this space, and when our team comes in to address these problems, we’ve kind of lined up three big pillars that we work on. First, our team is heavily invested in building out infrastructure that makes it easier for healthcare organizations and life sciences organizations to pool all of their data together, and to curate it. One of the places where we’ve focused a lot of energy is the data ingestion side, through projects like Project Glow or Project Smolder that help our customers take in large datasets, in the genomic space with Project Glow and in the electronic health record space with Project Smolder. We see there’s a great opportunity to build on top of the open source APIs that the healthcare community has built out, and to make the data that comes in through those open source APIs able to be read into high performance storage systems like Delta Lake, so that people can then access it very, very rapidly and scalably from there on. What we then focused on is taking the algorithms that people are working on today and making them easier to run at very large scale. One of the things that we’ve been excited about, again, is Project Glow, and Project Glow is a collaboration between our team at Databricks and the bioinformatics and genetics organizations at the Regeneron Genetics Center.
One of the things that we were really excited to announce this summer was the introduction of GloWGR, which was a method that took a lot of these queries that we were running to find a correlation between a genetic data point and a patient’s disease. People want to run this at the scale of hundreds of thousands to millions of patients, and with the GloWGR method we were able to take that query that people want to run and make it one to two orders of magnitude faster to run. We see this as an opportunity, not just in genomics where we’ve been pleased to bring this out, but we also see the opportunity to do this in a lot of other contexts, whether it’s biostatistics, whether it’s working on medical IoT devices, and we look forward to introducing more technologies in this space that make this analysis really scalable. And finally, in the machine learning space, we see this great translation that’s gonna happen over the next few years, as people take these ML algorithms that they’re feeling more and more, and push them into regulated applications. When we actually look back to that workload that we spoke about earlier at Human Longevity, that happened in a regulated setting, and they went to great pains to identify a very good quality control and validation workflow that would work for that sort of a model. We think that over the next few years, the adoption of ML in regulated settings is going to grow, whether it’s in a clinical setting, whether it’s in a med device setting. And we think that through the power of MLflow and some of the best practice workflows that we’ve developed, it will be much easier for organizations that are running clinical labs, that are building med devices, that are bringing new therapeutics to market, to use machine learning in these critical regulated areas.
Ultimately, when this comes together, what we’ve been seeing is the adoption of this kind of end-to-end process that is built on top of Delta Lake and other open source technologies. What we’ve ultimately seen is that, through easy-to-use conduits like Project Glow and Smolder, we’re able to quickly bring data in these complex biomedical data formats into Delta Lake, stored in a system that is easy to manage, has a low cost of operation, and can scale out to truly petabyte-scale data sets that people can query interactively. Once that data is staged in there, they can process the data at very large scale and train machine learning models that they can store in the MLflow open source toolkit, which gives them the ability to introspect into these datasets, understand how these algorithms are faring, share these algorithms across their team, and ultimately collaborate to build out these models, share these models with other organizations, and really feel confident in their results. And finally, through the integration work that we’re doing with popular BI tools, like Tableau or Power BI, as well as through the recent , we think that there’s gonna be this great ability over the next few years for people to really serve up the recommendations that they’re making to that human in the loop, in a way that they can really introspect into the recommendation, they can analyze the data that led to it, and they can get very comfortable with deploying these models in a way that people who are on the front line, the clinician, the claims adjudicator, the admin, are really able to engage with that data and feel like stakeholders in the process.
So ultimately we think that over the next few years, this will really drive kind of a new look to multi-scale medicine, where we’re able to do operational automation that reduces the cost of running healthcare, reduces the cost of manufacturing therapeutics and devices, and ultimately drives new healthcare efficiency while maintaining a standard of care for patients, or hopefully actually increasing their standard of care. We see a lot of opportunity for population and precision health, where we may be learning from weak signals at an individual level that, when multiplied across hundreds of thousands of individuals up to millions or tens of millions of patients, become unmistakable and very hard to miss. We think this will have a big impact when we look at diseases that go underdiagnosed or misdiagnosed today, especially in the rare disease space. We think this will have a big impact in helping us better model the disease risk of an individual so that they can take steps in a preventative care setting to address that disease before it becomes a more serious condition, and we see a great opportunity to use this to drive the design and development of new therapeutics in the biomedical space. And finally, one of the things that I think we’ve really seen over the last six months, and I think this will be a real sea change in the healthcare space, is that we’re gonna see very new and exciting ways for patients to engage with their care. We’ve seen just over the last six months the dramatic increase in prevalence of telemedicine, and in digital and remote patient monitoring.
And we think that ML technologies like the technologies being deployed at CVS, being deployed at organizations like Livongo today, will ultimately give people new ways to interact with their care, give people new evidence that informs the choices that they as an individual are making on a day-by-day basis, and will ultimately drive greatly improved health outcomes across all of these systems. So for us, one of the things that we’re trying to do to enable this is the development of solution accelerators. We have a number of these that are available today, and what a solution accelerator does is couple material that gives insight into how a problem works and how a problem is set up with a notebook that provides a best practice example of how to implement that workload at scale. For instance, looking at our pathology image analysis workflow, this was a workflow that we developed working with a customer who had digital pathology slides from cancer samples, and they were trying to do machine learning on top of that to identify, on an image of many cells, which were the cells that were cancerous. They wanted to get that segment of an image so that they could compute some metrics about the patient’s cancer that would help inform some of the things that we’re doing on the genome sequencing side, and help inform some of the recommendations we’re serving back to patients. When we go through the digital pathology accelerator, we walk step-by-step through the problem: what’s the outcome you’re trying to achieve, how you load the data in, how you use open source machine learning tools to build that, and then what it actually looks like to store that model in MLflow, so that the next time you get a pathology image you can use the model you’ve trained to score that new image.
With this, we see a great opportunity to take these best practice workflows that are being built out across the health care space and publish them in an easily findable and searchable way, so that new customers can come on, start to use these solution accelerators and identify an end-to-end solution much more rapidly than they may today. So we’re very excited to be bringing these out, and we look forward to adding to the collection that we have over the next few years, and really building out an open source healthcare ecosystem that we can all participate in and get value from. So thank you very much for your time today. We look forward to collaborating with many of you, and we’re very excited to play any role that we can in this massive data-driven transformation that’s going on across all of healthcare and the life sciences today. And we look forward to seeing the adoption of these open source tools and technologies like machine learning to drive better outcomes for patients and reduce cost of care across the whole healthcare ecosystem. Thank you very much.

– [Narrator] Thanks Frank, that was great. Now I’d like to turn it over to Steve Brunner, head of AI platforms at Humana, to give us his take on how they’re scaling data science, Steve.

– Hi, this is Steve Brunner. I’m gonna talk today about enabling ubiquitous AI in flight. So I’m the director of AI engineering at Humana, and I’m very happy to be here to introduce you to what we’re doing with ubiquitous data. A couple of the things I wanna talk about: I’m gonna tell you who Humana is, I’m gonna talk a little bit about the digital health and analytics team that I’m part of, then I’m gonna define ubiquitous data and talk a little bit about the machine learning platform that we’ve implemented and how Databricks figures into that. And then I’ll talk a little bit about what exactly we’re doing and where we’re going. Humana is a very big company, the biggest one I’ve ever worked for. It’s a Fortune 52 company; we had about 65 billion in consolidated revenue in 2019. One of the things that attracted me to Humana, and I’ve been here not quite a year yet, so I’m still new, is that our mission is big. We are committed to helping our members achieve their best health, and I’ll talk a little bit more about how we are going to do that using data. So Humana has bold goals. One of them is a population health strategy to help the communities we serve be 20% healthier by 2020, and you might say, well, gee, how do you measure 20% healthier? Anybody can claim that, but we actually are measuring it using measures that are well-defined in the industry. And if you want specifics on that, look at our website, but we would define that as healthy days, as defined by our members each month. So we do surveys, and this is how we gather that data, and we let them define what a healthy day is. And I’m gonna talk a little bit more later about how we actually do tests to make sure that the programs we put in place are actually achieving their purpose.
Now, challenges in 2020. Boy, this is the first year, I think, in the last century that has had four super disruptors. And we define those as recession, pandemic, mass protests and an intense election, and there’s only been three other years in the last 100 that have had three out of those. I’m not gonna tell you; I’ll let you do that as an exercise to figure out what those were in those particular years. But big challenges need big responses, and this is definitely a year for big challenges. So just as an example, Humana has sent nearly 1 million meals to members at risk that we’ve noticed, and we have identified some of those through predictive models, but we also have people on the phone who just pay attention, and when they hear certain things from our members, they take action and we try to get results, and one of the things we’re trying to do is get people food. Now I’m a member of the digital health and analytics team, so it’s a team that was established about two years ago under Heather Cox. So it’s really fairly new. Humana has been around since 1960, I believe, and we’ve gone through lots of transformations; this is our latest one. Digital health and analytics is designed to take advantage of our thousands of data assets and use them as an advantage for all of those bold goals that we have set for ourselves. Now, two years ago, the very first estimate that we put together had 1.9 million data assets. So as you probably have figured out, we don’t really have that many, there’s just lots of redundancy. And that’s part of what we’re about in digital health and analytics: we’re trying to centralize the management of that data. So we as a team serve the data scientists and the advanced analytics needs across Humana, and we define ubiquitous data in flight, and some of the things that ubiquitous data in flight needs are data governance, DevOps, CI/CD pipelines, and elastic computing and storage.
So very much leading edge into the cloud is a lot of what we’re about. So let me talk a little bit more about the ubiquitous data that we have in flight. The first kind of data, which you might be very familiar with, is structured data. This is data in databases: claims data, member data, demographic data, and we get financial data. Now, there is also unstructured data. We’ve got faxes, images, text fields in medical records, audio recordings, and video recordings. We wanna take advantage of those as well, so turning those into usable forms of data requires AI services, and that’s part of what my team does. We do speech to text, we do natural language processing, we do OCR, optical character recognition, and we do image analysis as well. So we’re trying to extract value from this data by predictive modeling, machine learning, natural language processing and cognitive AI services. So I wanna announce, or I should say we just announced last week, Florence AI. Florence AI is the name of our machine learning platform at Humana. The name Florence AI comes from the city in Italy named Florence. The name was inspired by the dome of Santa Maria del Fiore, which was designed by the famous architect Filippo Brunelleschi. And the reason that we picked Florence, Italy is because it’s the birthplace of the Renaissance, which ushered in a period of amazing discovery and innovation, and we feel the same way about our machine learning platform. We’re ushering in, for ubiquitous data, cloud-based repeatable pipelines and leading edge tools and algorithms, and we really think that this is gonna provide a Renaissance in data analytics at Humana. You might say, why do we need a machine learning platform?
You might say, “Well, gee, there’s hundreds of tools out there already for doing machine learning.” And that’s the good news, and that’s the bad news, because at Humana, we’ve got hundreds of tools that we’re using for doing machine learning and advanced analytics. So what we’re trying to do is eliminate the silos, and we’re trying to bring a horizontal platform for use by the data scientists all across Humana. And we’re trying to use leading edge, modern software practices for this platform. So it’s a cloud platform, and we’re going to be automating and accelerating as much as possible. And we’re doing this at scale in Azure. This is a platform that has been built by data scientists, for data scientists. Now, some of the key components are a feature store, we have models that we build, train, monitor, and deploy, and we have starter code frameworks to really pave the data scientists’ path to production. It’s built all around Python and Spark, Databricks and MLflow. We’ve got templates that we’ve developed and lots of training materials to help accelerate the data scientists across Humana. There’s on the order of somewhere between 200 and 300. Now, those are not in my team; I’m providing the platform to enable them to be successful. So our end-to-end ecosystem really promotes open source innovation and enables the development and deployment of machine learning. We are transforming the workflows that people do, and we’re providing a single interface to simplify the consumption of model outputs. Again, we’re trying to provide a center of excellence; we’re trying to be the hub of the Renaissance here at Humana. So we’re trying to help our members achieve their best health, and make their experiences simple and easy. This is really Humana’s larger goal.
And at the heart of this center of excellence is Databricks, which can handle the very large data sets and the complex operations that we need. Let me talk a little bit about a feature store. So a feature store, in case you’re not familiar, is data that has been aggregated in a way that machine learning models can understand and use. So for example, how many times has a member been admitted to the ER in the last six months? And what’s the average cost per visit? Without this feature store, data scientists can spend weeks creating and computing these specific things. Now our current feature store, and it’s still growing, has over 15,000 features. I think of them as little snippets of code, based on bringing together the data assets that we have from 35 different categories across all of those different kinds of data that we’re marrying together. And we’re making those available in the feature store for data scientists to develop predictive models. So, as I mentioned, Florence AI is groundbreaking because we’ve got a shared feature store that data scientists can use. We’ve got Databricks notebooks that are assembled into complex pipelines. These are reusable, they’re monitored, audited, and validated. So we’re really trying to make as much of this automated as possible and put guard rails around these data pipelines. But this is an open ecosystem. I’ve been in the software industry for a long time and it has evolved quite a bit since I got started, and I’m not gonna tell you when that was, but it was a very long time ago. And I’ve come from a software development world where we’ve built monoliths and proprietary software products, and tried to do everything in house, and I really like the current sharing that is going on in the industry. Data scientists, you know, they can develop algorithms and they post them, there’s a lot of open source software, and you might think, how can people make money and be successful with open source?
Well, just look at Red Hat, a very successful company based on a free operating system. Okay, one of the other things about the ecosystem is that, because our environment is cloud-based, we have elasticity. Our data scientists and our work groups don’t have to compete for compute and storage; they can each start and run clusters that are suitably sized to accomplish very big problems in many cases. So again, I talked about open source. You know, if you look at some of these things, we’ve got MLflow developed by Databricks, we’ve got TensorFlow, CoreNLP, Spark. Apache Spark is of course at the center of many of the data manipulation algorithms and tools that we’ve got, and of course Prophet and scikit-learn, and this is just the beginning. One of the examples that we’re using very effectively right now is called BERT, B-E-R-T. It’s a really game-changing NLP model that greatly improves the power of NLP systems. This was released by Google and we’re using it very effectively to get information from our documents. I wanna talk a little bit about the power of predictive modeling. So COVID-19, some of you may have heard of it. Our data scientists in the last few months have been spending long hours with the data to inform how Humana can best help our members. One of the questions that they might ask is, what is the effect that social distancing has on COVID-19 infections? Well, let’s look at a couple of visualizations here, and I know these screens are very busy, but I want you to focus on a couple of things. Look at the greenish area under the curve, in the middle of the screen; this represents infected patients. The other number here that I want you to look at, and it’s a little bit hard to read, represents the rate of social distancing in place. And this is based on the guidelines for this particular region, and this is fairly recent.
So if you notice here, you probably can’t read it, but it’s 72% in this diagram. And we were able to build a model that shows how the area under that curve changes as we move up and down in the amount of social distancing. So I wanna show you the next slide, and if you notice, going from 72% to 75%, a very small change, 3%, can have a dramatic effect on the area under the curve, which represents active infections. I’ll go back, I’ll go forward, look at that. It is pretty impressive, the power that we get from some of these tools. Now, one thing I wanna talk about is our predictive models. They identify the members that are most at risk, say of avoidable admissions, as one case. But to use that information, we need to take action, so somebody has an idea: how can we take action and improve that outcome? One example would be, we could send a nurse to knock on that door, or there’s other ideas that people have, but how do we know that that’s actually gonna help, you might ask? We don’t. So we do tests, right? It’s not just some VP’s good idea. We actually design tests, and these are not software tests, these are experimental tests. If you’re familiar with how drugs go through clinical trials with control groups, that is the same process that we follow when we talk about the tests that we design. So I’ve got a team that I work with, and one of the leaders of that team is somebody who’s worked with some famous behavioral economists. Some of you may be familiar with Steven Levitt, who’s a coauthor of the book called “Freakonomics,” right? So what I’m trying to say here is, people don’t always make rational decisions; we have all these cognitive biases. It’s a very interesting field, and if you wanna read more about it, read up on Danny Kahneman and “Thinking, Fast and Slow,” a famous book.
And he and Amos Tversky have really revolutionized that industry. So I’ve got people here who are behavioral economists, who help us design tests, and the reason it’s important to design the tests to get the right experimental results is because we wanna make sure that we’re not playing into cognitive biases, and that we’re going to be designing tests that are statistically sound. So we’re developing a test-and-learn platform that will enable us to get the test results and be able to prove that the interventions that are informed by our predictive models will actually have a significant effect on the results. Sometimes, I will say, not all tests are, quote, successful; some of these tests actually fail and show that the idea was not good. So we’re trying to build a data-driven organization with feedback loops; this is what Humana is about. I wanna summarize by saying that Florence AI is our machine learning platform that we just introduced. It’s been in the works for about a year, and it enables Humana data scientists to develop high-quality predictive models from all the ubiquitous data that we have in flight. And this allows Humana to achieve our business goals, our bold goals to help our members achieve their best health, and make their experiences with Humana simple and easy. Thank you very much.
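
To make the test-and-learn idea concrete, here is a minimal sketch of how a control group and an intervention group might be compared. It assumes a simple two-proportion z-test on avoidable-admission rates; the function name and the member counts are hypothetical illustrations, not Humana’s actual method or data.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    events_a / n_a: admissions and members in the control group.
    events_b / n_b: admissions and members in the intervention group.
    Returns (z statistic, two-sided p-value).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 120 of 1,000 control members admitted,
# versus 90 of 1,000 members who received the intervention.
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference is unlikely to be chance alone; a large one is the kind of “failed” test described above, where the idea did not pan out.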

– [Narrator] Thanks Steve. Now back over to Frank for our industry leadership panel. You don’t wanna miss this one. Frank, over to you.

– Excellent, thank you all very much for joining us today. I’m Frank Nothaft, technical director for Healthcare and Life Sciences at Databricks, and we look forward to today’s panel, where we’ll be talking to a number of industry leaders from the Healthcare and Life Sciences industry about current trends in the space. I’m pleased to be joined by three panelists today: Joe Roemer, who is the senior director of global commercial insights and analytics at AstraZeneca; Iyibo Jack, the director of engineering at Milliman MedInsight; and Arek Kaczmarek, the executive director for data engineering and operations at Providence St. Joseph Health. To go ahead and get our panel kicked off today, I’d like to start by talking about, obviously, COVID, which has had a traumatic impact on healthcare and pharmaceutical research, and I’d like to rewind back to January of this year. We’ll start by walking through with our panelists: please tell us a bit about your role and what you’re doing with data and AI in your organization, and I’d love to hear a bit about what you’ve been focused on and what you learned during this whole process. So maybe Arek, would you mind kicking us off with a little bit of your experiences at Providence St. Joseph?

– Sure, so at Providence St. Joseph Health, going back to January, what were we doing? We were doing a lot of reporting, a lot of just regular BI, and then when the pandemic came upon us, we had to first stop everything that we were doing and then spread across more data sets, and look at our platform from a more comprehensive perspective. You know, how do we combine different data sets? How do we get other data sets that we didn’t have in our platform? So I think that was kind of how we switched from doing more traditional BI and data sets to really combining different data sets together and looking more comprehensively at some of the things that we were doing.

– Now, that makes a lot of sense, and I would actually imagine, Iyibo, being in the space that you are at Milliman, you’ve probably had kind of a similar experience; you all work with many diverse data sets. What was it like for you?

– Yeah, so over at Milliman MedInsight in January, well, we thought it was gonna be a normal year, so we had our typical roadmap and our typical strategies and key things that we wanted to get done. And being from Seattle, the pandemic hit us pretty early in the U.S., and we really had to ask ourselves some questions that we hadn’t asked before, and the number one question we posed was about timeliness. It was okay to get an answer to a question three months ago, but when you’re in the middle of a pandemic, people want answers now, and it really exposed the gaps we have in our data engineering pipelines, in our processes, in our validation, where we were kind of resting on our laurels. In our space, we’re kind of at the month grain, six weeks; customers wanna look back at the claims history, and now they’re really on it. And they really want to know, “Hey, if I give you encounters that haven’t been fully adjudicated, like, will you look at them?” And so we really had to take a step back and basically do the things that we thought we did well, better. We needed to be very, very clean on our basic core mandates, and we had to start looking across multiple disparate customer contributions of data to look at trends that historically we normally wouldn’t need to. And that’s kind of what really shifted us in this post-COVID world, or during COVID: we’re looking for these kind of macro trends of drug use, diagnosis, DRGs, and just like, what is happening out there? What are people seeing?

– Now that makes a ton of sense, and I think actually that kind of winds up being an interesting segue over to Joe. You know, obviously, being on the commercial side of pharma during the COVID pandemic, everybody, when they think about pharma, they’ve been talking about the race to get new treatments out, new vaccines out. There’s been this enormous focus on the R&D side of pharma, but what has it been like on the commercial side? You know, I imagine, kind of similar to Iyibo, you’re seeing a lot of changes in drug usage, a lot of changes there. How did this shift the world for you?

– Iyibo’s story resonates really well with me. I mean, we definitely focus post the research and development side, and I know there’s a lot of emphasis on vaccines and vaccine development, but really our work starts the moment a product is approved in the market, with the commercial sales and marketing. And then the second the pandemic hit, what you had is a whole giant field sales force that usually went into doctors’ offices, visited doctors, talked about our products, tried to answer their questions. They were grounded; they were not going to doctors’ offices, they weren’t having those conversations. The recency that Iyibo mentioned changed dramatically. Like, we wanted to know daily: were there still offices open, did the dynamics around the pandemic change, could we have those conversations? But I think more importantly, we quickly pivoted to other digital tactics, and data that maybe was of secondary relevance before all of a sudden became widely relevant. We wanted to know about some webcasts we were hosting: how do we optimize? How do we get the messages out? How do we talk about the pandemic? So we had data that I think we weren’t really prepared to deal with being front and center: website traffic, digital data. It was just a very different world very quickly, and having people engage remotely, much like a video conference, the metrics you capture out of that are very different. And so we pivoted quickly, and I think Iyibo’s story resonates well with me, and we had our engineering teams in the background trying to figure out and scramble to get some answers to people quickly, on data that we weren’t always used to being front and center.

– Well, it’s very true. I think it’s been very interesting to watch from my side. The pandemic has, one, I think, you know, forced us in healthcare and life sciences to really grapple with what the digital transformation for healthcare and pharmaceuticals will look like. And it’s also forced us to really reckon with questions around healthcare interoperability. Obviously we’ve all been talking about healthcare interoperability, probably extensively, ad nauseam, for the last decade or more. But I think during these last few months, we’ve discovered a lot of those places where we weren’t dealing with disparate data sets as well as we thought, or we could do interoperability, but only if we could do it slowly. I’m kind of curious, I think, you know, Iyibo, you had a great point there, and Arek, you did as well. What were some of the lessons learned on your sides, and what do you think is driving kind of the struggles, and how can we as a field be doing better? Like, what’s the low-hanging fruit? Maybe, Iyibo, if you wouldn’t mind giving your perspective for us, that’d be great.

– Yeah, so on the interoperability, at MedInsight we do, again, a lot of monthly batch processing of customer data, and we get it primarily as text files, CSVs, and that works and that’s fine, and I think that’s great. What it turned out is, the second you needed to start consuming additional types of data, it became clear that, for all the speed that a CSV file has for loading purposes, it just doesn’t provide enough flexibility and metadata to do anything with a second level of complication. And this kinda hit us, and we’re like, man, we’ve kind of built this architecture and it works, but it works assuming that the customers don’t want to change things rapidly and see the outcomes of these data pipelines daily. Like, we were still in that paradigm of, let’s do this once a month and pat ourselves on the back. And so that required us to scramble and say, like, okay, well, you know, would we consume native Parquet files or some sort of JSON files? Like, how are we gonna be able to provide that flexibility so that we can bring in this data, we can refine it and provide information that people desperately needed to answer questions? Because businesses on the payer side were wondering what their risks are, and providers are also saying, like, I’m not getting patients, ’cause during the shutdown all elective procedures stopped, and, as Joe mentioned, facilities that weren’t critical were closed. And so I think the drive in our industry toward more standardization and leveraging data formats that allow for easier consumption is definitely evident now. Like, performance is important, but it doesn’t matter if my file type performs if I can’t actually read it in and get value from it. And so that’s what I have noticed.
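
The point about CSV lacking metadata can be illustrated with a small sketch using only the Python standard library. The claim record below is hypothetical. It shows that a CSV round trip flattens every value to text, while a self-describing format such as JSON keeps numbers, booleans, and nested lists intact. Parquet goes further by embedding a typed schema in the file, but reading it needs a third-party library, so it is not shown here.

```python
import csv
import io
import json

# Hypothetical claim record, for illustration only.
records = [
    {
        "member_id": "00123",
        "paid_amount": 250.75,
        "adjudicated": False,
        "dx_codes": ["E11.9", "I10"],
    },
]


def csv_round_trip(rows):
    """Write rows to CSV in memory and read them back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    return list(csv.DictReader(buf))


def json_round_trip(rows):
    """Serialize rows to JSON text and parse them back."""
    return json.loads(json.dumps(rows))


csv_rows = csv_round_trip(records)
json_rows = json_round_trip(records)

# After CSV, every value is a string, including the list of codes.
print({k: type(v).__name__ for k, v in csv_rows[0].items()})
# After JSON, the original types survive.
print({k: type(v).__name__ for k, v in json_rows[0].items()})
```

With CSV, the consumer has to be told out of band that `paid_amount` is a number and `dx_codes` is a list; that side agreement is what tends to break when feeds change quickly.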

– Yeah, that makes a lot of sense, and I think that’s been an interesting thing for me, especially in some of the work that we do in other parts of the healthcare space. Like, you start talking about EHR, which I know Arek’s group does a ton of work with, and you have kind of competing standards, whether it’s the older HL7 APIs or the FHIR APIs. And there’s a lot of discussion as to, like, what does interoperability really look like? You know, FHIR looks like an API that’s really good if you’re gonna build a web app on it, but it’s not always clear it’s the best one for analytics. I guess, Arek, on your side, what has that experience been like, especially working with EHR data, working with a bunch of other disparate feeds? What have been the lessons learned there?

– It’s interesting, because everything that my co-panelists mentioned definitely resonates within our area as well, our world. So the first thing that we learned is that we need to provide the data much faster than we were doing before. So, as I said, we were doing a little bit more of the traditional BI, but now, for example, one of the things that our clinical groups wanted to do is to start identifying patients from the different diagnoses and things like that. And in order to do that, we needed to provide that data much faster. Another thing that became apparent is that we need to start working on ingesting data, to be more interoperable, from our EHR and HL7. And we were doing that to some extent in the past, but now we had to speed that up, you know, speed up the development of the parsing, you know, pulling in new libraries to make sure that we can analyze the data properly. And so there were a lot of new developments and new things that we had to try out to make sure that we provide the data in a timely fashion, and the different data sets, as I mentioned before, like ADT or other types of data from the EHR.
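
As a rough illustration of the parsing work described here, the sketch below splits a pipe-delimited HL7 v2 ADT message into segments and fields using only the standard library. The sample message, application names, and identifiers are synthetic, and the function names are hypothetical. It ignores escape sequences, field repetitions, and non-default delimiters, so a production pipeline would more likely rely on a dedicated HL7 library.

```python
def parse_hl7_v2(message):
    """Split an HL7 v2 message into {segment_id: [list of field lists]}.

    Assumes the default delimiters: carriage return between segments
    and '|' between fields.
    """
    segments = {}
    for line in message.replace("\n", "\r").split("\r"):
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def summarize_adt(message):
    """Pull a few commonly used values out of an ADT message."""
    seg = parse_hl7_v2(message)
    msh = seg["MSH"][0]
    pid = seg["PID"][0]
    # In MSH the first field is the separator itself, so after splitting
    # on '|' the message type (MSH-9) sits at index 8.
    msg_type, trigger = msh[8].split("^")[:2]
    # PID-3 is the patient identifier list, PID-5 the patient name.
    family, given = pid[5].split("^")[:2]
    return {
        "message_type": msg_type,
        "trigger_event": trigger,
        "mrn": pid[3].split("^")[0],
        "patient_name": f"{given} {family}",
    }


# Synthetic admit message, for illustration only.
SAMPLE_ADT = "\r".join([
    "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECV_APP|RECV_FAC|202001150830||ADT^A01|MSG00001|P|2.5",
    "PID|1||MRN12345^^^HOSP^MR||DOE^JANE||19800101|F",
    "PV1|1|I|ICU^101^A",
])

print(summarize_adt(SAMPLE_ADT))
```

Flattening each message into a small dictionary like this is one way to get ADT events into tabular form for downstream analysis.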

– Yeah, you know, it’s funny, we tend to think of EHRs, at least from the outside, as relatively clean standardized interfaces, but they have so many different interfaces below them, so many different feeds inside of them. It’s been fascinating; it seems like an area where there’s a lot of ripe fruit for kind of greater standardization, better open source packaging. We’ve been dabbling a bit on some of the open source packaging on our side, to make it easier to work with HL7 and FHIR, and I think that that’s gonna be a great step forward. Joe, I think I was actually really interested on your side when you started to talk a lot more about the digital data that you’re working with, because whether it’s on the pharmaceutical side or on the healthcare side, we’ve seen this huge pickup in this. We had a lot of people in the past saying, oh, this is gonna be a big thing, we need to use more digital data in terms of web traffic, understanding how patients are engaging with doctors, understanding what they’re seeing in their patient portal. But a lot of the efforts haven’t been as heavily emphasized before now. Did you run into a lot of interoperability challenges when working with that? Or what were some of the challenges that you faced there? Because this is a relatively newer data stream for a lot of us in the healthcare space.

– Yeah, I think for us, I don’t know that I would necessarily call them interoperability challenges, though some of the stuff that Arek mentioned we’re certainly keenly interested in. When we wanna partner with a health system and help either guide patient identification or see where our product is really gonna benefit, we love that interoperability exists so we can partner closer. On digital, I think our challenges were mostly around identity and seeing when we could figure out, do we have a meaningful thing, and a little bit of marketing discipline. There’s a moment when you have a registration, and there’s a choice. Do you have people register or not register? And it can create a very different experience for them if you know who they are and you know their preferences already, and you can serve up the right content, or they’re anonymous and you let them be anonymous. And that debate around potentially dropping people off with the registration wall, or letting them in and having a very different experience, one that’s more personalized or not personalized, was a fascinating thing to sort of happen in real time, that I don’t know that we were quite as prepared for as I would have expected. And I think it’s something a lot of people will continue to struggle with as digital grows. The data and the velocity of the data was certainly a big challenge for us, and how quickly it came, and when you’re talking about the types of questions you’d ask about video: how far in do they get, when did they drop off, what content was being displayed when they dropped off? When did they stay? Like, they’re different questions than, how many prescriptions went out the door? Where were the prescriptions at? What’s the managed markets formulary status? So I think it just challenged us in a lot of different ways.

– Yeah, it makes a lot of sense, and I think one of the interesting things that you’re kind of picking up on there is, you know, there’s this great opportunity to use digital techniques to really further patient education, make a lot of this information more accessible, steer people towards content that helps them get a better understanding of their care, a better understanding of the condition that they’re diagnosed with. But there always comes up this challenge of helping them understand why this content was suggested to them. Why did our algorithm say that we should explain this to them? You know, what is in it for them? This isn’t just advertising; we’re trying to help them take care of themselves better and work with their condition better. When you’re thinking about that, I know you’ve thought a lot about transparency and how we help people really understand that a bit better on their side. How did you think about the patient experience in this and being transparent with them? You kind of talked about that: sometimes you set up a registration wall, sometimes you let everyone in. How did that ultimately play into your data strategy and the way that you design these ML-based apps?

– Yeah, I’m sure like a lot of companies, and I’m gonna guess that this is gonna be a common theme, we’re having a lot of conversations about AI ethics, about ML ethics, and how we make sure that our intent is always as positive as possible. We have, much like every company, a set of core values that we believe in, and we believe in them firmly, and it’s very much about patient health and advocating for the patient, and making more meaningful differences in their lives. And we’re trying hard to keep that at the center and having active conversations about potential bias in models. Does the data represent the right sort of information that should be presented? Are our models accurate? Are they drifting? Are we keeping an eye on them, and making sure they’re doing the best that they can? We’re still very much human-in-the-middle for most of our models too. I don’t think we’re at the place where we’ve made the leap of, the model, that’s the whole answer, and that’s the right answer, because we firmly believe we don’t have the full picture. But I think the conversations about AI and data ethics and bias in data are gonna continue, and we’re certainly trying to tackle them with our principles and values as an organization at the forefront of it.

– Yeah, that makes a lot of sense, and I think the good news in a lot of healthcare is that there are very natural human intervention points. There are very few decisions that get kind of made by a model that impact a patient without someone in the middle. But I guess, I know, Iyibo, we’ve talked about this a bit before, especially around bias in models, especially around representation and looking at model drift. You all sit in a very interesting position because you’re working with this very, very vast amount of population health data. How have you all been handling it, and how have you all been addressing it? You sit in this fascinating space in the healthcare ecosystem that I think gives you a slightly different perspective on it. I’d love to hear a bit more about that.

– Yeah, so within Milliman we have some of the best data scientists and machine learning expertise, I think, in the country and/or the world, and when we’re putting together, like, a model, I think you’ve gotta be very, very careful in understanding that you can’t rule out bias in a model, because a human created the model. Now, we are hopeful that the data we use to train the model isn’t biased, but that in and of itself is at the mercy of he or she who collected that data to feed the model. And so the way I see it, the only way you can really adjust for the inherent biases of Iyibo Jack creating this model and finding data to train my model on is what Joe mentioned, that human intervention: well, I need somebody else, who comes from a different perspective, who views this problem slightly differently than I do. We need to have a debate about this model, A, and B, we also need to really ask ourselves, what is the impact of the information, the answer, the score, the rating that this model produces, and who is it actually affecting at the end? And I think those are things that you have to think about, and when you do, you find some of these really awesome macro models and AI that you wanna do, and you realize, yeah, I can’t really do this in good confidence and really understand and believe that the results I’m having are gonna have a positive impact for people. But you can have models that are very specific for very specific use cases that might not wow people, but they are beneficial, and that’s kind of the path that we’ve gone. We’ve had preliminary conversations about what it means to have AI ethics, but if you just think about it for a second, you realize that’s a lot. Like, you don’t really want this machine just saying, claim approved, claim denied, and just have it be this model, ’cause that affects people’s lives, right?
So there’s a lot more work that we have to do, and a lot more respect and due diligence that we have to do, and we’re interested in looking at what the rest of the industry is doing, what Arek and Joe are doing, and what competitors of all of ours are doing in the industry, to make sure that at the end of the day, we’re all progressing towards the same goal. We wanna improve healthcare for everybody; that’s what we’re about. And we’re all doing it in different ways, but ultimately that’s our goal.

– Yeah, that makes a lot of sense, and I think it winds up being kind of an interesting point as well. You know, really a core part in this is understanding metrics and understanding, for an intervention that you’re designing, no matter where it is, what are the metrics that you expect to impact, and what are the unintended consequences? I was just talking through this with a colleague this morning, who had this great point that there are a lot of problems in healthcare that we think about where, to us, it seems like, oh, well, we’re gonna make a model that makes a suggestion to someone about their course of care, and it will make a positive suggestion. Like, we’ll suggest, oh, we’ll follow up with the patient, we’ll give them a phone call to say, “Hey, are you taking your medicines? We wanna make sure you’re following your treatment so that you don’t wind up getting readmitted to the hospital.” And we assume that they’re a rational actor and that they’ll say, “Oh, thank you for giving me the phone call,” and they’ll stay on that. But there are a lot of places that we can fall into these unintended consequences where things go differently. I’m kind of curious to know, have you seen anything like that, where you’ve seen models that impact metrics or impact results in an unintended way, or thoughts on your side?

– Well, here’s what I will say. I will say that when you’re building this model and you have an idea, at the end of the day, you’re building it from your perspective. So you said, hey, rational actors, so let’s talk about that for a second. A rational actor is an actor that behaves in the manner that I deem rational, but what makes my definition of rational any more valid or invalid than Arek’s or Joe’s? And so, are you taking your medicine? Well, there are populations of people in America that have had terrible experiences with physicians; they don’t trust physicians. And you’re like, oh, I don’t think most of us put that in our model, you know? Oh, they don’t really believe in physicians, they don’t trust them. We’ve had this certain percentage of the populations of people in our country that basically have not received proper treatment: they get less pain medication, they get questions, they get bounced around, because it’s assumed they don’t have insurance. This is all information that doesn’t get captured if that isn’t in your world. If that’s not an experience that you know anything about, how can you capture that in your model, and how would you even look for data to train for those weird edge cases? ’Cause you don’t have that. Claims data isn’t going to show it, the EMR is not gonna show it, right? So that’s kind of where I’m like, we have to be very, very honest with ourselves. We have a tremendous amount of computing. We have great vendors and partners like Databricks that can do tremendous things for us. But the second we project our definition of reality on a population, you will find out that you are very, very wrong.

– And you know, I think that’s a really great point. You know, especially for us, we’ve done a lot of work in human genetics with some of the customers that we work with. Human genetics has a massive bias problem. It often selects from certain groups, whether we’re looking at income status, whether we’re looking at geography, whether we’re looking at ethnicity, and this leads to enormous problems in using that data to come up with better treatments that really work for everyone. And especially when we talk about healthcare systems, all healthcare systems face tremendous amounts of bias due to their patient population, due to where they are regionally, and things that can make it very challenging to go ahead and take insights from elsewhere and interpose them in. I guess I’m actually kind of curious, Arek, at Providence, have you all had kind of similar conversations? I’d love to hear kind of the perspective on that side.

– Yeah, you know, from our perspective, it’s more about the ethics of how we use the data, at the moment at least, which is something that Joe alluded to earlier. And it’s, how do we protect the data with all the increased volumes, now that we need more data sets, more data overall, to train our models properly and do it right? How do we make sure that we have the right computing available to us, and that that computing is secure and the data is protected? So I would say that our efforts at the moment are going in that direction. That is not to say that we don’t think about skew and bias, but fundamentally, we wanna make sure that we protect the data first and then run the models second.

– It makes sense, it makes sense. Data governance, I think, winds up being a big challenge across the entire healthcare spectrum. Have you seen any kind of best practices? You know, I always think of it as, we have this kind of challenging balance to strive for, where we need to have a very strong data governance program, both legally, to stay in compliance with HIPAA, but also just to maintain the trust of the patients that we’re treating. But we also have this competing goal of making sure that the data that we have can be used by researchers, data scientists, and anyone else to improve the state of care. Have you all had any best practices that you’ve developed?

– So I can maybe say that there’s a definite tension. I think I will amplify what you’re saying, that there’s a tension between these two: we want to provide the best healthcare that we can, but at the same time, how do we do it ethically and make sure that we do it legally and in compliance with HIPAA and everything else? I don’t know that I can share any best practices necessarily here, that would-

– No worries, no worries involved.

– Yeah.

– Yeah, I think it’s a challenge that many of us are evolving our strategy around, especially as time goes on. I guess, Iyibo, I’m curious, on the MedInsight team, how have you all been balancing that? Because you all have a lot of large population health data sets, you’ve collected them internally, and you work with a number of external organizations. So does that create additional challenges, or does that kind of change your perspective on it a bit?

– Yeah, so security and ownership and access to the data, it’s a core tenet of what we have to build into our platform and into our data pipelines. And I think a challenge isn’t necessarily, I don’t know if that’s the right way to say it. Like, we’re not selling candy bars here. It’s different: if you’re selling candy bars, everybody can know who bought it, when they bought it, it doesn’t matter, right? We’re one of the fundamental core systems of a civilization, the healthcare system. And so there’s a lot of responsibility on our shoulders to figure out how to do what we do responsibly and how to show that value, not only to our customers who might be paying us, but to the end users and patients of our customers, and to the regulators who need to regulate. Like, we have to show value to everybody in that ecosystem, and it’s, like, daily. And if anybody here has a best practice, by all means email it to me, ’cause I would wanna use it, but I don’t know if there’s a one-size-fits-all. It is a constant input into the decision-making process that we do on a daily basis. It’s like, well, how is this improving our goals? How is this aligned to our strategies? How do our customers, how do our end users, which can be different, right, how are they benefiting from us exposing this or not exposing this? And I think the pandemic has made it very clear that people want transparency now more than ever. Like, what are you doing? And more importantly, why are you doing what you’re doing with my data or with this data, and how does it fit in? And what benefit do I get from it?
And so these are the things that we do at MedInsight all the time. You know, we have great security and security operations teams that I partner with from an engineering perspective, a great legal team, and we’re always kind of asking ourselves these challenging questions, and also working with the security and legal teams of our customers, ’cause they’re also, you know, showing up with, like, here are the terms of how you can interact with our data, and we need that. So there’s a lot of cooks, but I think that there should be. Like, some people get upset about the bureaucracy, but I’m like, well, this is important. So it’s gonna take a while for us to get there.

– Exactly, exactly. These are extremely hard and extremely challenging problems that have many stakeholders, and I always think it’s very interesting to just see how many individuals come into play in this. To your point, you all work with a large number of organizations, whether it’s your clients, whether it’s regulators, whether it’s ultimately patients, and I think that ultimately winds up coming across for many of us that are in this space. You know, Joe, I’m kind of curious on your side at AstraZeneca, how have you seen this come about? Especially, I know you all are looking very heavily right now at how you increase collaboration, both inside of AstraZeneca and with organizations outside of it. I’d love to hear a bit more about that and to kind of hear your perspective.

– Yeah, I think Iyibo and Arek said it great. I think it’s a tension, quite honestly, and it’s that tension that I’m proud of our organization for: taking a very restrictive mindset first, and forcing people then to walk back from that restrictive mindset. We tend to protect our data, and even departments within our own company are restricted from using a different department’s data, because of the perception of a misaligned use. And we really fight hard to open up those doors, and I think that’s some of the tension, ’cause we’re leaving value on the table, to be honest with you, for the sake of making sure we’re compliant and making sure we’re doing things ethically, making sure we’re doing things in a way that we can stand behind, and there’s no chance of perception of wrongdoing. I think as we start to enter into wanting to partner with companies like the one Arek may represent, around how can we help with maybe reducing readmission rates, reducing real problems in the healthcare system, you know, we’re very much gonna come from a mindset of, we have to earn the trust; it’s not inherently there. We’re obviously in different sectors, and we need to make sure all of that information is locked in together and very, very much held. You’re a broker and organization in the middle because we believe it, right? These patients entrust organizations that treat them, and they need to make sure that data is safe and protected, and we have no interest in doing anything to break that trust. Once trust is broken, it’s certainly not gained back quickly. So I think I’m proud. I would also agree, I don’t think I have a best practice. We fight with it every day, we have the conversation every day about, should we have a little more access, there’s value here, we can do good things, and even sometimes good things that we’d love to do. But it’s a tension, and I think it’s a good tension, to be honest with you.

– No, I think it’s a really good point, and you know, one thing that I will say, and I think this is a great thing for us to touch on as we get to the end of our conversation today: one of the great things that you were bringing up there is that ultimately the potential that’s driving us forward is the ability to bring in data from multiple stakeholders, whether it’s a hospital system, whether it is the pharma company, whether it’s digital patient engagement data, and use that to solve problems that we can’t solve today. I’m kind of curious, when you look at some of the big challenges that you’re looking at over the next one to three to five years, what would you say are gonna be the biggest things that you’re looking to solve by blending multiple data sets together and collaborating on data, both with stakeholders internally and stakeholders externally? Where do you see that going for you, Joe?

– Yeah, I think for me, it’s the rise of the ability to get some of this compute we’ve touched on, and really get to some data that was too large to deal with, problems that we just couldn’t solve. We’ve got a ton of work that’s happening in research and development in this space, but in commercial, really, we’re hoping to find some of these partnerships, and really bring some of the expertise we can bring about our disease states and therapies to help treat patients, find those patients that are undertreated, that may be able to be treated differently, and help have a different conversation. And hopefully just raise some awareness of improving patients’ lives. I mean, there is a ton of available resources now for us to really make a change with the data and make some informed decisions, or at least guide those decisions. We’ll never, ever believe that we have all the information to make the decision, but helping people maybe question a decision, think differently, challenge a decision a little bit in healthcare will be a great outcome for us, and there are problems that we want to partner on and solve.

– That makes a ton of sense, and I know, Arek, being at Providence, Providence is an interesting organization. It’s extremely collaborative, and you have a large research practice and a large data science practice inside of the hospital system. What do you see on your side as some of those big areas where you all might be looking to do new data blending, kind of learning from some of the lessons of the pandemic around data recency and data availability? What’s showing up on your future-looking horizon there?

– Yeah, I would say that more and more, we’re looking at sharing data with, and getting data from, our external partners and different vendors. This is something that we tried to do in house, I would say, in the past. Of course, we have shared data externally, for compliance for instance, but not so much to add to our data sets. And I think what we need more and more is those external data sets to enrich our own internal ones. So we’re definitely looking in that direction. I think, as our other panelists have said, they deal with other data sets from other vendors, and they are our vendors. So basically they would like access to our data, and what we’re trying to understand better is how do we share the data then, with AstraZeneca for instance, and how do we collaborate better, again, within the prescribed parameters of key partners, PHI data, and working with those. How do we de-identify the data that we share, but at the same time get some benefit back from that in our healthcare system? So I would say it’s enriching the data, blending the data, and then sharing data. I’d say these are the next things for us.

– Yeah, that makes a ton of sense, and I know we’ve seen some very exciting partnerships between major hospital systems, large AI groups and other healthcare research teams that have driven some really interesting outcomes for patients. So I would definitely believe that, and it’s definitely been interesting for me when I look at a lot of the work that we do in the pharma industry. Pharma has been very quick to uptake. I think, to the point that Joe has made, pharma often gets a lot of data from hospital systems, or from health insurers, in the form of real-world evidence, in the form of claims that they’ve bought, but we haven’t necessarily seen as much of that flowing back into the hospitals, and I think that’ll be a very exciting space to see over the next few years. Going over to Iyibo, I’d be curious again on your side where you see that. I mean, one of the great things that having more external data sets allows us to do is evaluate how models perform on populations that are different from the populations that we’re working with on a day-by-day basis. So I can see that as an avenue for some of the things that we had talked about earlier, but I’d love to hear your perspective. Where do you see the art of blending data, and the art of using it to drive better outcomes, going over the next three to five years?

– Oh, well, I think that’s the name of the game. Like I mentioned earlier, the diversity of the data that we are gonna be responsible for is gonna grow by leaps and bounds, and so is the value that we have to drive, like I said previously, to our customers, to the patients, and to the members. We have these vast amounts of data, and they’re sharing it. Like Providence is sharing their data with us, and they’re expecting to get a return, they’re expecting to get additional value-added information from us. And so it is our job to always keep that top of mind to build that trust. They’re giving us something ’cause they’re expecting what they gave us back, and then a little extra. And when we were looking at surveys and different EMR and EHR data, and just trying to figure out how we are going to put together some sentiment data, as we start wanting to tackle some more social determinants, all of these angles require massive amounts of data, data that quite honestly we didn’t have the capability to handle three or four years ago. So you didn’t have to worry about it, it was easy. But now that that’s there, there is a treasure trove of value that we have to mine, and I’m excited to do it. I think everybody here is excited for the challenge that we have to overcome. We have the bias issues that we have to overcome, and we have the ethics that we always need to work on. You don’t overcome ethics, it’s always evolving. But that’s what I see: in three to five years, the diversity of data is gonna be 10X more, what people expect you to do and the transparency that they want, because they wanna know what you’re doing with their data, is gonna go up, and the regulations are gonna increase. That’s all going to happen in the next three to five years.

– It makes a ton of sense. I think there is a really interesting comment that you made a bit earlier today, where you were talking about the opportunity that, instead of having one model to rule them all in healthcare, you have the ability to train many smaller models that each attack a small part of the issue. And when we blend all of these different data sets together, when we look at all of the different fundamentals, whether it is social determinants of health, whether it is where a patient is located, whether it is what their condition is and how they are getting digital information about their condition and about the therapeutics that are available to them, and we work on each of those problems piece by piece, bit by bit, we can ultimately improve the value of healthcare for everyone and hopefully drive better outcomes for our patients. So it was really exciting to talk with you all today. It’s always wonderful to have a panel like this that blends different pockets of the healthcare ecosystem and the healthcare value chain together. And it’s always really encouraging to me to see that we’re all pushing towards a single consistent goal, with many of the same goals and approaches underlying it. So thank you all very much for joining us today on our panel. I really appreciate the opportunity to have a conversation with you all about these topics, and thank you to our audience for joining us today.

– [Narrator] That was great, a lot of really good insights. I want to give a big thank you to our panelists and to Frank for sharing those stories. And that’s all we have for today’s session. Thanks again to everyone for joining us, and thanks to our speakers. Look forward to seeing you next time.

About Frank Austin Nothaft


Frank is the Technical Director for the Healthcare and Life Sciences vertical at Databricks. Prior to joining Databricks, Frank was a lead developer on the Big Data Genomics/ADAM and Toil projects at UC Berkeley, and worked at Broadcom Corporation on design automation techniques for industrial scale wireless communication chips. Frank holds a PhD and Masters of Science in Computer Science from UC Berkeley, and a Bachelor’s of Science with Honors in Electrical Engineering from Stanford University.