
Improving Hospital Operations with Clinical Analytics and ML in the Cloud
Introduction
Mike Ortega:
Thank you for joining us today to discuss improving hospital operations with clinical data analytics and ML in the cloud.
Joining us from Integris Health is Julius Abate, director of data engineering and automation, leading the data platforms over there and many of the cool use cases that he’ll be sharing. Along with Julius is Ankur Teredesai, co-founder and CTO at KenSci and Frank Nothaft, technical director for healthcare and life sciences at Databricks, sharing their real world experiences working with customers like Julius on how to implement some of these solutions.
With that I’m going to pass it off to our first speaker today, Frank Nothaft.
Personalize patient experience and improve operations with Databricks
Frank Nothaft:
Hi, all. Pleasure to be chatting with you all today. I’m Frank Nothaft and as Mike mentioned, I’m the technical director for Databricks in healthcare and life sciences, so I run our global healthcare and life sciences practice. First, for those of you who are unfamiliar with the company, Databricks provides a unified cloud data analytics platform that allows teams to collaborate across data engineering, data science and business analytics. Our company was founded in 2013 by the original creators of the Apache Spark software project at the UC Berkeley AMPLab. After the Apache Spark project went open source, our team decided to build out a platform that would make it easier for companies to leverage the power of Apache Spark in a high performance, easy to use, cost efficient and secure cloud platform. Since 2013, we’ve grown and we now have over 5,000 customers and 450 partners using the platform, and we have done a lot of work to build out an even better open source ecosystem around Apache Spark.
We still contribute about 75% of the code that goes into the open source Apache Spark project. We’ve recently added the Delta Lake project, which adds a high performance, cloud native data lake. This is an open source format for storing your data that gives ACID consistency on cloud and other local storage systems. Then we added the MLflow project, an open source project that gives people an easy way to manage their machine learning lifecycle by tracking experiments and building hooks to push models into production really quickly.
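To make the MLflow tracking workflow Frank mentions concrete, here is a minimal sketch of logging a run; the model, parameters, metric and dataset are illustrative assumptions, not anything shown in the webinar.

```python
# A minimal sketch of MLflow experiment tracking; the classifier, parameters
# and metric below are hypothetical examples.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="readmission-risk-sketch"):
    params = {"n_estimators": 200, "max_depth": 5}
    mlflow.log_params(params)                      # record hyperparameters for reproducibility

    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("auc", auc)                  # record the evaluation metric

    mlflow.sklearn.log_model(model, "model")       # persist the model artifact with the run
```

Each run keeps its parameters, metrics and model artifact together, which is the chain of custody Frank returns to later when he discusses reproducibility.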
As our company has grown, one of the big things that we’ve seen is a great opportunity for us to extend into specialized industries and go a bit further to help customers be more active, more successful and see success more rapidly from their investment in the software. Back in 2017, we started to build out our healthcare and life sciences vertical, with a big focus on making it so that teams working on very domain specific problems in healthcare and the life sciences could stand up solutions to those problems, putting these high performance machine learning and data engineering capabilities right in the hands of domain scientists so that they could really bring data to the people in the line of business making an impact on patient care or on how therapeutics are developed. Our practice has grown pretty dramatically. We work with seven of the top 10 pharma companies. We work with multiple Fortune 10 healthcare companies. We have a large practice in the care provider space, and we’ll dive in on some of the use cases that all of these organizations are seeing.
When we look at the current state of healthcare, we’re in this fascinating time with a massive transition going on. We look at the consolidation in the industry where if you go into a single metropolitan area where historically you had many small community hospitals, you had several major public hospitals, you’ll now often see that many of these hospitals have consolidated down into regional chains where you’ll have maybe three major hospital chains that are working in a single area. At the same point in time, we have a lot of trends coming out from a data collection side, where now patients have unprecedented access to information about their care, information about their condition, we see a lot of providers standing up digital interfaces where a patient can go in, change their appointments, talk to their doctors directly through an application. This gives us both great access to data and more choice for patients in their care, the ability to go ahead and shape their experience.
Finally, when we look industry wide, we have this major change where, while in the past we’ve been compensated for our care on a fee-for-service basis, paid out per item for each service we provided to a patient, we’re now changing to a model where we’re going to be reimbursed on the value and the outcomes that our patients are experiencing. What we see is this is a great opportunity for us to go ahead and use the large scale datasets that we’re collecting, whether it’s through our EMR or through our digital outreach systems for our patients, and to use that to personalize care and improve the outcomes that our patients are getting.
If we don’t go ahead and do this, there’s actually a very strong risk to us. If other care systems that we’re competing against do a better job of using this information to personalize care and improve outcomes, we’re likely to lose our patients to them. Even right now, about 20% of patients are switching providers every year due to dissatisfaction with their current care providers.
When we look at what it takes, though, to go ahead and turn this big data into actual changes in clinical practice, in how we’re going ahead and doing patient recruitment, how we’re managing the patient care experience, there’s a lot of places where we need technology to go ahead and step up for us, but it’s hard for people to go ahead and do that. First, what we look at is that scale is a really big problem. If we’re going to build a system that personalizes care at the patient level, we need to be able to pull together all of our data assets to build out a single view of each patient and we need to scale that to all the patients in our system. This means that we have to be able to take in EHR data, billing data, images, IoT data coming from beds, and we have to unify that all together. This can be hundreds of terabytes, up to petabytes of data, which most care systems don’t have the capability to work with today. They may have a data warehouse that they can stand up, but they don’t have a great big data platform that can integrate all of this data together.
When we then look into the machine learning space, reproducibility is a big problem. If we look at a model, if I was going to take machine learning and I’m going to integrate that into clinical practice, I know that my doctors are going to want to understand how this model is built. Is this a model that I can trust? We know there are patients who are going to make the same demands of us and if a regulatory body steps in, they’re going to have really strong questions about how models were built, what they were trained on and how we can be sure that they work. A lot of the time, people don’t have a great strategy for making sure that the machine learning that they’re doing is reproducible, which winds up being a very strong gate that prevents them from rolling out machine learning in the whole care ecosystem.
Finally, security and privacy is a really big roadblock. When we look at moving data to the cloud, when we hear from CIOs inside of medical organizations, a lot of their concerns are immediately around making sure that they can meet their compliance requirements so that their stakeholders have trust in the way that they are managing their data and have trust in the use of cloud computing to make these data driven decisions.
Where the Databricks platform steps in to address this is, as a cloud native service, it winds up knocking down many of these barriers. First, if we look at cost efficient scale, we do a lot of work inside the Spark ecosystem and in the Delta Lake file format to make it really easy to go ahead and load many different data types in and process them at large scale. For instance, one of the very exciting projects that we’ve done with Regeneron Pharmaceuticals takes clinical data from the hospital systems that they work with and genomic data from patients that they’ve sequenced, and integrates those together at the scale of more than a million patients. They’re able to store all of this data in Delta Lake. They’re able to make these tables accessible broadly to people. They’re able to compute on it, and they’ve actually been able to take compute jobs that took three months to run and accelerate them down to two days. This is ultimately allowing them to build this very large, comprehensive view of a patient across all of the clinical phenotypes they’re interested in, all of these various different orthogonal data assets that they have, so they can get really good insight into how genetic disease manifests.
When we look at reproducibility, our machine learning lifecycle management tool MLflow is an open source tool that allows you to go ahead and track the full chain of custody, all the data, all the lineage that has gone into a machine learning model, so that you can understand everything that went into it. You can lock that model in a reproducible manner and go ahead and share it out broadly. When we look at organizations like Optum, they’ve been able to use this to scale machine learning to large data science teams of 80 different data scientists working on challenging clinical models, where they’re trying to do things like predict the conditions that will lead to a disease so that they can do better things around care personalization and suggest preventative care pathways.
Finally, when we look at the way that we run in the cloud, we spend a lot of time and effort to make sure that our cloud service is very secure, it is compliant with regulations such as HIPAA, and on Azure we are HITRUST CSF certified. We have deep support built into the product, into the platform, that allows you to define access controls at every layer of product functionality. This allows you to be confident that you can go ahead and deploy this system on large amounts of PHI while meeting your internal compliance needs and while being able to secure data assets at a very fine granularity.
Ultimately when this comes together, this is kind of how you can think of it. You’ll have all of your data sitting in your cloud’s storage. We deploy directly into your cloud, so our enterprise cloud service runs in your account. It automatically scales up and scales down computational resources as you’re computing. It gives you easy access through a UI to spin up computational clusters so that your data scientists and your data analysts don’t have to learn how to go into a cloud console and spin things up. It’s a very easy to use form factor. Our unified data service is an optimized version of Apache Spark coupled with a managed Delta Lake service that gives very high performance by integrating scale out computing with an efficient caching and cloud IO layer. Then we have a data science workspace that provides managed notebooks, so it’s like Jupyter notebooks running in the cloud, with a high level of access control and deep collaboration. We have version control and the ability to share notebooks built in, so you can easily share notebooks between teams and understand how people have put their analyses together. Ultimately this allows you to very rapidly move forward on analytics projects that improve patient outcomes, reduce your cost of care and increase patient retention.
Ultimately with this platform, we have a lot of capabilities that allow you to personalize the patient experience, and we see customers using this across population health to understand social factors and disease comorbidities that impact care. We see a lot of people using this for advanced workflows in genomics, in medical imaging and in medical IoT. Then we see a lot of work in the patient engagement space to build better models for the operational aspects of how we’re running our hospital that can ultimately change the patient experience. With that, we’ll go ahead and change over to the KenSci team. KenSci has a really great platform that provides a lot of work in the population health and the care management space, so I’ll go ahead and turn it over to Ankur Teredesai, the CTO of KenSci, who’ll take you through the KenSci product and help you understand how that fits into your strategy.
KenSci consolidates healthcare data to enable collaboration across data teams
Ankur Teredesai:
Thank you, Frank, for that wonderful introduction to Databricks and the value proposition that has streamlined across so many customers and delivered great value. From a KenSci perspective, KenSci is a predictive system of intelligence. We sit somewhere in between the systems of record and systems of engagement. The main value that KenSci brings to Julius and Integris and many customers across the US and globally is the idea that healthcare is so nuanced. It’s highly regulated, and to create a machine learning driven, insight delivering platform, one needs many components that are hardened just for healthcare. We started our journey back in 2015. We are a spin-out from the University of Washington, where I spent a decade or more doing fundamental research on AI models for a wide range of healthcare use cases, and taking all those models to the cloud, we founded KenSci in 2015. Today KenSci works across four continents and most of the large healthcare organizations in the US are KenSci customers.
KenSci is enabling an active system of engagement and interaction by having an AI driven system of intelligence across care management, operational efficiencies and cost and utilization prediction models.
The main value that KenSci and Databricks are driving together is the idea of KenSci as an accelerator to the value that you get from Databricks as a platform, across three main pivots, helping you easily ingest, experiment, train and collaborate on the cloud. So KenSci offers pre-built data connectors for most leading EMRs. We help with ingestion of claims data, RxNorm data and supply chain data. What we have created is a common data catalog for healthcare data, such that when you connect to the sources and start bringing that data in for feature construction and machine learning, the data transformation process is streamlined in such a way that all the operational nuances and triggers that need to be watched out for, for exception handling and for data quality, are taken care of by the KenSci platform that sits on top of the cloud with Databricks. We also offer shared notebooks and an authoring experience within KenSci that Julius will talk about later in the webinar.
On the second pivot, the whole idea of having a complex data infrastructure that is hard to manage and scale out for healthcare organizations is something that we have spent significant time and energy developing. Today, with KenSci, healthcare organizations such as Integris can, with one click, launch an auto-scaling mechanism where, depending on the volume and the velocity of data, organizations like Integris can handle their data workloads seamlessly. Some of the internal features, and the reason why many healthcare customers choose KenSci, are the operationalization of healthcare data through data health reports, leveraging Databricks Delta to ensure that quality control and validation of the incoming data is nuanced for healthcare, and just understanding the level of granularity that comes with healthcare data, which is very useful when you are trying to operationalize many machine learning models together. Last, but not least, in that pivot is the enterprise grade security and compliance controls that we have put in place, making the entire KenSci experience HIPAA compliant and GDPR compliant in a role-based setting. You can lock down role-based access control of the data seamlessly using the KenSci infrastructure.
Coming to the third pivot on improving patient care outcomes and value and ROI, KenSci today has 17 plus use cases supported across hundreds of machine learning models that run seamlessly, friction free, within the KenSci ecosystem. Each of these models, which we call model templates and accelerators, is validated by subject matter experts, including clinicians, in order to ensure that the feature vectors feeding these machine learning models are governed appropriately and map to the understanding of a clinician SME. KenSci also provides something very unique out of the box, which we call our variation analysis and fast track path, to drive quick value and ROI. What this feature of the product does is it helps healthcare customers quickly pump data through KenSci using Databricks, and it powers up an entire variation analysis dashboard out of the box so that C-level executives within the healthcare organization and their teams can really start seeing where the analytics opportunities are from an optimization perspective for the whole system. We can go into that detail at another time. I’m happy to share some thoughts on that.
Here is a 360 view, a top level view, of the runtime platform that KenSci has developed. Within KenSci, very early on, we realized that building one machine learning model is good, and it’s going to become easier with tools like Databricks, but keeping the entire pipeline operational, from prep to model development to model production, for healthcare systems is still an onerous task. We wanted to really liberate fantastic data engineers and data scientists like Julius to do their job and innovate, so that they focus most of their time and energy on the use cases and business users that they’re interacting with daily. Hence the idea of the KenSci runtime infrastructure, a set of services that automatically handles deployment, monitoring, and centralized telemetry for audit logging and compliance reasons. It’s all powered on Azure on Databricks, and then the data prep, model development and production level pipelining is all taken care of by KenSci through packages and libraries that KenSci provides. On top of that is the KenSci runtime portal, which allows for administrative management, visualization, and analytics of your reports.
It allows granular, user level access control, and the best part of it is it will integrate with your organization’s single sign-on capabilities, so that the organization can move frictionlessly into the cloud and manage all the data assets and reports as well as the analytics insights without having to change any of their security protocols or access controls. Last, but not least, everything within the KenSci platform is available as an API that can be embedded back into workflows, and KenSci takes on the guarantees of making sure that those APIs meet the SLAs and the uptime requirements of our customers who are serving patients and those in need. The requirements can be quite stringent, but so far we’ve been quite successful in ensuring that each of these requirements and SLAs are met.
To quickly go through the components of what KenSci’s adding to large healthcare organizations like Integris here, first off we provide a managed infrastructure service. Everything from rapid deployment to end to end infrastructure management is provided through KenSci.
Next is a managed data ingestion process, so right from event level data to customer EMRs to the enterprise data warehouse. We have developed connectors and data ingestion pipelines so that even raw HL7s can be processed through a tool which we call KenSci Agent Ken, and that allows rapid data movement and transformation on the fly from source all the way to the cloud through the KenSci Agent Ken integration platform. This service can run on virtual machines on prem or it can run within the cloud, depending on each customer’s custom infrastructure requirements. What it basically does is it moves your data from the enterprise data warehouse asset that the customer may have in house, or directly from the customer’s electronic medical record or any other source, into the KenSci platform on the cloud.
Another very important feature that we realized very early on is the ability of hundreds of data scientists to leverage the work of each data scientist as they’re working on shaping and enabling machine learning workloads. As we all know, 90 to 95% of the work in machine learning is not really building models, but exploring data, shaping data, creating the features that then become valuable in the prediction models. With the KenSci feature bank capability, Julius can share the features that he’s creating for one model and reuse them in other models that he’s building, but at the same time, he can share the same feature with four other data scientists within Integris and on his team so that they don’t have to recreate those features, and we’ve made this extremely easy to discover through the platform packages feature that’s available within the KenSci machine learning libraries.
You’ll see here on the screen the Databricks environment, so everything is in a Databricks notebook. It uses very familiar tools for model training, and what it allows the end user to do is, essentially, using the KenSci packages and machine learning models, import the model and the capabilities that are available through the KenSci accelerator into the Databricks notebook and then run the entire training pipeline as an experiment within the Databricks notebook environment. Again, keeping the tools and the amazing, powerful capabilities that Databricks is providing, but adding that value accelerator for the healthcare use cases that teams like Julius’s are working on.
Once the model is built, and not just one model but maybe an entire pipeline of cascaded machine learning models, you need a place where the organization can reliably depend on these models getting scored time and again, thousands of times, every day, in order to produce the ROI that most healthcare organizations need. With the KenSci runtime, or the KenSci model scoring pipeline, with a few lines of code you can essentially orchestrate that entire pipeline within Azure using Databricks, from mounting data all the way to cleaning up to running the scoring engine, and ensure that the hundreds of models required for multiple use cases are firing seamlessly and generating the outputs that then go into visualization tools or workflows moving back into the EMR if integration is needed there. This is a very robust model scoring pipeline. It uses Azure Data Factory for end-to-end orchestration in order to enable seamless integration within workflows.
In healthcare, model monitoring and telemetry are extremely important. To give an example, and I’m sure Julius will talk about it in a few minutes, let’s say there is a prediction model for length of stay of patients in a large, general ward of a hospital. To make sure that the pipeline is up and running and producing the right set of results so that hospital operations can run seamlessly, it’s very important to make sure the quality of the model output is consistent with what we are seeing in the data and with the accuracy or precision and recall that were established when the model was validated and put into production. The model health monitoring tool within the KenSci model lifecycle allows folks like Julius to look at this on a daily or hourly basis and see the telemetry outputs at any given point in time to ensure that the model performance is not dropping below a certain threshold. In fact, there are mechanisms and alerting systems built into the KenSci platform such that, if model performance were to drop below a certain level of acceptance or threshold, automatic triggers can inform Julius so that he can go back and investigate why a length of stay model is underperforming and whether retraining is needed or data quality issues are hampering that model’s performance.
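The KenSci tooling itself is proprietary, but the threshold-style health check Ankur describes can be sketched roughly as follows; the metrics table, model name and threshold below are hypothetical, not KenSci’s actual implementation.

```python
# A simplified sketch of a threshold-based model health check; table, columns
# and threshold are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

AUC_THRESHOLD = 0.70  # assumed acceptance level fixed when the model was validated

# Assume each scoring run writes an evaluation row (model name, run date, AUC)
# to a monitoring table.
metrics = spark.table("monitoring.model_health_metrics")

breaches = (
    metrics.filter((F.col("model_name") == "length_of_stay") &
                   (F.col("auc") < AUC_THRESHOLD))
           .orderBy(F.col("run_date").desc())
           .collect()
)

for row in breaches:
    # In production this would raise an alert; printing stands in for that here.
    print(f"ALERT: {row['model_name']} AUC {row['auc']:.3f} on {row['run_date']} "
          f"is below threshold {AUC_THRESHOLD}")
```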
As you can see, everything from a one click managed infrastructure service, to data ingestion that is made frictionless and seamless, to sharing features (the most time consuming part of any data science process) across the organization and globally, to model training within the Databricks environment so that the simplicity of that tool is available to multiple data scientists working together, to running an entire pipeline for scoring hundreds of models seamlessly, all the way to telemetry of these models and their performance benchmarks, is provided within one platform, which we offer as the KenSci platform on top of the Databricks environment. That’s sort of the overall value proposition. I would like to pass on to Julius to go through and show us how building and scaling this for innovation at Integris has been enabled. Julius, over to you. Take it on.
Building and scaling innovation for clinical teams at Integris
Julius Abate:
Thank you, Ankur, and thank you, Frank, as well, both for those presentations and for your contributions to these products that our foundation is now built on. I’m Julius Abate. I’m the director of data engineering for Integris Health. I’ve been with the organization two years as of this week. As you can see, we’re the largest not-for-profit healthcare organization in Oklahoma. I believe we’re currently around 10,000 employees. We serve a patient population of two million. A little over a million of them are active. We have a nationally recognized transplant center and are a Center of Excellence for cardiovascular care in the state. Currently our data engineering team is focused on process improvement through delivering AI and rapid feedback on process compliance directly in the clinical and business workflows.
When I first came in under our new CIO, Dr. Ben Mansalis, we had to reckon with the current state of analytics, which had grown over time with the organization.
Where we started was, over time, as a lot of folks have seen, you end up with multiple teams working on differing systems. We had specialists in SAS working off an Oracle database who did not write in SQL. We had some specialists working in Oracle SQL who didn’t understand the SAS portion or the automation. We had, of course, a group reporting out of Epic after our go-live, who worked entirely in the EMR, and then we had some folks reporting off of Clarity, that reporting server, and they rarely talked or communicated, and there was little shared knowledge or integration. There was also a heavy amount of manual and single threaded processes. A number of people didn’t cover areas, they covered specific reports. If someone went on vacation, those reports wouldn’t get done for a service line or for a clinic system, and our version of data integration was two different teams would run exports and then there’d be an analyst in charge of merging those Excel sheets. I’m sure it’s a story many have heard a number of times at a number of different locations.
On the occasions when customers did go and request the same report from differing teams, there was inconsistency in what was reported to those teams. Over time metrics definitions shift and without proper stewardship, different teams report different items. A great example is I could request a length of stay report from two groups. One might give me end to end length of stay of a patient from time-in to time-out. One might take that as patient days, counting midnights, while they were here. That gives me two different numbers that are used for two very different reasons. Even within a single team, the lack of any kind of analytic change control meant that across hundreds of managed reports, whether they were delivered via Crystal Reports or were queries someone kept in a shared drive somewhere and ran manually once a month, items were not updated when deprecations occurred or a definition was changed. The change did not propagate until a mistake was noticed, if it was even noticed. This all led to a lack of trust in the data across the system, and a heavy amount of time spent on sourcing the data and manually putting it together, with not enough time spent on the actual analysis, which takes us to the first steps we made.
I’m here representing our clinical and business intelligence wing, which said first, “Allow us to centralize all reporting systems.” We brought all the teams together and said, “We’re going to work off a single platform.” The plan was to deploy SQL Server instances throughout the organization in a dev, test and prod format. For those of you who are a little more technical, we’re talking about some beefy machines with a couple hundred gigs of RAM, multiple terabyte hard drives and 16 cores. All the toys we wanted to play with. Then, my background being in Analysis Services and working in an organization that was comfortable with pivots and comfortable with the general Microsoft ecosystem, Power BI and Analysis Services models were our method of deployment. It also worked out well because it is incredibly friendly to not-for-profit organizations, so that made not just the adoption easy, as far as being a pivot table tool that anyone who had worked in Power Pivot before would understand the basics of, but also an easy one to sell as an emerging department saying, “Hey, we want to try something new. We want to be disruptors in the area of analytics here.”
The other piece was merging the silos, both of skills and of data, so we run cross-functional agile teams which have engineers, scientists, analysts and report writers all working on the same projects and in the same systems. We have analysts who want to work hands-on with our pipelines and work on the engineering pieces with engineers who are incredibly excited by the end data they’re delivering, diving into it and asking those questions, not just supporting the infrastructure of the team. We also brought in the idea of project scoping and analysis. We wanted to limit scope creep and increase the effectiveness of a report or a project, even if it meant pushing back on a customer somewhat. Our team also took on the obligation of heading out to customers and understanding not just what it was they were asking from us, but what their end result was. If it was an improvement, could we find a better way? Could we think outside the box for them? Then of course, one of the most important things was the consolidation of definitions around metrics and measurements in the system. We wanted to prevent drift and create an actual metadata dictionary which could be utilized by folks and save our team time.
We knew that we had to unite the system under one mission. We work under the hub and spoke model with our agile teams, in which subject matter expert teams are spun up around the system to help find areas of opportunity, utilize the analytics we were putting out to assess where the biggest value was in improvement, whether in population health, in reduction in mortalities or readmissions, or in reducing costs in some area, and help work with these teams to implement and socialize it throughout the system and increase adoption. These teams are also spun up to act as the key voice on stewardship, the deciding factor on how length of stay is defined when it’s displayed throughout the system. What rates does that include? What DRGs does it include? What types of cases are included, so someone can know, when looking at three different reports and three different versions, what the underlying metric is with confidence? They also create an incredibly high demand for analytic tools.
Integris was just starting on the journey of becoming a data driven organization, so as we showed what could be produced, what the low barriers to entry were to interacting with data to utilizing tools, there became an incredibly high demand and that spun off some incredible projects.
We worked to take the organization from the focus on lagging metrics, waiting on quality data from CMS, waiting on data warehouse loads until a month after billing ends, seeing data and thinking a 30 day delay was not too bad, to “Let’s look at these leading metrics. Let’s look at your performance yesterday.” A big way that we showed that was in socially engaged analytics. We began identifying processes, utilizing those subject matter expert teams around the system, that we thought influenced major issues in the hospital. What you’re seeing here is one of the first we undertook, which was piloting the use of a neonatal sepsis risk calculator throughout the system in order to reduce the number of babies incorrectly sent to the NICU based on potentially faulty or inaccurate testing. What you see on the bottom left is when we engaged neurosurgical providers in rapid feedback on what their MME prescriptions, so morphine equivalency prescriptions, were, tracking opioids prescribed to patients who were opioid naïve, meaning they hadn’t had an opioid in the last six months.
What you’ll see here is rapid improvement. This incredible vertical line you see is the minute you started showing them “Here is how you did. Here is how you can iterate too: make rapid change, assess what occurred yesterday and say, ‘What do I need to do differently in order to hit that mark?'” While we know we’re looking at something that went from 12% compliance to 26%, we’re talking about more than doubling the rate of compliance with that process in a very short amount of time. As for the neonatal sepsis risk calculator, it was a process that went so well that by the end, we were able to stop social engagement because we were hitting 90 to 100% in all cases through all departments. Then the issues came.
Just as we were kind of hitting our mark as we had those amazing use cases to show the system, we had the support of incredible clinical teams behind us looking to pilot more, we hit our ceiling on the technology.
As we say here, timing is everything. We are a 24 hour system, as all hospitals are, and not just that, but our leadership is up early in the morning. We run tiered huddles. Starting at 8:00, teams are huddling together, discussing issues, evaluating all the information they can to pass it up the chain, whether it’s which patients just readmitted yesterday and do I need to send care planning teams out? What was our revenue versus budget yesterday? How many surgical admissions did we have? What was our vacancy rate yesterday? Not all of those get passed up, but they need to be ready early in the morning. Some of that wonderful work we did around process engagement required some heavy overhead, even for the machines we had. Some required traversing notes, using wildcard searches through what we consider incredibly large datasets of over a million notes to find specific key phrases that were non-standardized, and then going through that entire process, migrating the data out of the reporting databases and systems, doing the transformation on it, loading it into an Analysis Services model and fully processing that model, just added more steps at which a potential hiccup could occur.
What we ran into was even these wonderful servers we’d set up and thought were just the bee’s knees getting absolutely crushed by the daily workload. Whether it was memory blowing up or the CPU struggling to compute these queries, we were consistently beginning to fall behind on our timings. We were having some processes time out altogether, and we were creating uncertainty among customers. We were running the risk of losing trust from these folks who’d put a lot of faith in us and who we really wanted to make some excellent changes for.
That was when, talking with KenSci, we said, “It’s time to think in the cloud, think outside the physical box.” We had our eyes opened to the newest tools, and playing with these Spark clusters was life-changing for our entire team, given the backbone that opened up to us with the Databricks environment. Just incredible speeds. There was the idea of scaling and feeling that, “Hey, I’m only using the resources I need,” because there is that worry at first with the cloud, since you’re paying for tools by the hour and it’s possible you haven’t thought in those terms in a while, of having the ability to use as much or as little as we require for the job at hand. And given the mix of our team, as I said earlier, the support for multiple languages was incredible as far as lowering barriers to entry. We had BI developers able to jump in and begin building in Databricks right away because Spark SQL is so close to T-SQL in so many ways in the syntax. For folks who came from the data science background and who were looking to contribute more in that area, they had the ability to use Python, use those packages and really do the work they went to school for and that was their goal to do. The most important thing in all of this was just the speed. Ten times faster may be underselling it. Processes which took multiple hours to load in the morning, to calculate a year or two of process metrics, were done in 10 minutes. We became spoiled.
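To give a feel for the kind of workload that moved over, a keyword search across a large clinical notes table, which previously strained the on-prem servers, becomes a short Spark query; the table name and search phrases below are hypothetical examples, not the actual Integris queries.

```python
# A sketch of a phrase search across a large notes table in Spark; the table
# and patterns are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

notes = spark.table("clinical.progress_notes")   # assumed table of note text plus metadata

# Non-standardized phrasings are matched with one case-insensitive regex rather
# than repeated LIKE '%...%' scans.
pattern = "(?i)(sepsis risk calculator|neonatal sepsis score)"

flagged = (
    notes.filter(F.col("note_text").rlike(pattern))
         .select("patient_id", "note_datetime", "note_text")
)

# Daily counts of matching notes, the kind of process metric fed to the huddles.
flagged.groupBy(F.to_date("note_datetime").alias("note_date")).count().show()
```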
Then there was the idea of the multiple data sources attached to it. First we were introduced to the blob storage and then quickly KenSci introduced us to Delta Lake. That is now the basis for our entire cloud data lake. The ability to do fast upserts and merges, so those differential loads of data on a daily basis from all the differing systems, from the file ETLs we have to consume. It made it incredibly easy to track data over time, and ensuring data fidelity was big for us in creating trust, making sure we had accurate rows on premises and in the cloud so we knew that we could run the same reports from either location with complete confidence. Then all that was bridged together by the introduction of Agent Ken, which Ankur spoke of, which was absolutely key to our quick adoption of Databricks: the ability to easily have all our sources dropped into the cloud and, just as he said, it liberated us to play with it.
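A daily differential load of the kind Julius describes can be expressed as a Delta Lake merge (upsert); this is a minimal sketch with assumed paths and key columns, not the Integris pipeline itself.

```python
# A minimal sketch of a daily differential load into a Delta table; paths and
# column names are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Yesterday's extract landed by the ingestion agent, read in its native format.
daily_extract = spark.read.parquet("/mnt/raw/encounters/2020-01-15/")

target = DeltaTable.forPath(spark, "/mnt/delta/encounters")

(target.alias("t")
       .merge(daily_extract.alias("s"), "t.encounter_id = s.encounter_id")
       .whenMatchedUpdateAll()       # update rows that changed since the last load
       .whenNotMatchedInsertAll()    # insert rows that are new
       .execute())
```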
This is where we moved from the idea of data warehousing to the idea of schema warehousing. We have data available within 90 minutes, sometimes less, of when Agent Ken first kicks off, and rather than do an ETL, we just do an extract and load. We keep all data up in our data lake in its native format, no transformations, no pre-joins. Rather than having to do backloads, having to do classic data warehousing with 100 packages, all of our change control is built into the schemas that we utilize. We don’t query the underlying data unless we have to. Instead we query the views built on top of the data with predefined logic in them, and that allows us to reference the same logic in any number of reports, any number of data models, any number of features, and make the change in only one location.
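A rough sketch of that schema warehousing pattern, with hypothetical database, table and column names: the raw data stays untouched, and the shared metric logic lives in a view that every report, data model and feature references.

```python
# A sketch of querying views with predefined logic instead of the raw tables;
# the databases, table and metric definition are illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Define the shared metric logic once, on top of the untransformed source data.
spark.sql("""
  CREATE OR REPLACE VIEW curated.v_length_of_stay AS
  SELECT encounter_id,
         patient_id,
         DATEDIFF(discharge_time, admission_time) AS los_days   -- single agreed definition
  FROM   raw.encounters
  WHERE  encounter_type = 'Inpatient'
""")

# Every report, data model or feature references the view, so a definition
# change propagates from one place.
los = spark.table("curated.v_length_of_stay")
los.groupBy("los_days").count().show()
```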
I spoke about speed before. Just the rate at which we moved was incredible. Items which before were run a query, have a cup of coffee, go to lunch, come back, maybe it’ll be done and I really hope there were no errors so I don’t have to run it again, were now done in minutes. We were able to iterate at the pace of our creativity and of our learning. The sooner we learned that there was an issue with something, the sooner it could be changed and propagated end to end, from schema all the way into a data model, within the space of 30 to 45 minutes. Then the built-in Spark connector in Power BI allowed us to easily translate our existing format of delivery through self-service data models: simply point at our Databricks Hive tables and source the data we materialize from there.
Then that brought us to the place we are now, with incredible confidence from the organization and the ability to do the work we want to do, with the ability to move quickly and deliver to all areas. We’ve moved from the point of barely having data in time, often late if it did get loaded, to driving process improvement throughout the system and being able to go out and drive the adoption, most of all, of self-service reporting. It’s very hard to build an analytics team of the size any system will need, so putting data in the users’ hands in a simplified, usable format with clear structure and clear definitions expands capacity and expands thinking. Putting data in the hands of the experts allows them to ask questions rather than having to go through us, and allows us to focus our time on the projects you see here. We went from neonatal sepsis and opioid orders in the neurosurgical realm to now working to drive population health through inreach and outreach, tracking screening rates among populations, marking who was coming in in need of screenings and delivering medical assistants near real-time feedback.
For every patient who came in needing these screenings, did you get them that order done, and then asking the call center, of the patients who came in and had orders in the last three months, have we called them and got them to complete those orders, complete those FIT kits, get their mammograms done for the benefit of their health? Then from the process engagement side of the hospital we said, “Let’s attack the big issues. Let’s try and reduce falls. Let’s work to reduce pressure ulcer rates,” and so we began social engagement around two hour turns of patients, process compliance among the nurses and hourly rounding to reduce the rate of falls within the hospital. Our fear going in was that as we spread wider, we would become that department, become those people who came in and added more work onto you, but it turns out that what people wanted was feedback. Letting them know how they did the previous day excited folks. When we did rounding, we saw teams out there who huddled around the data, had questions for us, and were excited to know how the process worked in the background: if any member of the care team checked in on the patient, did they all get credit? Could they work as a team together?
Now that we have that infrastructure, we’re working towards adopting the KenSci platform as a way to move beyond the lagging metrics and the leading metrics and into predictive metrics as part of the workflow, and to build the same confidence in AI throughout the system that they now have in our analytics. I want to thank you both for the time and for these tools that we use on a day in, day out basis, and that’s it. We can go to the close.
Mike Ortega:
Awesome, Julius. Thank you.
Julius Abate:
I’ll share this. This quote is attributed to Abraham Lincoln. “Give me six hours to chop down a tree and I’ll spend the first four sharpening the ax.” Databricks and Agent Ken were our whetstone for sharpening the ax to allow us to go out there. Thank you.
Q&A
Mike Ortega:
Awesome. Thank you for sharing your story and all the great work you guys are doing, improving operations throughout the hospital, and some of the great successes you’ve had. A lot of great stories today. And thank you, by the way, to all our speakers, Frank and Ankur, for sharing how the Databricks platform, along with the enablers that KenSci places on top, can really allow hospitals and healthcare systems to do the same kind of work that Julius has done.
How did the Integris team get up to speed on all the new tooling? What was the process there to adopt some of these technologies?
Julius Abate:
As I said, a big part of it was the similarity between Spark SQL and T-SQL, so people could take their existing knowledge as report writers, developers and analysts, drop it into a new environment and be told, “Listen, it’s okay to put some blinders on. You don’t have to worry about all the additional features you could use right now. You can still write queries. You can still create views. You can still build logic in a format you’re very familiar with.” That was a big part. Then the other part was having Agent Ken and being able to say, “Hey, the data is here. The data is ready.” The biggest hurdle, or the biggest teaching moment, was just understanding the way we build models now. Here’s the PBIX, here’s the connector, here’s how this dataset will work and how we will connect to its source data. While there are a lot of capabilities, a lot of things we can do with Databricks, some of which are beyond me right now and I’m still learning, some of which folks put months into, the actual day to day of our work and the translation of it was very quick.
Mike Ortega:
Awesome. Now that you have access to real time data and are bringing in a bunch of different data sources, from HR systems, operational systems and certainly predictive analytics, what are some of the cutting edge use cases you’re starting to see, or where are you starting to see value now that you have that real time data and those predictive capabilities?
Julius Abate:
The first one we did with the real time data was not just utilizing the previous day’s data from the reporting databases, but bringing in live data through APIs or HL7 feeds in order to do staffing efficiency, both on clinical floors and in operating rooms, in real time, so giving managers feedback across different floors and different departments as to who’s considered over- or understaffed right now. Where can we move around personnel? We’re looking to pilot maximizing efficiencies throughout our flagship hospital. That’s one of the biggest ones. And then the other has been spinning up the base of self-service that folks have not had before, making some very detailed analytic tools and specialized models that are allowing them to answer questions on their own, answer questions at a faster pace and answer questions about what happened recently that before were just on their mind. The questions are actually still flowing in as people realize the availability of the data, so we’re still learning where people’s minds are moving with it.
Mike Ortega:
Ankur and Frank, you obviously work with a lot of different healthcare systems. What are some of the emerging use cases you’re seeing that are adding immediate value around having access to real time data?
Ankur Teredesai:
Excellent question. I sometimes equate the availability of this level of granularity of data, all integrated through the Databricks and KenSci platform, to a gold mine that can now be harvested slowly and with intention. Once this basic pipeline for personalizing the machine learning flow and insights is available, folks have started to really realize value across many different use cases. Particularly exciting for me are the patient flow use cases that predict early arrival and census, leading to length of stay management within general wards. There is a huge interest in even estimating step-ups and step-downs from general ward to ICU, ICU back to general ward, and then bounce back issues, so more broadly there is a lot of interest in just being able to capture the state of operations within a health system, and multiple use cases like this can be enabled. Another thing that we are seeing is discharge planning. Quite sophisticated discharge planning, from discharge disposition to predicting patients who are likely to have short stays, which are very burdensome to the health system in terms of both revenue and staffing, as Julius already mentioned. That whole suite is really coming alive, and I’m noticing our customer partners asking for more and more of the accelerators that KenSci has built in order to customize those models and put them into the pipeline for endpoint APIs or backward integration into the EMR.
Frank Nothaft:
Yeah. I think Ankur and Julius have provided a lot of valuable nuggets there, so I’ll be brief, but one of the things that’s really excited me, and I’ve seen a lot of people get really excited about themselves, is places where people are able to use this live, real time streaming data to make a prediction that allows them to pay more attention to a patient that is in a marginal situation, where the course of treatment that they get over the next minutes to hours could have an impact on the severity of their condition increasing or decreasing, anything like that. We’ve seen a tremendous amount of innovative things that people are doing in that space and I think it’s just a really, really important area for us all to be looking at.
Mike Ortega:
Frank, how does Delta Lake and open source technology play a role in those real time use cases?
Frank Nothaft:
Great question, great question. I think if you think about the data that we’re working with, it oftentimes is fundamentally coming in a streaming fashion, whether it’s getting collected out of a change data capture pipeline from tables back in the EHR or whether we’re working with a live streaming HL7 feed, and we need a good place to land that data as it comes in, in a raw format, before we do any sorts of transformations into a new format that can support it. With these feeds, as they get large, as we go into larger hospital systems, we can actually be working with a massive amount of data that’s coming in. This dataset can grow to many terabytes, petabytes in size really rapidly. With Delta Lake, as a high performance file format, it provides an ACID consistency layer that is actually specifically designed to enable streaming use cases. While you might have otherwise had to use a database system that is tailored to work well with streaming data but also tailored to work well with smaller amounts of data, Delta Lake allows you to stream this data into high performance, low cost cloud storage where you’re able to work easily using common ETL, data engineering and data science libraries like Spark. It just really reduces the barrier to working with these complex streaming data sources in a real time and near real time environment.
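As a rough illustration of the pattern Frank describes, landing a raw streaming feed into Delta Lake before any transformation might look like the following; the source format, schema and paths are assumptions, with a generic JSON stream standing in for a parsed HL7 or CDC feed.

```python
# A minimal sketch of streaming raw feed data into a Delta table; the source
# path, schema and checkpoint location are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw_stream = (
    spark.readStream
         .format("json")                                  # stand-in for a parsed HL7/CDC feed
         .schema("message_id STRING, message_type STRING, payload STRING, received_at TIMESTAMP")
         .load("/mnt/landing/hl7/")
)

(raw_stream.writeStream
           .format("delta")                               # ACID, streaming-friendly landing zone
           .outputMode("append")
           .option("checkpointLocation", "/mnt/checkpoints/hl7_raw")
           .start("/mnt/delta/bronze/hl7_raw"))
```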
Mike Ortega:
Ankur, with the KenSci application portfolio sitting on top of Databricks, how much of that application works out of the box with little to no modification, versus the healthcare system having to redevelop their apps using their own tools? What’s out of the box versus having to redo it?
Ankur Teredesai:
Another fantastic question. Healthcare is so nuanced. We have all spent decades trying to integrate data and understand the nuances that come with each installation of an EMR and its custom feeds. What we have found is, instead of doing a left to right journey, where you go from trying to transform all the data coming from the EHR plus scheduling plus billing systems and then trying to build semantic layers on top of that for machine learning, if you instead approach it as a middle to left and middle to right problem, where you first solve the problem in the small, figure out what are the main feature transformations and attributes that are going to be important for machine learning and analytics insights, and then go transform those either manually or through accelerators, it leads to a much better out of the box experience. Of course, the last mile is always challenging and that customization does take a little bit of time, but in particular, what used to take a health system like Integris six to eight months just to build one model, call it robust and validate it, KenSci has been able to take down to around six weeks for that entire workflow. That’s significant savings right there because of the out of the box components.
On the outbound side, for end users such as discharge planning or hospital operations, KenSci provides customizable LEGO blocks of dashboards if needed, but our play is really APIs and guaranteed SLAs that can then be tuned and customized for the installation setting, which we do either as an add-on service or, if the out of the box components are sufficient, which they are for eight out of 10 cases, then we go with that level of customization. But really, the value that we are bringing to the health system, as Julius pointed out, is this rapid acceleration due to the data pipeline, the ready built feature engineering and the model templates that are open for customization because of the notebook experience from Databricks and the Delta Lake technology, so that it’s much faster to do any type of customization that a customer partner may want.
Mike Ortega:
Awesome. I want to say a big thank you to our speakers. Great conversation around applying machine learning and scaled analytics in a healthcare environment.