Seattle Children’s is dedicated to providing the best medical care possible through strategies which include researchers and clinicians working alongside each other to improve our understanding of pediatric diseases. Full realization of this relationship requires systems and processes designed to enable the capture, discovery, and effective communication of knowledge and information. So how do we enable the translation of knowledge and expertise, generated by our scientists and clinicians, to improve patient care?
In this talk we will discuss how we are building a loosely coupled framework comprised of MLflow, Vega-lite, and other open source tools as part of our knowledge capture, management, and communication strategy. We will demonstrate how we leverage the MLflow model registry to capture visualizations in a way that makes them discoverable by and shareable with clinicians.
– Hi, we’re Andrew Bauman and James Hibbard, and we are part of the Neuron team at Seattle Children’s. Welcome to our presentation on translating models to medicine: an example of managing visual communications.
During this presentation, we’ll be talking about the Neuron team and project: our mission, vision, aims, and challenges. We’ll discuss our solution approach, and we will demonstrate an example of managing and delivering a visual communication.
The Neuron team and project are focused on supporting patient care in the ICU. Our vision is to achieve the optimal clinical outcome for every critically ill child, and our mission is to lead the development of personalized pediatric critical care through the creation and refinement of predictive, analytical, and other decision-support tools. We do this by engaging multiple disciplines to foster rapid innovation, promoting and measuring the adoption of our methods, and providing novel training in their use and interpretation.
One of our specific aims is to enable Neuron’s mission by delivering insights and information to our users in a manner that reduces their cognitive load. So if you look to the representation on the right, you’ll see our user in the green circles surrounded by devices.
In this case, this would be a neurologist in the ICU, who might be consulting with other physicians regarding a patient’s care. Within that discussion, they might need pieces of information that can help inform their decisions. That information, delivered through the display layer and accessed through their devices, consists of things like a visualization that compares treatments: which treatment group a patient belongs to, what medicines they are on, in what proportions, and when they were administered. It also includes the underlying data that informs the visualization, as well as, for example, decisions associated with fluid balance management, and risk scores such as a sepsis risk forecast, a respiratory distress risk forecast, or a risk-of-crash forecast.
The story that I just gave is an example of primarily within-domain communication, so the knowledge barriers within that clinical care space are going to be a lot lower than, say, between the clinical care space and the engineering or data science space. Those barriers largely comprise our challenges as a team and as an organization, and this is probably not unique to our team or organization. The technical challenges are not insignificant.
Things like handling the volume and velocity of data, or supporting data scientists with a platform and maintaining those tools and platforms, are not nearly as significant as the knowledge silos that can exist between domains. This comes up particularly when trying to transfer knowledge from the clinical domain and operationalize it via engineering. Without overcoming those knowledge silos, you cannot make full use of the technical challenges you have overcome, so you cannot fully realize their value.
Examples of these knowledge silos, or knowledge barriers, might include clinical models or other types of models that are trapped in the literature and in SMEs’ heads. They may also come in the form of model code, implementation, and ETL that are tightly coupled, and in tribal knowledge surrounding critical information and artifacts. These are fairly surmountable within a domain, because people from the same domain often speak the same language and can work together to overcome some of these challenges. But when you are trying to go between two or more domains, as we are, you really have to make an effort to overcome those barriers so you can fully realize the value of the knowledge base that you have.
So this is essentially what this slide is saying: in order to fully enable our team and fully realize our value, we need to work on processes and technologies that take us from a state of between-domain knowledge silos to a state where we can all freely communicate with each other and share each other’s knowledge.
One of the ways we do this is by using classic design patterns to implement a framework that consists of loosely coupled layers and entry points, each playing a specific role. I discussed the display layer, which delivers visual communications. The framework also has a layer to coordinate the other layers. A very important part is the model layer, which helps us track model experiments and performance, registers models or visualizations to a catalog or registry, and helps orchestrate deployment. There are also classic components such as a middle data layer, which helps us extract source data and perform the transformations needed for specific use cases, and a layer which helps data scientists and data engineers contribute knowledge and information into the framework via the model layer and the middle data layer.
So how do we use this framework? We use it to capture models, artifacts, and domain and tribal knowledge, and to transfer those into standard, transparent, traceable workflows. As an example, that would be mapping concepts from a clinician to visuals that help inform their decisions and reduce their cognitive load when processing information about a patient. We then turn around and deliver this as discoverable, extensible, and portable end products.
And here are some of the technologies that we use to implement this framework. Our model layer consists largely of MLflow, with some of our own code in the form of BookKeeper. Our middle data layer is primarily comprised of Spark, Delta Lake, and blob storage. And our display layer involves a number of elements, particularly in our prototype layers; these include Tableau, Altair and Vega-lite, Power BI, Streamlit, and ReactJS.
So what process do we follow? We follow, and are continuing to develop, a process which leads us from research to production. That includes defining and documenting information found in the literature and gained through subject matter expert interviews; decoupling base models from the specifics of their implementation; and providing support for further development and maturity. Say, for example, a model that takes an instance of data and gives a risk score for that particular moment, versus one that might forecast that risk score several hours into the future. We also focus on centralizing the tracking of development, knowledge, and communication to enable discovery; again, our model layer plays a large part in that work. We deliver and communicate these models as visualizations. Examples of models that we have recently worked on in our framework include a scoring model for pediatric acute respiratory distress syndrome (pARDS), and predictive NEDOCS, also a scoring model, which helps plan emergency department staffing. In that case, we actually use a machine learning algorithm to forecast the score two hours into the future.
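One way to picture the decoupling step above: keep the base model as a pure function, separate from any ETL, so the same score can later be served point-in-time or wrapped in a forecaster. This is a generic sketch with made-up weights and field names, not the actual pARDS or NEDOCS formulas.

```python
# Hypothetical base model: a pure scoring function with illustrative
# weights (NOT the real pARDS or NEDOCS formulas), kept separate from
# data extraction so it can be reused in different deployments.
def risk_score(observation):
    """Map one instant of data to a risk score for that moment."""
    weights = {"occupancy": 0.5, "acuity": 0.3, "wait_hours": 0.2}
    return sum(weights[k] * observation[k] for k in weights)

def forecast_scores(history, horizon=2):
    """A stand-in forecaster: score each historical instant, then
    naively extrapolate the trend `horizon` steps ahead. In practice
    this slot would be filled by a machine learning model."""
    scores = [risk_score(obs) for obs in history]
    trend = scores[-1] - scores[-2] if len(scores) > 1 else 0.0
    return scores[-1] + trend * horizon

now = {"occupancy": 0.8, "acuity": 0.5, "wait_hours": 1.0}
score_now = risk_score(now)  # the "instance of data" case
```

Because `risk_score` knows nothing about where its input came from, the same base model serves both the point-in-time and forecast use cases.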
As I mentioned, our model tracking layer is very important and is a large part of our focus. We use MLflow for this tracking layer, but we have also wrapped it in our own Python package, which we currently call BookKeeper. We do that so we can easily customize MLflow for our use cases and extend it via the Python plugin system, meeting our needs without having to change MLflow itself. Some of our customizations and extensions include UI customization; custom models and workflows, such as being able to register visualizations in the model registry; and high-level search-and-deploy convenience functionality built on top of the MLflow model API. In particular, this helps us build things for front-end use, so that you can search out specific types of models and other information from the registry.
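To make the search-convenience idea concrete: BookKeeper is not public, so the helper below is a hypothetical sketch of the pattern. In the real framework this kind of function would sit on top of MLflow's client (for example, `MlflowClient().search_registered_models()`); here the registry entries are mocked as plain dicts so the pattern stands on its own.

```python
# Mocked registry entries; in the real framework these would come back
# from MLflow's model registry rather than a hard-coded list.
mock_registry = [
    {"name": "hospital_bed_census", "tags": {"kind": "visualization"}},
    {"name": "pards_risk",          "tags": {"kind": "scoring"}},
]

def search_models(entries, **criteria):
    """BookKeeper-style convenience: return entries whose tags match
    every keyword criterion, e.g. search_models(..., kind="visualization")."""
    return [e for e in entries
            if all(e["tags"].get(k) == v for k, v in criteria.items())]

visuals = search_models(mock_registry, kind="visualization")
```

A front end can then offer "show me all visualization models" without its users knowing anything about the underlying MLflow API.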
Neuron has been really focused on respiratory health. That includes pARDS risk monitoring, as I mentioned before, monitoring ventilation use, and even aspects of fluid management.
We’re currently capturing the pARDS pipeline using the process which I described previously, and we can see that work right now. In order to demonstrate our process and how we manage our visual communications, we have a hypothetical scenario of mapping ICU beds. We’ll show you how we’ve encoded visualization specifications in JSON, how we capture them with MLflow and render them with Vega-lite, and how we simplify visual communication management by using the MLflow model registry.
We’ll also show our discovery and deployment of models using custom high-level search-and-deploy classes, and how we pass data via fit or predict functions back to our deployments in order to render those visualizations.
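As a rough illustration of what "encoding visualization specifications in JSON" can look like: below is a minimal Vega-lite-style spec built as a Python dict, with a data placeholder that gets filled at render time. The field names (`beds`, `available`, and so on) are illustrative, not taken from the actual project.

```python
import copy
import json

# Hypothetical minimal Vega-lite spec for an ICU-bed map. The "data"
# entry is a named placeholder; real records are injected at render time.
BED_MAP_SPEC = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "description": "ICU beds by hospital",
    "data": {"name": "placeholder"},
    "mark": "circle",
    "encoding": {
        "longitude": {"field": "lon", "type": "quantitative"},
        "latitude": {"field": "lat", "type": "quantitative"},
        "size": {"field": "beds", "type": "quantitative"},
        "color": {"field": "available", "type": "quantitative"},
    },
}

def render_spec(spec, records):
    """Return a complete Vega-lite spec (as JSON) with the data
    placeholder replaced by inline values, leaving the template intact."""
    filled = copy.deepcopy(spec)
    filled["data"] = {"values": records}
    return json.dumps(filled)

hospitals = [{"lon": -122.3, "lat": 47.6, "beds": 24, "available": 6}]
spec_json = render_spec(BED_MAP_SPEC, hospitals)
```

The JSON string produced here is exactly the kind of artifact that can be logged to MLflow and later handed to a Vega-lite renderer in the display layer.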
And now, James will take over and walk through our scenario. – So we have a subject matter expert who wants a visualization that conveys the number of ICU beds in the area and their availability for new patients. They want to display this as a dashboard somewhere others can look at it, very quickly capture information about capacity, and make decisions based on that. The subject matter expert knows a whole lot about hospital bed census and how that affects care, but they can’t convey that on their own. So the first thing they would do is go to our registry and do a brain dump. They would create a new experiment around the ICU bed communication, describe the scenario of conveying these ICU beds and how many are available, and then dump additional information. I used the term hospital census; that would be domain-specific jargon for how many beds exist and how many are available. Terms like that might be defined here, or at least be available for people from other domains to come through and say, “I’m not sure what you’re trying to convey here. Can you define it?” And so they might take additional steps like attaching research papers to define terms or sources.
So once the initial capturing of the scenario has occurred, another team member would make an initial visualization of the hospital beds and register that visualization as a model.
So here we see in our model registry that this hospital bed census visualization is tracked as three different versions of our model.
Each of these versions is a progressive sharpening of how we communicate this piece of information from one knowledge silo to another, in a way that is more broadly usable by the organization to support patient care. Maybe in version one, the expert and the visualization developer have just enough shared understanding of the scenario to produce a map of the geographic region with dots representing the hospitals. We can look at that and say, that’s a great first step, but it would be a lot more digestible if you mapped the size of the dots to the number of ICU beds and encoded availability with color. So then the person who works on the visualization will go and update the model that maps that chunk of data to a visual. That process goes back and forth; maybe version three here enhances the color encoding to better communicate availability.
And we’re able to do this rapid iteration, this rapid mapping of data and concepts to a visual communication, because of how we have packaged up these visualizations as models. This is part of our BookKeeper extension. You’ll see here that we have registered a specification for generating this visualization as a model itself. And we’ve highlighted here that the data that’s passed in is chosen by the user: here we have a placeholder value, and we’re able to define what the visual looks like as a specification and render it. So just like you have a model that takes data in and predicts some value, we have a visual communication as a model that takes in some data and, when you call predict on it, puts out a visual.
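The "visualization as a model" idea can be sketched as a tiny class. In the actual framework this would be an MLflow pyfunc custom model (a subclass of `mlflow.pyfunc.PythonModel`); the plain class below, with hypothetical names, just shows the shape of the pattern: construct with a spec, call `predict` with data, get back a renderable chart.

```python
import json

class VisualizationModel:
    """A visualization spec packaged like a predictive model. In the
    real framework this role is played by an MLflow pyfunc model; this
    plain class is an illustrative stand-in."""

    def __init__(self, spec):
        self.spec = spec  # the visual encoding, independent of any data

    def predict(self, records):
        # Instead of returning a score, "predict" returns a complete
        # Vega-lite spec: the model maps raw data to a visual.
        rendered = dict(self.spec)
        rendered["data"] = {"values": list(records)}
        return json.dumps(rendered)

spec = {"mark": "circle", "encoding": {"size": {"field": "beds"}}}
model = VisualizationModel(spec)
chart_json = model.predict([{"beds": 12}, {"beds": 30}])
```

Because the spec and the data only meet inside `predict`, the same registered model can render any compatible dataset a user passes in.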
What this allows us to do is take the toolkit that we and others have developed around creating models in the data science space and apply it to these communications. We can track our visualizations and search them with different criteria, the same way we might search through an MLflow model registry for models that meet certain criteria or answer the problem we’re posing. And we’re able to select a visual representation and then independently pass in a chunk of data that we want to interpret.
So here we have an example of a user pulling a visual model and passing in some hospital data, and now the dots representing ICU beds are rendered. This is just to show that the data and the visual spec are independent: you can pass in any data, and our visualization spec encodes how that information will be turned into a visual. Here we see they passed in some hospital bed information and got an actual visual representation, one that carries along some of the intuition the subject matter expert had.
Larger dots for hospitals with more ICU beds, located around the region they cared about.
As these visuals get improved and iterated on, we’re able to use the same model deployment procedure that we use for machine learning or statistical models to deploy our visual models. Here we can move the Production stage of our model from version two to version three, and all downstream systems will pick up on that change and display the current best visual representation. That’s great because it gives us one point of control for updating these mappings and distributing them across the organization.
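The single-point-of-control promotion described above can be sketched with a tiny mock of the registry's stage pointer. In MLflow itself this corresponds to `MlflowClient().transition_model_version_stage(name, version, stage="Production")`; the mock below just makes the idea explicit that consumers resolve a stage, never a hard-coded version.

```python
# Mock of the registry's stage pointer. Model names and versions are
# illustrative; in production this state lives in the MLflow registry.
stages = {"hospital_bed_census": {"Production": 2}}

def promote(model_name, version):
    """One point of control: flip Production to a new version, and
    every downstream consumer picks it up on its next lookup."""
    stages[model_name]["Production"] = version

def resolve(model_name, stage="Production"):
    """What a dashboard calls at render time instead of pinning v2."""
    return stages[model_name][stage]

promote("hospital_bed_census", 3)  # version three becomes current
```

Since dashboards call `resolve(...)` rather than embedding a version number, promoting version three updates every downstream display at once.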
So here we have version three of our ICU bed mapping.
We’ve accomplished quite a lot here. We’ve made a visual spec that’s independent of the data passed in, and that spec is a model for translating raw data into a visualization that carries a lot of our subject matter expert’s intuition about the system.
We’re not trying to convey all the information we have about hospital beds, their locations, and the hospitals they’re in; we’re trying to communicate the most important pieces of information. So it’s really honed to a certain message, to allow a non-expert to pick up on that message. We have a tooltip here that lets a user start to investigate more detailed pieces of information about the hospital beds and hospitals we’ve highlighted in the visual.
You’re never going to have a non-expert reach 100% of the same understanding as an expert, but this provides a path to transfer 60% or 70% of that initial intuition, and that allows people from other knowledge silos to have a richer discussion about the system and hopefully contribute the knowledge in their own backgrounds more easily.
And that leads us to the somewhat cheeky term we’ve come up with: transfer vizzing. It’s analogous to transfer learning; we’re really stretching the visualization-as-model analogy as much as possible. Here we see that we have a visual encoding that is great for communicating ICU bed census information, the number of ICU beds and their capacity, and we have updated the spec
of the visual to display not just the geographic region of Washington State, but the whole US. So here we are transferring our intuition about hospital beds and their availability from Washington State to the entire country. And this is great because a non-expert would be able to make that substitution and hopefully have the same sort of intuitions an expert would have, but now within a different context. And we’d like to thank everyone who made this work possible: the rest of the Neuron team at Seattle Children’s, our friends at Databricks, and the Lyndsey Whittingham Foundation for their support.
Andrew is a scientist with advanced degrees in chemistry and toxicology, with research, analytics, and data science experience spanning hospital research organizations, pharma, and online education. He is currently an Advanced Analytics Product Engineer at Seattle Children’s, previously holding roles as a content developer (Udacity School of Autonomous Systems), Machine Learning Engineer (Healthslate and Entrée AI), and Analytical Team Leader (Bristol Myers Squibb).
James Hibbard is a data engineer with seven years of experience working within biomedical research. He is currently at Seattle Children’s, where he is developing frameworks and methods for integrating medical records with multi-omics datasets to improve care.
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Master of Biomedical Informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include distributed systems, Apache Spark, deep learning, machine learning, and genomics.