
Each of us will likely generate millions of gigabytes of health data in our lifetimes: medical and pharmacy claims, electronic medical records with extensive clinical documentation, medical images; perhaps streaming data from wearable devices, blood biopsy data that can detect cancer, and genomic sequencing. These data sets have enormous potential to uncover new, life-saving treatments, predict disease before it happens, and fundamentally change the way that care is delivered.

For healthcare and life sciences organizations seeking to deliver better patient outcomes, legacy technology is most often the rate-limiting factor. Common challenges include:

  • Data silos and limited support for semi-structured data (like provider notes) and unstructured data (like images) prevent organizations from gaining a holistic view of a patient
  • Rapid growth in data is outpacing the scale of existing infrastructure, preventing population-level research
  • Batch processing and disjointed analytic tools prevent real-time response to challenges such as supply chain constraints and ICU bed capacity
  • Traditional data architectures don’t support advanced analytics and AI use cases

Sadly, for these reasons, the opportunity to tap into AI-driven innovation is simply out of reach for most organizations on the front lines of developing new drugs and treating patients in need.

Meet Lakehouse for Healthcare and Life Sciences

Well, that’s changing! Today, we’re thrilled to introduce the Lakehouse for Healthcare and Life Sciences — a platform designed to help organizations collaborate with data and AI in service of a unified goal: improving health outcomes. The Lakehouse eliminates the need for legacy data architectures, which have inhibited innovation, by providing a simple, open and multi-cloud platform for all your data, analytics and AI workloads. Building on this foundation are solution accelerators developed by Databricks and our ecosystem of partners for high-value analytics and AI use cases such as disease prediction, medical image classification, and biomarker discovery.

We know that healthcare organizations face a unique, and often painful, set of challenges that can significantly hinder innovation. We’ve designed the Lakehouse to address these challenges and provide the following benefits:

  • Build a 360-degree view of the patient: It’s widely accepted that the vast majority of medical data is unstructured, which makes a holistic patient view that much harder to build on siloed systems. The problem compounds as healthcare providers, payers and pharma manufacturers become increasingly interconnected. The Lakehouse is open by design and supports all data types, enabling organizations to create a 360-degree view of patient health. Pre-built solution accelerators for data ingestion and curation make it even easier to bring health data into your lakehouse.
  • Scale analytics for population-level insights: Scale is critical for initiatives like population health analytics and drug discovery, but for years legacy technology has failed to keep up with ballooning health data like genomics and imaging. Built in the cloud and designed for performance, the Lakehouse supports the largest of data jobs at lightning-fast speeds. For example, Regeneron reduced data processing from 3 weeks to 5 hours, and genotype-phenotype queries from 30 minutes to 3 seconds for workloads that scaled to 1.5M exomes. With the Lakehouse, organizations can quickly and reliably analyze data for millions of patients.
  • Deliver real-time care and operations: Healthcare happens in real time and requires real-time insights for critical use cases, from managing ICU capacity to monitoring the distribution of temperature-sensitive vaccines. Unfortunately, traditional data warehouses aren’t designed to operate in real time. The Lakehouse enables real-time analysis on streaming data so organizations can deliver care when it's needed, not after the fact.
  • Leverage predictive health insights: The future of healthcare is predictive, not descriptive. The Lakehouse provides a robust set of analytics and AI tools directly connected to your data so organizations can innovate drug discovery and patient care with machine learning. Additionally, our network of partners has built accelerators for high-value analytics and AI use cases, including drug targeting and repurposing, drug safety monitoring, disease prediction and digital pathology analysis for cancer detection.
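To make the real-time point above a little more concrete, here is a minimal sketch of the difference between waiting for a nightly batch and reacting to each event as it arrives. It is plain Python, not Lakehouse or Databricks code, and everything in it is an assumption made for illustration: the `BedEvent` record, the `occupancy_alerts` function and the 90% default threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class BedEvent:
    """Hypothetical stream record: one ICU admission or discharge."""
    minute: int  # minutes since the start of the stream
    delta: int   # +1 for an admission, -1 for a discharge


def occupancy_alerts(
    events: Iterable[BedEvent], capacity: int, threshold: float = 0.9
) -> Iterator[Tuple[int, int]]:
    """Yield (minute, occupied_beds) whenever occupancy reaches the threshold.

    The running count is updated per event, so an alert is available as soon
    as the triggering admission arrives rather than after a batch job runs.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    occupied = 0
    for event in events:
        occupied += event.delta
        if occupied / capacity >= threshold:
            yield event.minute, occupied


# Illustrative data only: a 4-bed unit with a 75% alert threshold.
sample_events = [
    BedEvent(0, +1), BedEvent(5, +1), BedEvent(7, +1),
    BedEvent(9, -1), BedEvent(12, +1), BedEvent(15, +1),
]
alerts = list(occupancy_alerts(sample_events, capacity=4, threshold=0.75))
```

Because `occupancy_alerts` is a generator, it can consume an unbounded event source; a production system would more likely express the same idea with a streaming engine and persistent state.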

With these capabilities, Databricks is empowering a new breed of data and AI innovators in healthcare:

  • Using AI to develop diagnostic and therapeutic products that help children living with behavioral conditions
  • Applying machine learning to 17M+ electronic health records to identify new treatment indications for approved therapies
  • Delivering recommendations to patients using streaming data from connected health wearables for diabetes management

Tailor-made Solutions for Healthcare & Life Sciences

To help organizations realize value from their Lakehouse projects faster, Databricks and our ecosystem of partners have developed solution accelerators and open-source libraries—like Glow for genomics and Smolder for HL7v2 messages—to address common industry use cases.

  • Data Ingestion and Curation Tools - easily ingest structured and unstructured health data (e.g. FHIR/HL7v2, imaging, genomics) into your Lakehouse for analytics at scale with our templates for data ingestion and curation.
  • Analytics and AI Templates - packaged solutions for high-value analytics and AI use cases such as drug target identification, drug repurposing, disease risk prediction, medical image analytics (e.g. detecting cancer in pathology images) and more.
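As a rough idea of what the ingestion step for FHIR data involves, the sketch below flattens the Patient resources in a FHIR Bundle into tabular rows using only the Python standard library. It is not the accelerator's implementation. The `Bundle`, `entry`, `resource`, `resourceType`, `id`, `gender` and `birthDate` keys come from the FHIR JSON format; the function name `patients_from_bundle`, the output column names and the sample data are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def patients_from_bundle(bundle_json: str) -> List[Dict[str, Any]]:
    """Return one flat row per Patient resource found in a FHIR Bundle."""
    bundle = json.loads(bundle_json)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue  # other resource types would get their own tables
        rows.append({
            # Column names are hypothetical, chosen for this sketch.
            "patient_id": resource.get("id"),
            "gender": resource.get("gender"),
            "birth_date": resource.get("birthDate"),
        })
    return rows


# Made-up sample bundle with one Patient and one Observation.
SAMPLE_BUNDLE = json.dumps({
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1",
                      "gender": "female", "birthDate": "1980-04-02"}},
        {"resource": {"resourceType": "Observation", "id": "o1",
                      "status": "final"}},
    ],
})
patient_rows = patients_from_bundle(SAMPLE_BUNDLE)
```

Rows shaped like this can be loaded into any tabular engine; at scale the same mapping would typically run in a distributed job rather than a single Python process.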

Featured partner solutions

  • Intelligent Drug Repurposing: Identify new therapeutic uses for existing drugs with the power of data and machine learning.
  • Interoperability: Automate the ingestion of streaming FHIR bundles into your lakehouse and standardize with OMOP for patient analytics at scale.
  • Natural Language Processing for Healthcare: Extract insights from unstructured medical text for use cases such as automated PHI removal, adverse event detection, and oncology research.
  • Biomedical Research: Improve biomarker discovery for precision medicine with a highly scalable and extensible whole-genome processing solution.
  • Intelligent Data Management

Check out our full set of solutions on our Lakehouse for Healthcare and Life Sciences page.

Get started building your Lakehouse

You have the data. Now you have the platform. Join the hundreds of healthcare and life sciences organizations innovating on the Lakehouse. Here are some resources to help you get started:

Try Databricks for free