
Research at Databricks and MosaicML

Where research meets the real world

Join our team

You’re in good company

Apache Spark. Lakehouse. Mosaic MPT-7B. These systems were built by the special breed of engineers you’ll find here, where Databricks and MosaicML have joined forces.

Our leaders have a proven track record of making breakthroughs in distributed systems, generative AI, LLMs and data analytics.

Now we’re looking for PhDs who want to make an impact. If you’re truth-seeking, data-driven and work from first principles, this is the place for you.

Publications

Explore our recent papers in collaboration with UC Berkeley, Stanford and other leading universities

Showing 1 - 12 of 46 results

Applications

A Cloud-Compatible Bioinformatics Pipeline for Ultrarapid Pathogen Identification From Next-Generation Sequencing of Clinical Samples

Samia N. Naccache, Scot Federman, Narayanan Veeraraghavan, Matei Zaharia, Deanna Lee, Erik Samayoa, Jerome Bouquet, Alexander L. Greninger, Ka-Cheung Luk, Barryett Enge, Debra A. Wadford, Sharon L. Messenger, Gillian L. Genrich, Kristen Pellegrino, Gilda Grard, Eric Leroy, Bradley S. Schneider, Joseph N. Fair, Miguel A. Martínez, Pavel Isa, John A. Crump, Joseph L. DeRisi, Taylor Sittler, John Hackett, Jr., Steve Miller, Charles Y. Chiu

Distributed Systems

Above the Clouds: A View of Cloud Computing

Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, Matei Zaharia

AI and ML

Accelerating the Machine Learning Lifecycle With MLflow

Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, Corey Zumar, Databricks Inc.

Applications

ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing

Matt Massie, Frank Nothaft, Christopher Hartl, Christos Kozanitis, André Schumacher, Anthony D. Joseph, David A. Patterson

Distributed Systems

Apache Spark: A Unified Engine for Big Data Processing

Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, Ion Stoica

Distributed Systems

ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Padmanabha Iyer, Zaoxing Liu, Xin Jin, Shivaram Venkataraman, Vladimir Braverman, Ion Stoica

Applications

C3: Internet-Scale Control Plane for Video Quality Optimization

Aditya Ganjam, Junchen Jiang, Xi Liu, Vyas Sekar, Faisal Siddiqui, Ion Stoica, Jibin Zhan, Hui Zhang

Applications

CellIQ: Real-Time Cellular Network Analytics at Scale

Anand Padmanabha Iyer, Li Erran Li, Ion Stoica

Distributed Systems

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications

D. Karger, H. Balakrishnan, I. Stoica, M.F. Kaashoek, R. Morris

AI and ML

Clipper: A Low-Latency Online Prediction Serving System

Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica

AI and ML

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities

Brian R. Bartoldson, Bhavya Kailkhura, Davis Blalock

AI and ML

DAWNBench: An End-to-End Deep Learning Benchmark and Competition

Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia



Build your career

Build your career beyond academia

Wanted: PhDs skilled at building scalable, reliable and performant systems

At Databricks and MosaicML, we bring together experts in data analytics, deep learning, distributed systems and AI infrastructure. Together, we’re radically simplifying the entire data lifecycle on our open lakehouse architecture that unifies data, analytics and AI.

If this work excites you, we might have a spot for you on one of our specialized engineering and research teams.

Explore Databricks teams

Caching Team

Build the next-generation sharding, load balancing and caching solutions for Databricks to enable low latency, efficiency and scalability in our systems.

Photon Team

Build Databricks’ high-performance native (C++), vectorized SQL execution engine, which powers petabytes of query processing at Databricks per day.

Query Optimization Team

Build systems that optimize diverse workloads. Innovate with a wide variety of techniques, from traditional to ML-based, to outperform specialized data warehouses.

Lakestore Team

Build best-in-class storage systems with the usability and performance of data warehouses, and the flexibility and scalability of data lakes for all data workloads.

Explore MosaicML teams

Research Science

Drive ambitious research projects that:

  • Push the limits of existing technology 
  • Explore new approaches that go beyond the state of the art

Survey publications and develop methods for efficient neural network training.

Engineering

Design and implement our ML infrastructure and generative AI platform. Establish development best practices. Help develop infrastructure and platforms that analyze ML training jobs, predict performance and cost, and run them across various hardware.

Our published researchers and engineers

Meet the employees behind our recent publications


Life as a Software Engineer After a Computer Systems PhD at Stanford

Hear Shoumik Palkar’s thoughts on creativity at work, validation of personal success and the peer/mentor network at Databricks.

Read article