Research at Databricks and MosaicML
Where research meets the real world
You’re in good company
Apache Spark. Lakehouse. Mosaic MPT-7B. These systems were built by the special breed of engineers you’ll find here, where Databricks and MosaicML have joined forces.
Our leaders have a proven track record of making breakthroughs in distributed systems, generative AI, LLMs and data analytics.
Now we’re looking for PhDs who want to make an impact. If you’re truth-seeking and data-driven, and you work from first principles, this is the place for you.
Publications
Explore our recent papers, written in collaboration with UC Berkeley, Stanford and other leading universities
Build your career beyond academia
Wanted: PhDs skilled at building scalable, reliable and performant systems
At Databricks and MosaicML, we bring together experts in data analytics, deep learning, distributed systems and AI infrastructure. Together, we’re radically simplifying the entire data lifecycle on our open lakehouse architecture that unifies data, analytics and AI.
If this work excites you, we might have a spot for you on one of our specialized engineering and research teams.
Explore Databricks teams
Caching Team
Build the next-generation sharding, load balancing and caching solutions for Databricks to enable low latency, efficiency and scalability in our systems.
Photon Team
Build Databricks’ high-performance, native (C++), vectorized SQL execution engine, which powers petabytes of query processing per day at Databricks.
Query Optimization Team
Build systems that optimize diverse workloads. Innovate with a wide variety of techniques, from traditional to ML-based, to outperform specialized data warehouses.
Lakestore Team
Build best-in-class storage systems with the usability and performance of data warehouses, and the flexibility and scalability of data lakes for all data workloads.
Explore MosaicML teams
Research Science
Drive ambitious research projects that:
- Push the limits of existing technology
- Explore new approaches that go beyond the state of the art
Survey publications and develop methods for efficient neural network training.
Engineering
Design and implement our ML infrastructure and generative AI platform. Establish development best practices. Help develop infrastructure and platforms that analyze ML training jobs, predict performance and cost, and run them across various hardware.
Our published researchers and engineers
Meet the employees behind our recent publications
Life as a Software Engineer After a Computer Systems PhD at Stanford
Hear Shoumik Palkar’s thoughts on creativity at work, validation of personal success and the peer/mentor network at Databricks.