CUSTOMER STORY

Empowering the next billion software creators

3 weeks to launch code generation LLM

Multi-billion-parameter code generation model developed

SOLUTION: Model Training
PLATFORM USE CASE: Mosaic AI

Software startup Replit has developed a platform for developers that uses generative AI (GenAI) to reduce the time it takes to develop and deploy software by orders of magnitude. Replit has enabled millions of software developers around the world to automate routine tasks and enhance code quality, helping speed the journey from idea to product while fostering a more inclusive development community. Although Replit’s platform has multiple features for developers, one of the most popular is its AI suite, which completes, explains and transforms code. Replit used Mosaic AI Training to build its own code completion model from scratch in less than a week, enabling it to meet an accelerated product launch timeline and get to market on time with fewer engineering headaches. With Databricks, Replit has greatly increased the productivity of its AI engineers, shortening time to market for new models that drive significant business impact.

Lowering the barrier to entry for software engineers

As more organizations transition to a digital-first approach to meet customer needs, demand for software engineers remains strong. In fact, software engineers top the list of 100 best jobs for 2023. There are approximately 27 million software developers in the world today, and Replit already hosts nearly that many users on its platform, with a current user base of 25 million. Demand for software developers continues to grow, but educational barriers, diversity and inclusion challenges, and rapid technological change often prevent potential developers from entering or staying in the field.

The vision for the Replit platform is to give coding beginners an intuitive integrated development environment (IDE) that lets them start coding in a variety of languages without installing any software. Replit's team of experienced AI engineers knew they needed to train their own specialized code completion model to build this platform. They recognized the value of a model training environment that would let them focus exclusively on key components such as data pipelines and custom vocabulary, and they also understood how difficult it is to assemble end-to-end training infrastructure. To help scale the platform and reach their goal of the next billion software creators, they turned to the Mosaic AI team at Databricks.

Having previously worked on LLM training at Google, Michele Catasta, VP of AI at Replit, understood the value of an AI training platform that could abstract away hardware complexity and allow his team to concentrate on other aspects of model building. He explained, “Rather than taking care of the whole infrastructure from end-to-end, which is extremely hard to build, we were looking for a tool that could help us differentiate our offering, like fine-tuning on Replit code. We needed the best white glove experience we could find on the market.”

Democratizing software development with generative AI

The Mosaic AI Training infrastructure from Databricks simplifies the process of large model training so organizations can securely and cost-effectively pretrain and fine-tune their own AI models with ease. Before working with Databricks, Replit explored various options but found them to be either underdeveloped or overly complex. Catasta added, “While I was aware of the alternatives out there, all of them were lagging behind what we needed or they exposed a much higher level of complexity compared to what our very small team was capable of dealing with.”

Replit used the Mosaic AI Training infrastructure and tools to experiment with smaller models, then gradually scaled up to a larger allocation of 256 GPUs just a week before launching its code completion feature. With that compute available, Replit successfully completed a “YOLO” run of its LLM (a single full-scale training run launched with little room for error) and released its code completion model in time for its developer day.

Mosaic AI Training provided easy-to-use tooling for Replit to train and refine their model and scale their engineering efforts, achieving significant milestones with a relatively small team. The journey from the initial version of the model to production demonstrated a continuous cycle of learning, adapting and optimizing — all made possible through the support of Databricks.

Scaling engineering ops with easily trainable models

Replit's collaboration with the Databricks Mosaic AI team has enabled it to build and augment powerful generative AI solutions, significantly enhancing its product offerings for the software developer community. The streamlined process made the development cycle smoother and more reliable, accelerating innovation while lowering Replit's total cost of ownership (TCO) for deploying its code generation LLM with speed, reliability and full governance.

Guided by its prior experience and robust support from Databricks and Mosaic, the team navigated its experiments with confidence, which was critical for managing the complexity and volatility of training LLMs. Catasta said, “What Mosaic AI Training allowed me to accomplish in a relatively short amount of time is probably my best career accomplishment to date. I was able to put code completion in front of every user. Now, I strongly believe that we can change how people develop software.”

Working with Databricks has proven transformative for Replit, improving both the user experience and user satisfaction on its platform. The rapid development and deployment of its code completion feature, despite the team's small size, underscores the productivity gains Replit achieved through this multilayered partnership. The collaboration has enhanced Replit’s technical capabilities and reinforced its mission to offer user-friendly tools that help developers build software collaboratively with the power of AI.

Learn more: https://blog.replit.com/llm-training
Find the latest version of Replit’s code completion model here: https://blog.replit.com/ai4all