Training MoEs at Scale with PyTorch and Databricks

July 1, 2024 by Brian Chu, Mihir Patel, Vitaliy Chiley and Evan Racah in Mosaic Research

Mixture-of-Experts (MoE) has emerged as a promising LLM architecture for efficient training and inference. MoE models like DBRX, which use multiple expert...