SESSION

Scaling RAG and Embedding Computations with Ray and Pinecone

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Generative AI
INDUSTRY: Enterprise Technology
TECHNOLOGIES: AI/Machine Learning, GenAI/LLMs
SKILL LEVEL: Intermediate
DURATION: 40 minutes

Developing a retrieval-augmented generation (RAG) LLM application is hard and data intensive. It requires many components to work together: the LLM, a vector database, and embeddings of large amounts of data. The embedding step involves distributed reading of large volumes of raw data stored anywhere, chunking and tokenizing text into tokens, running an embedding model to generate embeddings, and inserting those embeddings into the vector database. This session introduces Ray Data, an open source distributed data processing library for machine learning, and Pinecone, an industry-leading vector database. With Ray Data and Pinecone, you can generate one billion embeddings in under one day on a limited budget. We will cover the fundamentals of Ray Data and Pinecone and show how a machine learning practitioner can use them for RAG embedding computation.
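The pipeline described above (read, chunk, embed, upsert) can be sketched in a few lines. The sketch below is illustrative, not the speakers' implementation: the `chunk_text` helper, the word-window sizes, and the commented Ray Data / Pinecone stage names are assumptions shown only to make the data flow concrete.

```python
# Minimal sketch of the chunking step in a RAG embedding pipeline.
# chunk_text and its window/overlap sizes are illustrative assumptions;
# real pipelines typically chunk by tokenizer tokens, not words.

def chunk_text(text: str, chunk_size: int = 128, overlap: int = 32) -> list[str]:
    """Split text into overlapping word windows for embedding."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words), 1), step)]

# A Ray Data pipeline would chain stages like these across a cluster
# (shown as comments; names such as EmbedBatch and upsert_to_pinecone
# are placeholders, not real APIs from this talk):
#
#   ds = ray.data.read_text("s3://bucket/docs/")              # distributed read
#   ds = ds.flat_map(lambda r: [{"chunk": c}
#                               for c in chunk_text(r["text"])])
#   ds = ds.map_batches(EmbedBatch, concurrency=8)            # embedding model
#   ds.map_batches(upsert_to_pinecone)                        # write to Pinecone

if __name__ == "__main__":
    doc = "word " * 300
    chunks = chunk_text(doc.strip())
    print(len(chunks), len(chunks[0].split()))
```

Overlapping windows keep sentence context that would otherwise be cut at chunk boundaries; running the embedding stage with `map_batches` lets Ray scale it across many workers, which is what makes billion-scale embedding jobs tractable.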

SESSION SPEAKERS

Cheng Su

Manager of Data Team
Anyscale

Roy Miara

Engineering Manager, Generative Search
Pinecone