Scaling RAG and Embedding Computations with Ray and Pinecone
OVERVIEW
| EXPERIENCE | In Person |
| --- | --- |
| TYPE | Breakout |
| TRACK | Generative AI |
| INDUSTRY | Enterprise Technology |
| TECHNOLOGIES | AI/Machine Learning, GenAI/LLMs |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
Developing a retrieval-augmented generation (RAG) LLM application can be hard and data-intensive. It requires many different components to work together: an LLM, a vector database, and embeddings of large amounts of data. The embedding step involves reading large volumes of raw data, distributed across wherever it is stored; chunking and tokenizing that data into tokens; running an embedding model to generate embeddings; and inserting the embeddings into the vector database. Here we introduce Ray Data, an open source distributed machine learning data processing library, and Pinecone, the industry-leading vector database. With Ray Data and Pinecone, you can generate one billion embeddings in under one day on a limited budget. In this session, we will cover the fundamentals of Ray Data and Pinecone and show how a machine learning practitioner can use them for RAG embedding computation.
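The read-chunk-embed-insert pipeline described above can be sketched in plain Python. This is a minimal, self-contained illustration, not the actual Ray Data or Pinecone API: the `embed` function is a stand-in for a real embedding model, and in the real pipeline Ray Data would run these per-document steps in parallel across a cluster (e.g. via `map_batches`) before the records are upserted into Pinecone.

```python
# Sketch of the embedding pipeline: chunk raw text, embed each chunk,
# and prepare (id, vector, metadata) records for a vector-database upsert.
# All function names and parameters here are illustrative assumptions.

def chunk_text(text: str, chunk_size: int = 128, overlap: int = 16) -> list[str]:
    """Split a document into overlapping word windows."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(chunks: list[str]) -> list[list[float]]:
    """Placeholder embedding: a real pipeline calls an embedding model here."""
    return [[float(sum(ord(ch) for ch in c) % 997)] for c in chunks]

def to_records(doc_id: str, text: str) -> list[dict]:
    """Turn one raw document into upsert-ready records."""
    chunks = chunk_text(text)
    vectors = embed(chunks)
    return [
        {"id": f"{doc_id}-{i}", "values": v, "metadata": {"text": c}}
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ]
```

At scale, `to_records` is the kind of per-document transform a distributed data library applies over many workers, with the final stage batching the resulting records into vector-database upsert calls.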