RAG Pipeline

Build a complete Retrieval-Augmented Generation (RAG) pipeline using Endee and LangChain.

What is RAG?

RAG combines retrieval and generation to answer questions using your own data (see the code sketch after this list):

  1. Retrieve — Find relevant documents from your vector store
  2. Augment — Add retrieved context to the prompt
  3. Generate — Use an LLM to generate an answer based on the context
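
A minimal end-to-end sketch of those three steps, using a hypothetical answer_with_rag helper rather than the chain API introduced below:

# Conceptual sketch only; answer_with_rag is a hypothetical helper
def answer_with_rag(question, vector_store, llm):
    # 1. Retrieve: find the most relevant documents
    docs = vector_store.similarity_search(question, k=2)
    # 2. Augment: inject the retrieved text into the prompt
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    # 3. Generate: have the LLM answer from that context
    return llm.invoke(prompt).content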

Setting Up the RAG Pipeline

Import Required Components

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

Initialize the LLM

# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")
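
If you want more repeatable answers, ChatOpenAI also accepts a temperature parameter; setting it to 0 makes generation close to deterministic:

# Optional: lower temperature for more deterministic answers
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)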

Create a Prompt Template

Design a prompt that incorporates retrieved context:

# Create a prompt template
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:

    Context: {context}

    Question: {question}
    """
)

Create the Retriever

# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
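
Retrievers are standard LangChain runnables, so you can invoke one directly to sanity-check what it returns (a list of Document objects) before wiring it into a chain:

# Preview the raw documents the retriever returns
docs = retriever.invoke("What are vector databases?")
for doc in docs:
    print(doc.metadata, "->", doc.page_content[:80])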

Format Retrieved Documents

Create a helper function that joins the retrieved documents into a single context string:

# Function to format documents
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])
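
If you also want the model to see document metadata, one option is a slightly richer formatter; the category key here matches the metadata used in the complete example below:

# Variant that prefixes each chunk with its metadata category
def format_docs_with_metadata(docs):
    return "\n\n".join(
        f"[{doc.metadata.get('category', 'unknown')}] {doc.page_content}"
        for doc in docs
    )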

Build the RAG Chain

Combine all components into a single chain:

# Create the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
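
The dictionary at the head of the chain is LangChain shorthand for RunnableParallel: both keys are computed from the same input question. The explicit equivalent looks like this:

from langchain_core.runnables import RunnableParallel

# Explicit form of the dict shorthand above
rag_chain = (
    RunnableParallel(
        context=retriever | format_docs,
        question=RunnablePassthrough(),
    )
    | prompt
    | llm
    | StrOutputParser()
)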

Use the RAG Chain

# Use the RAG chain
question = "What are vector databases and how do they work?"
response = rag_chain.invoke(question)

print(f"Question: {question}")
print(f"\nResponse: {response}")

Complete RAG Example

Here’s the full working example:

from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
import os
import time

# Setup credentials
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
endee_api_token = "your-endee-api-token"

# Initialize embedding model
embedding_model = OpenAIEmbeddings()

# Create vector store
timestamp = int(time.time())
index_name = f"rag_demo_{timestamp}"

vector_store = EndeeVectorStore.from_params(
    embedding=embedding_model,
    api_token=endee_api_token,
    index_name=index_name,
    dimension=1536,
    space_type="cosine"
)

# Add documents
texts = [
    "Python is a high-level programming language known for readability.",
    "Machine learning enables systems to learn from data automatically.",
    "Vector databases store high-dimensional vectors for similarity search.",
    "Endee is a fast vector database for AI applications."
]
metadatas = [
    {"category": "programming"},
    {"category": "ai"},
    {"category": "database"},
    {"category": "database", "product": "endee"}
]
vector_store.add_texts(texts=texts, metadatas=metadatas)

# Build RAG pipeline
llm = ChatOpenAI(model="gpt-3.5-turbo")

prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:

    Context: {context}

    Question: {question}
    """
)

retriever = vector_store.as_retriever(search_kwargs={"k": 2})

def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query the RAG pipeline
question = "What is Endee and what is it used for?"
response = rag_chain.invoke(question)

print(f"Question: {question}")
print(f"Response: {response}")

RAG with Filtered Retrieval

Combine RAG with metadata filtering to restrict retrieval to matching documents:

# Create a filtered retriever
filtered_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": 3,
        "filter": [{"category": {"$eq": "database"}}]
    }
)

# Build RAG chain with filtered retriever
filtered_rag_chain = (
    {"context": filtered_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query with filtered context
response = filtered_rag_chain.invoke("What are the best database options?")
print(response)
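
Since only the retriever differs from the base chain, you can wrap the pattern in a small factory. This helper is a convenience sketch for this guide, not part of the Endee API:

# Hypothetical helper: build a chain restricted to one metadata category
def make_category_chain(category, k=3):
    category_retriever = vector_store.as_retriever(
        search_kwargs={"k": k, "filter": [{"category": {"$eq": category}}]}
    )
    return (
        {"context": category_retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

db_chain = make_category_chain("database")
print(db_chain.invoke("What are the best database options?"))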

Advanced RAG Patterns

Custom Prompt Engineering

# More detailed prompt template
detailed_prompt = ChatPromptTemplate.from_template(
    """
    You are an expert assistant. Answer the question based on the context provided.
    If the context doesn't contain enough information, say so.

    Context: {context}

    Question: {question}

    Provide a clear, concise answer:
    """
)
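
Swapping in the new template only changes one stage of the chain; everything else stays the same:

# Rebuild the chain with the more detailed prompt
detailed_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | detailed_prompt
    | llm
    | StrOutputParser()
)
print(detailed_rag_chain.invoke("What is Endee used for?"))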

Streaming Responses

# Stream responses for better UX
for chunk in rag_chain.stream("What is machine learning?"):
    print(chunk, end="", flush=True)
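
Because the chain is a standard runnable, batch and async variants (such as astream) come for free. For example, to answer several questions in one call:

# Answer several questions in a single batched call
questions = [
    "What is Python known for?",
    "What do vector databases store?",
]
for q, a in zip(questions, rag_chain.batch(questions)):
    print(f"Q: {q}\nA: {a}\n")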

Next Steps