RAG Pipeline

Build a complete Retrieval-Augmented Generation (RAG) pipeline using Endee and LangChain.

What is RAG?

RAG combines retrieval and generation to answer questions using your own data (see the code sketch after this list):

  1. Retrieve — Find relevant documents from your vector store
  2. Augment — Add retrieved context to the prompt
  3. Generate — Use an LLM to generate an answer based on the context
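
A minimal end-to-end sketch of those three steps, using a hypothetical answer_with_rag helper rather than the chain API introduced below:

# Conceptual sketch only; answer_with_rag is a hypothetical helper
def answer_with_rag(question, vector_store, llm):
    # 1. Retrieve: find the most relevant documents
    docs = vector_store.similarity_search(question, k=2)
    # 2. Augment: inject the retrieved text into the prompt
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    # 3. Generate: have the LLM answer from that context
    return llm.invoke(prompt).content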

Setting Up the RAG Pipeline

Import Required Components

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

Initialize the LLM

# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")
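
If you want more repeatable answers, ChatOpenAI also accepts a temperature parameter; setting it to 0 makes generation close to deterministic:

# Optional: lower temperature for more deterministic answers
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)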

Create a Prompt Template

Design a prompt that incorporates retrieved context:

# Create a prompt template
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:

    Context: {context}

    Question: {question}
    """
)

Create the Retriever

# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
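
Retrievers are standard LangChain runnables, so you can invoke one directly to sanity-check what it returns (a list of Document objects) before wiring it into a chain:

# Preview the raw documents the retriever returns
docs = retriever.invoke("What are vector databases?")
for doc in docs:
    print(doc.metadata, "->", doc.page_content[:80])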

Format Retrieved Documents

Create a helper function that joins the retrieved documents into a single context string:

# Function to format documents
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])
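
If you also want the model to see document metadata, one option is a slightly richer formatter; the category key here matches the metadata used in the complete example below:

# Variant that prefixes each chunk with its metadata category
def format_docs_with_metadata(docs):
    return "\n\n".join(
        f"[{doc.metadata.get('category', 'unknown')}] {doc.page_content}"
        for doc in docs
    )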

Build the RAG Chain

Combine all components into a single chain:

# Create the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
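
The dictionary at the head of the chain is LangChain shorthand for RunnableParallel: both keys are computed from the same input question. The explicit equivalent looks like this:

from langchain_core.runnables import RunnableParallel

# Explicit form of the dict shorthand above
rag_chain = (
    RunnableParallel(
        context=retriever | format_docs,
        question=RunnablePassthrough(),
    )
    | prompt
    | llm
    | StrOutputParser()
)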

Use the RAG Chain

# Use the RAG chain
question = "What are vector databases and how do they work?"
response = rag_chain.invoke(question)

print(f"Question: {question}")
print(f"\nResponse: {response}")

Complete RAG Example

Here’s the full working example:

from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
import os
import time

# Setup credentials
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
endee_api_token = "your-endee-api-token"

# Initialize embedding model
embedding_model = OpenAIEmbeddings()

# Create vector store
timestamp = int(time.time())
index_name = f"rag_demo_{timestamp}"

vector_store = EndeeVectorStore.from_params(
    embedding=embedding_model,
    api_token=endee_api_token,
    index_name=index_name,
    dimension=1536,
    space_type="cosine"
)

# Add documents
texts = [
    "Python is a high-level programming language known for readability.",
    "Machine learning enables systems to learn from data automatically.",
    "Vector databases store high-dimensional vectors for similarity search.",
    "Endee is a fast vector database for AI applications."
]
metadatas = [
    {"category": "programming"},
    {"category": "ai"},
    {"category": "database"},
    {"category": "database", "product": "endee"}
]
vector_store.add_texts(texts=texts, metadatas=metadatas)

# Build RAG pipeline
llm = ChatOpenAI(model="gpt-3.5-turbo")

prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:

    Context: {context}

    Question: {question}
    """
)

retriever = vector_store.as_retriever(search_kwargs={"k": 2})

def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query the RAG pipeline
question = "What is Endee and what is it used for?"
response = rag_chain.invoke(question)

print(f"Question: {question}")
print(f"Response: {response}")

RAG with Filtered Retrieval

Combine RAG with metadata filtering to restrict retrieval to matching documents:

# Create a filtered retriever
filtered_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": 3,
        "filter": [{"category": {"$eq": "database"}}]
    }
)

# Build RAG chain with filtered retriever
filtered_rag_chain = (
    {"context": filtered_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query with filtered context
response = filtered_rag_chain.invoke("What are the best database options?")
print(response)
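
Since only the retriever differs from the base chain, you can wrap the pattern in a small factory. This helper is a convenience sketch for this guide, not part of the Endee API:

# Hypothetical helper: build a chain restricted to one metadata category
def make_category_chain(category, k=3):
    category_retriever = vector_store.as_retriever(
        search_kwargs={"k": k, "filter": [{"category": {"$eq": category}}]}
    )
    return (
        {"context": category_retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

db_chain = make_category_chain("database")
print(db_chain.invoke("What are the best database options?"))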

Advanced RAG Patterns

Custom Prompt Engineering

# More detailed prompt template
detailed_prompt = ChatPromptTemplate.from_template(
    """
    You are an expert assistant. Answer the question based on the context provided.
    If the context doesn't contain enough information, say so.

    Context: {context}

    Question: {question}

    Provide a clear, concise answer:
    """
)
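
Swapping in the new template only changes one stage of the chain; everything else stays the same:

# Rebuild the chain with the more detailed prompt
detailed_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | detailed_prompt
    | llm
    | StrOutputParser()
)
print(detailed_rag_chain.invoke("What is Endee used for?"))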

Streaming Responses

# Stream responses for better UX
for chunk in rag_chain.stream("What is machine learning?"):
    print(chunk, end="", flush=True)
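
Because the chain is a standard runnable, batch and async variants (such as astream) come for free. For example, to answer several questions in one call:

# Answer several questions in a single batched call
questions = [
    "What is Python known for?",
    "What do vector databases store?",
]
for q, a in zip(questions, rag_chain.batch(questions)):
    print(f"Q: {q}\nA: {a}\n")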

Next Steps