RAG Pipeline
Build a complete Retrieval-Augmented Generation (RAG) pipeline using Endee and LangChain.
What is RAG?
RAG combines retrieval and generation to answer questions using your own data (see the sketch after this list):
- Retrieve — Find relevant documents from your vector store
- Augment — Add retrieved context to the prompt
- Generate — Use an LLM to generate an answer based on the context
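In code, the three stages map onto just a few calls. The function below is an illustrative sketch only; the retriever and llm it takes as parameters are the components constructed step by step in the rest of this guide:
# Conceptual sketch of the three RAG stages; the retriever and llm
# passed in are built later in this guide
def answer_with_rag(question, retriever, llm):
    docs = retriever.invoke(question)                               # Retrieve
    context = "\n\n".join(doc.page_content for doc in docs)         # Augment
    return llm.invoke(f"Context: {context}\nQuestion: {question}")  # Generate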
Setting Up the RAG Pipeline
Import Required Components
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
Initialize the LLM
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo")
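RAG answers are usually expected to stick closely to the retrieved context, so you may want to reduce randomness. ChatOpenAI accepts an optional temperature parameter for this; the variant below is optional, not required by the pipeline:
# Optional: temperature=0 makes answers more deterministic
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)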
Create a Prompt Template
Design a prompt that incorporates retrieved context:
# Create a prompt template
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:
    Context: {context}
    Question: {question}
    """
)
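To sanity-check the template before wiring it into a chain, you can render it with sample values (the values are illustrative; invoke here fills in the placeholders without calling the LLM):
# Render the template with sample values to inspect the final prompt
print(prompt.invoke({
    "context": "Endee is a fast vector database.",
    "question": "What is Endee?"
}))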
Create the Retriever
# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
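The retriever wraps the store's search method, and k controls how many documents come back per query. Calling similarity_search directly is roughly equivalent; this sketch assumes the standard LangChain vector store interface:
# Roughly what the retriever does for each query
docs = vector_store.similarity_search("What are vector databases?", k=2)
for doc in docs:
    print(doc.page_content)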
Format Retrieved Documents
Create a helper function to format documents:
# Function to format documents
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])
Build the RAG Chain
Combine all components into a single chain. The dict supplies the prompt's two variables: retriever | format_docs turns the incoming question into formatted context, while RunnablePassthrough() forwards the question itself unchanged:
# Create the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
Use the RAG Chain
# Use the RAG chain
question = "What are vector databases and how do they work?"
response = rag_chain.invoke(question)
print(f"Question: {question}")
print(f"\nResponse: {response}")Complete RAG Example
Complete RAG Example
Here’s the full working example:
from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
import os
import time
# Setup credentials
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
endee_api_token = "your-endee-api-token"
# Initialize embedding model
embedding_model = OpenAIEmbeddings()
# Create vector store
timestamp = int(time.time())
index_name = f"rag_demo_{timestamp}"
vector_store = EndeeVectorStore.from_params(
    embedding=embedding_model,
    api_token=endee_api_token,
    index_name=index_name,
    dimension=1536,
    space_type="cosine"
)
# Add documents
texts = [
    "Python is a high-level programming language known for readability.",
    "Machine learning enables systems to learn from data automatically.",
    "Vector databases store high-dimensional vectors for similarity search.",
    "Endee is a fast vector database for AI applications."
]
metadatas = [
    {"category": "programming"},
    {"category": "ai"},
    {"category": "database"},
    {"category": "database", "product": "endee"}
]
vector_store.add_texts(texts=texts, metadatas=metadatas)
# Build RAG pipeline
llm = ChatOpenAI(model="gpt-3.5-turbo")
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:
    Context: {context}
    Question: {question}
    """
)
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Query the RAG pipeline
question = "What is Endee and what is it used for?"
response = rag_chain.invoke(question)
print(f"Question: {question}")
print(f"Response: {response}")RAG with Filtered Retrieval
Combine RAG with metadata filtering:
# Create a filtered retriever
filtered_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": 3,
        "filter": [{"category": {"$eq": "database"}}]
    }
)
# Build RAG chain with filtered retriever
filtered_rag_chain = (
    {"context": filtered_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Query with filtered context
response = filtered_rag_chain.invoke("What are the best database options?")
print(response)
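The same filter syntax can target any metadata field. For example, assuming the filter format shown above, a hypothetical retriever limited to the document tagged product: endee would look like this:
# Hypothetical variant: only retrieve documents tagged product=endee
endee_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": 2,
        "filter": [{"product": {"$eq": "endee"}}]
    }
)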
Advanced RAG Patterns
Custom Prompt Engineering
# More detailed prompt template
detailed_prompt = ChatPromptTemplate.from_template(
    """
    You are an expert assistant. Answer the question based on the context provided.
    If the context doesn't contain enough information, say so.
    Context:
    {context}
    Question: {question}
    Provide a clear, concise answer:
    """
)
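The detailed template drops straight into the chain structure from earlier; only the prompt component changes. This sketch reuses the retriever, llm, and format_docs defined above:
# Same pipeline as before, with the detailed prompt swapped in
detailed_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | detailed_prompt
    | llm
    | StrOutputParser()
)
print(detailed_rag_chain.invoke("What is Python known for?"))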
Streaming Responses
# Stream responses for better UX
for chunk in rag_chain.stream("What is machine learning?"):
    print(chunk, end="", flush=True)
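Streaming also works asynchronously through the runnable's astream method. A minimal sketch for use in a script (a notebook already runs its own event loop, so there you would await the coroutine directly):
import asyncio

async def stream_answer():
    # astream is the async counterpart of stream
    async for chunk in rag_chain.astream("What is machine learning?"):
        print(chunk, end="", flush=True)

asyncio.run(stream_answer())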