LangChain Integration with Endee
LangChain is a framework for building LLM-powered applications. This integration uses Endee as a LangChain vector store — a drop-in replacement for any VectorStore-compatible backend. You get Endee’s hybrid search, metadata filtering, and persistent indexes while keeping the full LangChain interface: similarity_search(), as_retriever(), LCEL chains, and agents all work without modification.
Install Dependencies
# Core dependencies
pip install langchain-endee endee endee-model
# Pick an embedding model:
# Option A: Local (no API key)
pip install langchain-huggingface sentence-transformers
# Option B: OpenAI
# pip install langchain-openai
# Optional: SPLADE sparse embeddings for hybrid search
# pip install fastembed
Connect to Endee
Serverless: get a token from app.endee.io. Local: run Endee locally; no token needed (GitHub).
See Quick Start for setup details.
import os
from langchain_core.documents import Document
from endee import Endee, Precision
from langchain_endee import (
EndeeVectorStore,
RetrievalMode,
EndeeModelSparse, # native BM25 (server-side IDF via endee-model)
FastEmbedSparse, # SPLADE / BM25 via fastembed (optional)
)
ENDEE_TOKEN = os.environ.get("ENDEE_API_TOKEN", "")
Running on Google Colab? Use the Secrets tab (key icon in left sidebar) to store ENDEE_API_TOKEN. The notebook auto-detects Colab and reads secrets via google.colab.userdata.
Choose an Embedding Model
LangChain’s Embeddings interface provides a standard way to convert text into vectors. Any class implementing embed_documents() and embed_query() works — pick one below:
| Option | Model | Dimension | Needs API key? |
|---|---|---|---|
| A (local) | all-MiniLM-L6-v2 | 384 | No |
| B (cloud) | text-embedding-3-small | 1536 | Yes (OPENAI_API_KEY) |
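Because the interface is just those two methods, any duck-typed object works where `embeddings` is expected. A minimal sketch of a conforming class (deterministic hash-based vectors, illustrative only, with no semantic meaning):

```python
import hashlib

class ToyEmbeddings:
    """Deterministic stand-in with the same shape as a LangChain embedding model."""

    def __init__(self, dimension: int = 384):
        self.dimension = dimension

    def _embed(self, text: str) -> list[float]:
        # Hash the text and repeat the digest bytes to fill the vector.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = (digest * (self.dimension // len(digest) + 1))[: self.dimension]
        return [b / 255.0 for b in raw]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)
```

Real models like `HuggingFaceEmbeddings` and `OpenAIEmbeddings` implement the same pair of methods, which is why either can be swapped in below without touching the rest of the code.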
# Option A — runs locally, no API key
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
DIMENSION = 384
# Option B — OpenAI (uncomment to use)
# from langchain_openai import OpenAIEmbeddings
# embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# DIMENSION = 1536
Create the Vector Store
LangChain’s VectorStore base class provides factory methods for index creation and ingestion. from_documents() embeds and upserts all documents in a single call.
INDEX = "rag_demo"
vector_store = EndeeVectorStore.from_documents(
documents=documents,
embedding=embeddings,
index_name=INDEX,
api_token=ENDEE_TOKEN,
dimension=DIMENSION,
space_type="cosine",
precision=Precision.INT16,
force_recreate=True,
)
Alternative: from_texts() — pass raw strings and metadata lists instead of Document objects.
vector_store = EndeeVectorStore.from_texts(
texts=["Python is great", "Rust is fast"],
metadatas=[{"language": "python"}, {"language": "rust"}],
embedding=embeddings,
index_name="my_index",
api_token=ENDEE_TOKEN,
dimension=DIMENSION,
)
Search Methods
LangChain’s VectorStore defines four standard search methods. Each embeds the query (or accepts a pre-computed vector), runs approximate nearest-neighbour search, and returns Document objects:
| Method | Input | Returns |
|---|---|---|
| similarity_search() | query string | list[Document] |
| similarity_search_with_score() | query string | list[tuple[Document, float]] |
| similarity_search_by_vector() | embedding vector | list[Document] |
| similarity_search_by_vector_with_score() | embedding vector | list[tuple[Document, float]] |
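The score returned by the *_with_score variants depends on the index's space_type; with space_type="cosine" it is natural to read it as cosine similarity (an assumption here; check the behaviour of your Endee version). The underlying computation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical direction scores 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```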
Similarity Search
results = vector_store.similarity_search(query="How does RAG work?", k=3)
for doc in results:
    print(f" [{doc.metadata.get('topic')}] {doc.page_content[:70]}")
Similarity Search with Score
scored = vector_store.similarity_search_with_score(query="neural networks", k=3)
for doc, score in scored:
    print(f" sim={score:.3f} {doc.page_content[:60]}")
Similarity Search by Vector
query_vec = embeddings.embed_query("programming language safety")
results = vector_store.similarity_search_by_vector(embedding=query_vec, k=2)
Similarity Search by Vector with Score
scored_by_vec = vector_store.similarity_search_by_vector_with_score(
embedding=query_vec,
k=3,
filter=[{"topic": {"$eq": "programming"}}],
)
for doc, score in scored_by_vec:
    print(f" sim={score:.3f} {doc.page_content[:65]}")
Metadata Filters
All LangChain VectorStore search methods accept a filter parameter to narrow results by metadata. Filters are passed as a list of dicts (AND logic).
See Filtering for supported filter operators ($eq, $in, $range).
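The matching semantics can be sketched client-side (illustrative only; Endee evaluates filters server-side, and the $range key names gte/lte used below are an assumption, so see Filtering for the exact shape):

```python
def matches(metadata: dict, filters: list[dict]) -> bool:
    """Return True if the metadata satisfies every condition (AND logic)."""
    for condition in filters:
        for field, op in condition.items():
            value = metadata.get(field)
            if "$eq" in op and value != op["$eq"]:
                return False
            if "$in" in op and value not in op["$in"]:
                return False
            if "$range" in op:
                lo, hi = op["$range"].get("gte"), op["$range"].get("lte")
                if lo is not None and (value is None or value < lo):
                    return False
                if hi is not None and (value is None or value > hi):
                    return False
    return True

doc_meta = {"topic": "ai", "language": "python"}
print(matches(doc_meta, [{"topic": {"$eq": "ai"}}]))                   # True
print(matches(doc_meta, [{"language": {"$in": ["python", "rust"]}}]))  # True
print(matches(doc_meta, [{"topic": {"$eq": "database"}}]))             # False
```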
ai_docs = vector_store.similarity_search(
query="learning from data",
k=5,
filter=[{"topic": {"$eq": "ai"}}],
)
lang_docs = vector_store.similarity_search(
query="memory safety",
k=5,
filter=[{"language": {"$in": ["python", "rust"]}}],
)
Search Tuning Parameters
See Filtering for details on ef, prefilter_cardinality_threshold, and filter_boost_percentage.
advanced = vector_store.similarity_search_with_score(
query="vector search algorithms",
k=3,
ef=256,
filter=[{"topic": {"$eq": "database"}}],
prefilter_cardinality_threshold=5_000,
filter_boost_percentage=20,
include_vectors=False,
)
CRUD Operations
LangChain’s VectorStore interface provides methods for managing documents after initial ingestion: add_texts() to insert, get_by_ids() to fetch, and delete() to remove. Endee also supports update_filters() to modify metadata without re-embedding.
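add_texts() accepts batch_size (vectors per upsert request) and embedding_chunk_size (texts embedded per model call). The chunking pattern behind those parameters can be sketched as follows (an illustration of the idea, not Endee's internals):

```python
def chunked(items: list, size: int) -> list[list]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i : i + size] for i in range(0, len(items), size)]

texts = [f"doc {i}" for i in range(250)]
embed_chunks = chunked(texts, 100)     # 3 embedding calls for 250 texts
print([len(c) for c in embed_chunks])  # [100, 100, 50]
```

Smaller embedding chunks reduce peak memory per model call; larger upsert batches reduce round trips to the server.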
Add Texts
new_ids = vector_store.add_texts(
texts=[
"Go is a statically typed language designed at Google for scalable services.",
"TypeScript adds static typing to JavaScript for safer large codebases.",
],
metadatas=[
{"topic": "programming", "language": "go"},
{"topic": "programming", "language": "typescript"},
],
batch_size=1000,
embedding_chunk_size=100,
)
Get by IDs
fetched = vector_store.get_by_ids(new_ids)
for doc in fetched:
    print(f" [{doc.metadata.get('language')}] {doc.page_content[:60]}")
Update Filters
vector_store.update_filters([
{
"id": new_ids[0],
"filter": {"topic": "programming", "language": "go", "difficulty": "intermediate"},
},
])
Delete
# Delete by IDs
vector_store.delete(ids=[new_ids[1]])
# Delete by filter
vector_store.delete(filter=[{"language": {"$eq": "go"}}])
LangChain Retriever
as_retriever() wraps any VectorStore into a LangChain Retriever — the standard interface for plugging search into chains, agents, and RAG pipelines. It implements invoke(query) -> list[Document].
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke("What are vector databases used for?")
With metadata filters:
retriever_filtered = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 3, "filter": [{"topic": {"$eq": "ai"}}]},
)
docs_filtered = retriever_filtered.invoke("machine learning")
Hybrid Search
Hybrid search combines dense (semantic) and sparse (keyword) retrieval. Pass a sparse_embedding and retrieval_mode=RetrievalMode.HYBRID to enable it. All standard LangChain search methods then automatically fuse both signal types.
See Sparse Vectors (BM25) for sparse model options and Search for RRF tuning parameters.
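The fusion step is Reciprocal Rank Fusion: each ranking contributes weight / (rank_constant + rank) per document, and documents that rank well in both lists rise to the top. A plain-Python sketch (Endee performs this server-side; the rank_constant and dense_weight arguments mirror the rrf_rank_constant and dense_rrf_weight tuning parameters shown later on this page):

```python
def rrf_fuse(dense_ranking: list[str], sparse_ranking: list[str],
             rank_constant: int = 60, dense_weight: float = 0.5) -> list[str]:
    """Fuse two ranked ID lists with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for weight, ranking in ((dense_weight, dense_ranking),
                            (1.0 - dense_weight, sparse_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank_constant + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears near the top of both rankings, so it wins overall.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```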
# Option A: EndeeModelSparse (recommended)
sparse = EndeeModelSparse()
# Option B: FastEmbedSparse with SPLADE
# sparse = FastEmbedSparse()
# Option C: FastEmbedSparse with BM25
# sparse = FastEmbedSparse(model_name="Qdrant/bm25", batch_size=256)
hybrid_store = EndeeVectorStore.from_documents(
documents=documents,
embedding=embeddings,
index_name="rag_demo_hybrid",
api_token=ENDEE_TOKEN,
dimension=DIMENSION,
space_type="cosine",
retrieval_mode=RetrievalMode.HYBRID,
sparse_embedding=sparse,
force_recreate=True,
)
Compare dense-only vs hybrid:
query = "vector database semantic search"
dense_hits = vector_store.similarity_search_with_score(query, k=3)
hybrid_hits = hybrid_store.similarity_search_with_score(query, k=3)
print("Dense only:")
for doc, score in dense_hits:
    print(f" [{score:.3f}] {doc.page_content[:65]}")
print("\nHybrid (dense + BM25):")
for doc, score in hybrid_hits:
    print(f" [{score:.3f}] {doc.page_content[:65]}")
Tune RRF (Reciprocal Rank Fusion)
rrf_hits = hybrid_store.similarity_search_with_score(
query,
k=3,
rrf_rank_constant=60,
dense_rrf_weight=0.7,
)
Full RAG Chain
This uses LangChain’s LCEL (LangChain Expression Language) to compose a retrieval-augmented generation pipeline:
- Retriever fetches relevant documents
- ChatPromptTemplate formats the context + question into a prompt
- HuggingFacePipeline runs a local LLM (no API key needed)
- RunnablePassthrough passes the user question through unchanged
- StrOutputParser extracts the text from the LLM response
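The | operator works because LangChain Runnables overload it to compose steps, each one's output feeding the next's input. Conceptually (a minimal sketch, not LangChain's actual implementation):

```python
class Pipe:
    """Wrap a callable so that (Pipe(f) | g) composes into f-then-g."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Accept either another Pipe or a plain callable on the right.
        other_func = other.func if isinstance(other, Pipe) else other
        return Pipe(lambda x: other_func(self.func(x)))

    def invoke(self, x):
        return self.func(x)

retrieve = Pipe(lambda q: [f"doc about {q}"])
fmt = lambda docs: "\n".join(docs)
answer = lambda ctx: f"Answer based on: {ctx}"

chain = retrieve | fmt | answer
print(chain.invoke("RAG"))  # Answer based on: doc about RAG
```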
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
retriever = hybrid_store.as_retriever(search_kwargs={"k": 3})
# Runs locally — no API key needed
llm = HuggingFacePipeline.from_model_id(
model_id="google/flan-t5-base",
task="text2text-generation",
pipeline_kwargs={"max_new_tokens": 256},
)
prompt = ChatPromptTemplate.from_template(
"Answer the question based only on the context below.\n\n"
"Context:\n{context}\n\n"
"Question: {question}"
)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
answer = rag_chain.invoke("What is deep learning and what does it power?")
print(answer)
Reconnect to an Existing Index
LangChain’s from_existing_index() connects to a previously created index without re-ingesting — ideal for production use.
existing = EndeeVectorStore.from_existing_index(
index_name="rag_demo",
embedding=embeddings,
api_token=ENDEE_TOKEN,
)
docs = existing.similarity_search("Python", k=1)
Key Takeaways
- EndeeVectorStore implements LangChain’s VectorStore interface — all standard methods work.
- Embeddings — any LangChain embedding model plugs in directly.
- as_retriever() — wraps the store into a standard LangChain Retriever for chains and agents.
- LCEL — compose retriever + prompt + LLM into a RAG pipeline with the | operator.
- Hybrid search — combines dense and sparse retrieval; see Sparse Vectors (BM25).
- from_existing_index() — reconnect in production without re-ingesting.