LlamaIndex Integration with Endee
LlamaIndex is a data framework for building LLM-powered applications over your own data. This integration uses Endee as a LlamaIndex vector store via EndeeVectorStore, which plugs directly into LlamaIndex’s VectorStore interface. You get Endee’s hybrid search, metadata filtering, and persistent indexes while using LlamaIndex’s standard retrieval primitives — VectorStoreIndex, as_retriever(), and query engines all work without modification.
Your App → LlamaIndex (orchestration) → EndeeVectorStore → Endee

| LlamaIndex call | EndeeVectorStore method | Endee SDK call |
|---|---|---|
| VectorStoreIndex.from_documents(docs) | vector_store.add(nodes) | Index.upsert() |
| index.as_retriever().retrieve("query") | vector_store.query() | Index.query() |
| EndeeVectorStore.from_params(...) | creates or reconnects | Endee.create_index() / Endee.get_index() |
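To make the mapping concrete, here is the smallest possible round trip: a sketch that assumes a local Endee server and an embedding model already configured (both are set up step by step below).

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index_endee import EndeeVectorStore

# from_params creates (or reconnects to) the index behind the scenes
vector_store = EndeeVectorStore.from_params(index_name="hello_endee", dimension=384)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# from_documents() chunks and embeds, then vector_store.add() upserts
index = VectorStoreIndex.from_documents(
    [Document(text="Endee stores embedding vectors.")],
    storage_context=storage_context,
)

# retrieve() embeds the query and calls vector_store.query()
print(index.as_retriever(similarity_top_k=1).retrieve("vectors")[0].text)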
Install Dependencies
pip install llama-index-vector-stores-endee llama-index-embeddings-huggingface python-dotenv

Import and Configure
import os
from dotenv import load_dotenv
from llama_index.core import Document, StorageContext, VectorStoreIndex, Settings
from llama_index.core.vector_stores.types import MetadataFilters, MetadataFilter, FilterOperator
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index_endee import EndeeVectorStore
load_dotenv()
ENDEE_API_TOKEN = os.getenv("ENDEE_API_TOKEN")
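load_dotenv() reads variables from a .env file in your working directory; a minimal one (placeholder value) looks like:

ENDEE_API_TOKEN=your-endee-api-token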
Set Up Embedding Model
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
DIMENSION = 384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings

Connect to Endee
See Quick Start for server setup details.
With an API token (Endee Cloud) — get a token from app.endee.io:
vector_store = EndeeVectorStore.from_params(
    api_token=ENDEE_API_TOKEN,
    index_name="my_index",
    dimension=DIMENSION,
)

Without a token (local server):
vector_store = EndeeVectorStore.from_params(
    index_name="my_index",
    dimension=DIMENSION,
)

Reconnecting to an existing index — no data loss, dimension not required:
vector_store = EndeeVectorStore.from_params(
    api_token=ENDEE_API_TOKEN,
    index_name="my_existing_index",
)
index = VectorStoreIndex.from_vector_store(vector_store)

from_params parameters:
| Parameter | Description | Default |
|---|---|---|
| api_token | Endee API token | None (local) |
| index_name | Index name | Required |
| dimension | Must match your embedding model | Required for new indexes |
| sparse_model | None (dense), "endee_bm25" (BM25), "default" (SPLADE) | None |
| batch_size | Vectors per upsert | 100 |
See Indexes for index parameters (space_type, precision, M, ef_con).
Dense Search
The default mode when sparse_model is not set. VectorStoreIndex.from_documents() chunks, embeds, and calls vector_store.add() to upsert into Endee.
vector_store = EndeeVectorStore.from_params(
    api_token=ENDEE_API_TOKEN,
    index_name="dense_demo",
    dimension=DIMENSION,
)
documents = [
    Document(
        text="Python is a high-level programming language prized for its readability.",
        metadata={"category": "programming", "language": "python", "level": "beginner"},
    ),
    Document(
        text="Endee is a managed vector database for production RAG workloads.",
        metadata={"category": "database", "type": "vector", "level": "intermediate"},
    ),
    Document(
        text="Machine learning allows systems to learn patterns from data.",
        metadata={"category": "ai", "field": "ml", "level": "intermediate"},
    ),
    Document(
        text="LlamaIndex is an open-source data framework for LLM-powered applications.",
        metadata={"category": "ai", "field": "rag", "level": "beginner"},
    ),
    Document(
        text="Vector databases store and search high-dimensional embedding vectors at scale.",
        metadata={"category": "database", "type": "vector", "level": "intermediate"},
    ),
]
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Retrieve
index.as_retriever().retrieve() embeds your query and calls vector_store.query().
retriever = index.as_retriever(similarity_top_k=2)
results = retriever.retrieve("Tell me about vector databases")
for i, node in enumerate(results, start=1):
    print(f"{i}. Score: {node.get_score():.4f}")
    print(f" Text: {node.text}")
    print(f" Category: {node.metadata.get('category')}\n")
Dense + Sparse Search
Set sparse_model in from_params to enable dense + sparse search.
| sparse_model value | Encoder | Install |
|---|---|---|
| "endee_bm25" | BM25 via endee_model | included (core dep) |
| "default" | SPLADE++ via fastembed | pip install llama-index-vector-stores-endee[splade] |
vector_store = EndeeVectorStore.from_params(
    api_token=ENDEE_API_TOKEN,
    index_name="bm25_demo",
    dimension=DIMENSION,
    sparse_model="endee_bm25",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Control dense vs sparse balance with dense_rrf_weight via vector_store_kwargs:
for weight, label in [(1.0, "dense-only"), (0.5, "balanced"), (0.0, "sparse-only")]:
    retriever = index.as_retriever(
        similarity_top_k=3,
        vector_store_kwargs={"dense_rrf_weight": weight},
    )
    results = retriever.retrieve("privacy vector search")
    print(f" dense_rrf_weight={weight} ({label}):")
    for i, node in enumerate(results, start=1):
        print(f" {i}. Score: {node.get_score():.4f} | {node.text[:80]}...")

| dense_rrf_weight | Effect |
|---|---|
| 1.0 | Dense only |
| 0.5 | Balanced (default) |
| 0.0 | Sparse only |
See Search for RRF tuning details.
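For intuition, weighted reciprocal-rank fusion combines the dense and sparse rankings roughly like this. An illustrative sketch only; Endee's exact formula and constants may differ (see Search):

# Illustrative weighted RRF: each id is scored by its rank in the
# dense and sparse result lists, weighted by dense_rrf_weight.
def rrf_fuse(dense_ids, sparse_ids, dense_weight=0.5, k=60):
    scores = {}
    for rank, doc_id in enumerate(dense_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + dense_weight / (k + rank + 1)
    for rank, doc_id in enumerate(sparse_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - dense_weight) / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# dense_weight=1.0 reproduces the dense ranking; 0.0 the sparse one.
print(rrf_fuse(["a", "b", "c"], ["c", "a", "d"], dense_weight=0.5))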
Metadata Filtering
Pass filters to as_retriever() — they are forwarded to vector_store.query(). See Filtering for supported operators.
EQ Filter — Exact Match
eq_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value="ai", operator=FilterOperator.EQ)]
)
filtered_retriever = index.as_retriever(similarity_top_k=2, filters=eq_filters)
results = filtered_retriever.retrieve("How do systems learn from data?")
for node in results:
    print(f" {node.text}")
    print(f" Category: {node.metadata.get('category')}, Field: {node.metadata.get('field')}\n")

IN Filter — Match Any Value in a List
in_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value=["ai", "database"], operator=FilterOperator.IN)]
)
in_retriever = index.as_retriever(similarity_top_k=3, filters=in_filters)
results = in_retriever.retrieve("vector search and machine learning")
for i, node in enumerate(results, start=1):
    print(f"{i}. {node.text}")
    print(f" Category: {node.metadata.get('category')}\n")
Query Tuning
Pass tuning parameters via vector_store_kwargs — forwarded to vector_store.query(). See Filtering for prefilter details.
| Parameter | Description | Default |
|---|---|---|
| dense_rrf_weight | Dense (1.0) vs sparse (0.0) balance when sparse_model is set | 0.5 |
| ef | Search quality — higher explores more candidates | 128 |
| prefilter_cardinality_threshold | Switch between HNSW and brute-force | — |
| filter_boost_percentage | Extra candidates fetched before filtering | — |
| include_vectors | Return stored embeddings in results | True |
retriever = index.as_retriever(
    similarity_top_k=5,
    filters=eq_filters,
    vector_store_kwargs={"dense_rrf_weight": 0.7, "ef": 200},
)

Vector Operations
These methods call the Endee SDK directly, bypassing LlamaIndex’s query engine.
| Method | Endee SDK call |
|---|---|
| vector_store.fetch(ids) | Index.get_vector() |
| vector_store.update_filters(updates) | Index.update_filters() |
| vector_store.delete_vector(id) | Index.delete_vector() |
| vector_store.delete(ref_doc_id=...) | Index.delete_with_filter() |
| vector_store.describe() | Index.describe() |
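describe() has no dedicated walkthrough below; it returns the index's configuration and stats (the exact fields depend on your index):

# Inspect the index that backs this vector store.
print(vector_store.describe())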
Fetch a Vector
retriever = index.as_retriever(similarity_top_k=1)
sample_nodes = retriever.retrieve("vector database")
sample_id = sample_nodes[0].node.id_
fetched = vector_store.fetch([sample_id])
vec = fetched[0]
print(f"Embedding dim : {len(vec.get('vector', []))}")
print(f"Filter metadata: {vec.get('filter', {})}")
print(f"Metadata keys : {list(vec.get('meta', {}).keys())}")Update Filter Metadata
result = vector_store.update_filters([
    {"id": sample_id, "filter": {"category": "database", "status": "reviewed"}}
])

Delete a Vector
vector_store.delete_vector(sample_id)
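Per the table above, vector_store.delete(ref_doc_id=...) removes every chunk derived from a single source document via Index.delete_with_filter(). A sketch using the first sample Document's ID:

# Deletes all nodes that were chunked from that source document.
vector_store.delete(ref_doc_id=documents[0].doc_id)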

Key Takeaways

- EndeeVectorStore plugs directly into LlamaIndex's VectorStore interface — all standard retrieval methods work.
- from_params() creates or reconnects to an index — safe to call on existing indexes.
- Sparse search — set sparse_model="endee_bm25" at creation time; see Sparse Vectors (BM25).
- Filters — pass MetadataFilters to as_retriever(); see Filtering for operators.
- Tuning — use vector_store_kwargs to control dense_rrf_weight, ef, and prefilter thresholds.