
LlamaIndex Integration with Endee


LlamaIndex is a data framework for building LLM-powered applications over your own data. This integration uses Endee as a LlamaIndex vector store via EndeeVectorStore, which plugs directly into LlamaIndex’s VectorStore interface. You get Endee’s hybrid search, metadata filtering, and persistent indexes while using LlamaIndex’s standard retrieval primitives — VectorStoreIndex, as_retriever(), and query engines all work without modification.

Your App → LlamaIndex (orchestration) → EndeeVectorStore → Endee
LlamaIndex call                         | EndeeVectorStore method  | Endee SDK call
VectorStoreIndex.from_documents(docs)   | vector_store.add(nodes)  | Index.upsert()
index.as_retriever().retrieve("query")  | vector_store.query()     | Index.query()
EndeeVectorStore.from_params(...)       | creates or reconnects    | Endee.create_index() / Endee.get_index()

Install Dependencies

pip install llama-index-vector-stores-endee llama-index-embeddings-huggingface python-dotenv

Import and Configure

import os

from dotenv import load_dotenv
from llama_index.core import Document, StorageContext, VectorStoreIndex, Settings
from llama_index.core.vector_stores.types import MetadataFilters, MetadataFilter, FilterOperator
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index_endee import EndeeVectorStore

load_dotenv()
ENDEE_API_TOKEN = os.getenv("ENDEE_API_TOKEN")

Set Up Embedding Model

Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
DIMENSION = 384

Connect to Endee

See Quick Start for server setup details.

With an API token (Endee Cloud) — get a token from app.endee.io:

vector_store = EndeeVectorStore.from_params( api_token=ENDEE_API_TOKEN, index_name="my_index", dimension=DIMENSION, )

Without a token (local server):

vector_store = EndeeVectorStore.from_params(
    index_name="my_index",
    dimension=DIMENSION,
)

Reconnecting to an existing index — no data loss, dimension not required:

vector_store = EndeeVectorStore.from_params( api_token=ENDEE_API_TOKEN, index_name="my_existing_index", ) index = VectorStoreIndex.from_vector_store(vector_store)

from_params parameters:

Parameter    | Description                                            | Default
api_token    | Endee API token                                        | None (local)
index_name   | Index name                                             | Required
dimension    | Must match your embedding model                        | Required for new indexes
sparse_model | None (dense), "endee_bm25" (BM25), "default" (SPLADE)  | None
batch_size   | Vectors per upsert                                     | 100

See Indexes for index parameters (space_type, precision, M, ef_con).
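Those parameters apply when an index is first created. The sketch below is an assumption rather than confirmed API: it presumes from_params forwards extra keyword arguments such as space_type, precision, M, and ef_con to Endee.create_index(). Check the Indexes page for the exact names and accepted values.

# Hypothetical: assumes from_params passes these extra keyword
# arguments through to Endee.create_index().
vector_store = EndeeVectorStore.from_params(
    api_token=ENDEE_API_TOKEN,
    index_name="tuned_index",
    dimension=DIMENSION,
    space_type="cosine",  # assumed distance metric value
    precision="float16",  # assumed storage precision value
    M=16,                 # HNSW connectivity (illustrative)
    ef_con=128,           # HNSW construction beam width (illustrative)
)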


Dense Search (Default)

Dense-only search is the default mode when sparse_model is not set. VectorStoreIndex.from_documents() chunks and embeds each document, then calls vector_store.add() to upsert the resulting nodes into Endee.

vector_store = EndeeVectorStore.from_params( api_token=ENDEE_API_TOKEN, index_name="dense_demo", dimension=DIMENSION, ) documents = [ Document( text="Python is a high-level programming language prized for its readability.", metadata={"category": "programming", "language": "python", "level": "beginner"}, ), Document( text="Endee is a managed vector database for production RAG workloads.", metadata={"category": "database", "type": "vector", "level": "intermediate"}, ), Document( text="Machine learning allows systems to learn patterns from data.", metadata={"category": "ai", "field": "ml", "level": "intermediate"}, ), Document( text="LlamaIndex is an open-source data framework for LLM-powered applications.", metadata={"category": "ai", "field": "rag", "level": "beginner"}, ), Document( text="Vector databases store and search high-dimensional embedding vectors at scale.", metadata={"category": "database", "type": "vector", "level": "intermediate"}, ), ] storage_context = StorageContext.from_defaults(vector_store=vector_store) index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Retrieve

index.as_retriever().retrieve() embeds your query and calls vector_store.query().

retriever = index.as_retriever(similarity_top_k=2)
results = retriever.retrieve("Tell me about vector databases")

for i, node in enumerate(results, start=1):
    print(f"{i}. Score: {node.get_score():.4f}")
    print(f"   Text: {node.text}")
    print(f"   Category: {node.metadata.get('category')}\n")

Hybrid Search (Dense + Sparse)

Set sparse_model in from_params to enable combined dense + sparse search.

sparse_model value | Encoder                | Install
"endee_bm25"       | BM25 via endee_model   | included (core dep)
"default"          | SPLADE++ via fastembed | pip install llama-index-vector-stores-endee[splade]
vector_store = EndeeVectorStore.from_params( api_token=ENDEE_API_TOKEN, index_name="bm25_demo", dimension=DIMENSION, sparse_model="endee_bm25", ) storage_context = StorageContext.from_defaults(vector_store=vector_store) index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Control dense vs sparse balance with dense_rrf_weight via vector_store_kwargs:

for weight, label in [(1.0, "dense-only"), (0.5, "balanced"), (0.0, "sparse-only")]:
    retriever = index.as_retriever(
        similarity_top_k=3,
        vector_store_kwargs={"dense_rrf_weight": weight},
    )
    results = retriever.retrieve("privacy vector search")
    print(f"dense_rrf_weight={weight} ({label}):")
    for i, node in enumerate(results, start=1):
        print(f"  {i}. Score: {node.get_score():.4f} | {node.text[:80]}...")

dense_rrf_weight | Effect
1.0              | Dense only
0.5              | Balanced (default)
0.0              | Sparse only

See Search for RRF tuning details.


Metadata Filtering

Pass filters to as_retriever() — they are forwarded to vector_store.query(). See Filtering for supported operators.

EQ Filter — Exact Match

eq_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value="ai", operator=FilterOperator.EQ)]
)

filtered_retriever = index.as_retriever(similarity_top_k=2, filters=eq_filters)
results = filtered_retriever.retrieve("How do systems learn from data?")

for node in results:
    print(f"{node.text}")
    print(f"  Category: {node.metadata.get('category')}, Field: {node.metadata.get('field')}\n")

IN Filter — Match Any Value in a List

in_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value=["ai", "database"], operator=FilterOperator.IN)]
)

in_retriever = index.as_retriever(similarity_top_k=3, filters=in_filters)
results = in_retriever.retrieve("vector search and machine learning")

for i, node in enumerate(results, start=1):
    print(f"{i}. {node.text}")
    print(f"   Category: {node.metadata.get('category')}\n")

Query Tuning

Pass tuning parameters via vector_store_kwargs — forwarded to vector_store.query(). See Filtering for prefilter details.

Parameter                       | Description                                                   | Default
dense_rrf_weight                | Dense (1.0) vs sparse (0.0) balance when sparse_model is set | 0.5
ef                              | Search quality — higher explores more candidates             | 128
prefilter_cardinality_threshold | Switch between HNSW and brute-force                           |
filter_boost_percentage         | Extra candidates fetched before filtering                     |
include_vectors                 | Return stored embeddings in results                           | True

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=eq_filters,
    vector_store_kwargs={"dense_rrf_weight": 0.7, "ef": 200},
)
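The prefilter parameters follow the same pattern. A short sketch, assuming both keys are accepted in vector_store_kwargs exactly as named in the table (the values shown are illustrative, not recommended defaults):

retriever = index.as_retriever(
    similarity_top_k=5,
    filters=eq_filters,
    vector_store_kwargs={
        "prefilter_cardinality_threshold": 10_000,  # illustrative threshold
        "filter_boost_percentage": 20,              # illustrative boost
    },
)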

Vector Operations

These methods call the Endee SDK directly, bypassing LlamaIndex’s query engine.

Method                                | Endee SDK call
vector_store.fetch(ids)               | Index.get_vector()
vector_store.update_filters(updates)  | Index.update_filters()
vector_store.delete_vector(id)        | Index.delete_vector()
vector_store.delete(ref_doc_id=...)   | Index.delete_with_filter()
vector_store.describe()               | Index.describe()

Fetch a Vector

retriever = index.as_retriever(similarity_top_k=1)
sample_nodes = retriever.retrieve("vector database")
sample_id = sample_nodes[0].node.id_

fetched = vector_store.fetch([sample_id])
vec = fetched[0]
print(f"Embedding dim  : {len(vec.get('vector', []))}")
print(f"Filter metadata: {vec.get('filter', {})}")
print(f"Metadata keys  : {list(vec.get('meta', {}).keys())}")

Update Filter Metadata

result = vector_store.update_filters([
    {"id": sample_id, "filter": {"category": "database", "status": "reviewed"}}
])

Delete a Vector

vector_store.delete_vector(sample_id)
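
The remaining methods from the table above follow the same pattern. A short sketch, assuming the node's ref_doc_id survives the round trip through the store and that describe() returns a summary you can print directly:

# Delete every vector derived from one source document
# (LlamaIndex sets ref_doc_id when it chunks a Document).
vector_store.delete(ref_doc_id=sample_nodes[0].node.ref_doc_id)

# Inspect the index; the exact fields returned depend on the Endee SDK.
print(vector_store.describe())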

Key Takeaways

  • EndeeVectorStore plugs directly into LlamaIndex’s VectorStore interface — all standard retrieval methods work.
  • from_params() creates or reconnects to an index — safe to call on existing indexes.
  • Sparse search — set sparse_model="endee_bm25" at creation time; see Sparse Vectors (BM25).
  • Filters — pass MetadataFilters to as_retriever(); see Filtering for operators.
  • Tuning — use vector_store_kwargs to control dense_rrf_weight, ef, and prefilter thresholds.