Endee + LlamaIndex: Quick Start
Use Endee as a LlamaIndex vector store for semantic search and RAG pipelines.
Install & Setup
Install the required libraries for LlamaIndex and Endee integration.
pip install -q llama-index-vector-stores-endee llama-index-embeddings-huggingface
pip install numpy==2.0.0
Authentication
Choose your connection method: local server or serverless cloud.
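Whichever method applies, the token is a secret, so one option is to read it from the environment rather than hard-coding it. The following is a minimal sketch, not part of the Endee API: the helper load_endee_config and the variable names ENDEE_API_TOKEN and ENDEE_BASE_URL are hypothetical, chosen only for illustration.

```python
import os


def load_endee_config(env=None):
    """Return (api_token, base_url) read from an environment-style mapping.

    ENDEE_API_TOKEN and ENDEE_BASE_URL are hypothetical names used by this
    sketch. An empty base URL stands for the serverless case.
    """
    env = os.environ if env is None else env
    api_token = env.get("ENDEE_API_TOKEN", "")
    base_url = env.get("ENDEE_BASE_URL", "")
    # Serverless (no base URL) cannot work without a token.
    if not base_url and not api_token:
        raise ValueError("serverless mode needs a token; set ENDEE_API_TOKEN")
    return api_token, base_url


# Example with an explicit mapping: a local server without auth configured.
local_token, local_url = load_endee_config(
    {"ENDEE_BASE_URL": "http://127.0.0.1:8080/api/v1"}
)
print(repr(local_token), local_url)
```

Calling load_endee_config() with no argument reads from os.environ, and the returned pair can be assigned to API_TOKEN and BASE_URL.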
Local Server: If your server has NDD_AUTH_TOKEN set, pass the same token when initializing:
API_TOKEN = "ndd-auth-token"
BASE_URL = "http://127.0.0.1:8080/api/v1"
Endee Serverless: Go to https://app.endee.io, create a token, then pass it here:
API_TOKEN = "your-serverless-token"
BASE_URL = ""  # leave empty for serverless
Import & Create Vector Store
Import libraries and create a vector store connected to Endee.
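The dimension given to the vector store has to agree with the embedding model's output size; all-MiniLM-L6-v2 produces 384-dimensional vectors. A small check like the one below could catch a mismatch before any documents are indexed. check_dimension and fake_embed are hypothetical helpers for this sketch; with LlamaIndex, the embedding model's get_text_embedding method could be passed as embed_fn instead of the stand-in.

```python
def check_dimension(embed_fn, expected_dim, probe="dimension probe"):
    """Embed a short probe string and compare its length with the index dimension."""
    actual = len(embed_fn(probe))
    if actual != expected_dim:
        raise ValueError(
            f"embedding has {actual} dimensions, index expects {expected_dim}"
        )
    return actual


def fake_embed(text):
    """Stand-in embedder so the sketch runs without downloading a model."""
    return [0.0] * 384


checked_dim = check_dimension(fake_embed, 384)
print(f"embedding dimension OK: {checked_dim}")
```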
from llama_index.core import Document, StorageContext, VectorStoreIndex, Settings
from llama_index.core.vector_stores.types import MetadataFilters, MetadataFilter, FilterOperator
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index_endee import EndeeVectorStore
INDEX_NAME = "rag_demo_index"
DIMENSION = 384
# Create vector store (use API_TOKEN and BASE_URL from above)
vector_store = EndeeVectorStore.from_params(
    api_token=API_TOKEN,
    base_url=BASE_URL,
    index_name=INDEX_NAME,
    dimension=DIMENSION,
    sparse_model="endee_bm25",  # enables hybrid search automatically
)
# Load embedding model globally
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
Settings.embed_model = embed_model
Add Documents
Create documents with text and metadata, then index them all at once.
documents = [
    Document(
        text="Python is a high-level programming language known for readability.",
        metadata={"category": "programming", "language": "python", "level": "beginner"},
    ),
    # ... more documents
]
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
Reconnect to an Existing Index
If the index already exists, from_params reconnects — no data loss. You don’t need to pass dimension:
vector_store = EndeeVectorStore.from_params(
    api_token=API_TOKEN,
    base_url=BASE_URL,
    index_name=INDEX_NAME,
)
# Load the existing index for querying; no re-indexing needed
index = VectorStoreIndex.from_vector_store(vector_store)
Dense Search
Find similar documents by semantic meaning using query embeddings.
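Conceptually, dense search embeds the query and ranks stored vectors by similarity to it. The sketch below illustrates that idea in plain Python with cosine similarity and tiny hand-made vectors. It is only an illustration: the similarity metric, the helper names cosine and top_k, and the toy vectors are assumptions, and a real index uses an approximate search structure rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
docs = {
    "vector_db": [0.9, 0.1, 0.0],
    "python": [0.1, 0.9, 0.0],
    "cooking": [0.0, 0.1, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], docs, k=2)
for doc_id, score in hits:
    print(f"{doc_id}: {score:.3f}")
```

With the Endee-backed index, the retriever performs the embedding and the search in one call: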
retriever = index.as_retriever(similarity_top_k=2)
query = "Tell me about vector databases"
results = retriever.retrieve(query)
print(f"Query: '{query}'\n")
for i, node in enumerate(results, 1):
    print(f"{i}. score={node.get_score():.3f}")
    print(f" {node.text}")
    print(f" {node.metadata}\n")
Filter Results
Restrict search to specific documents using metadata filters.
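The two operators used in this section behave like simple predicates over each document's metadata: EQ keeps documents whose field equals a value, and IN keeps documents whose field is one of several values. The plain-Python sketch below mimics that behaviour on toy records. The helpers matches and apply_filters and the records themselves are hypothetical; multiple filters are combined with AND here, which is the default condition of LlamaIndex's MetadataFilters.

```python
def matches(metadata, key, value, operator):
    """Evaluate one filter against a metadata dict."""
    if operator == "EQ":
        return metadata.get(key) == value
    if operator == "IN":
        return metadata.get(key) in value
    raise ValueError(f"unsupported operator: {operator}")


def apply_filters(records, filters):
    """Keep records that satisfy every (key, value, operator) filter."""
    return [
        rec
        for rec in records
        if all(matches(rec["metadata"], key, value, op) for key, value, op in filters)
    ]


# Toy records standing in for indexed documents.
records = [
    {"id": "doc1", "metadata": {"category": "ai"}},
    {"id": "doc2", "metadata": {"category": "database"}},
    {"id": "doc3", "metadata": {"category": "programming"}},
]
eq_ids = [r["id"] for r in apply_filters(records, [("category", "ai", "EQ")])]
in_ids = [
    r["id"]
    for r in apply_filters(records, [("category", ["ai", "database"], "IN")])
]
print(eq_ids, in_ids)
```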
# EQ — only AI documents
eq_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value="ai", operator=FilterOperator.EQ)]
)
results = index.as_retriever(similarity_top_k=2, filters=eq_filters).retrieve(
    "How do systems learn from data?"
)
print("EQ filter (category == ai):")
for node in results:
    print(f" [{node.metadata.get('field')}] {node.text[:70]}\n")
# IN — multiple categories
in_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value=["ai", "database"], operator=FilterOperator.IN)]
)
results = index.as_retriever(similarity_top_k=3, filters=in_filters).retrieve(
    "vector search and machine learning"
)
print("IN filter (category in [ai, database]):")
for i, node in enumerate(results, 1):
    print(f" {i}. {node.text[:70]}")
Hybrid Search
Automatically combines dense vectors with BM25 keyword matching for better results.
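The dense_rrf_weight parameter used in this section suggests that the dense and sparse rankings are merged with reciprocal rank fusion (RRF). The sketch below shows one common weighted form of RRF in plain Python. The exact formula, the constant k=60, and the function name weighted_rrf are assumptions for illustration and may differ from what Endee does internally.

```python
def weighted_rrf(dense_ranked, sparse_ranked, dense_weight=0.5, k=60):
    """Fuse two ranked ID lists with weighted reciprocal rank fusion.

    Each list contributes weight / (k + rank) per document, where rank
    starts at 1. dense_weight=1.0 ignores the sparse list, 0.0 ignores the
    dense list. This is an assumed formula, not Endee's documented one.
    """
    scores = {}
    weighted_lists = (
        (dense_weight, dense_ranked),
        (1.0 - dense_weight, sparse_ranked),
    )
    for weight, ranking in weighted_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rankings: IDs ordered best-first by each retriever.
dense = ["a", "b", "c"]
sparse = ["c", "b", "d"]
for w in (1.0, 0.5, 0.0):
    fused = weighted_rrf(dense, sparse, dense_weight=w)
    print(w, [doc_id for doc_id, _ in fused])
```

In this form, a document that ranks well in both lists can overtake one that tops only a single list, which is the usual motivation for fusing keyword and semantic results.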
query = "privacy vector search"
print(f"Query: '{query}'\n")
# Compare different dense/sparse weights
for weight, label in [(1.0, "dense only"), (0.5, "balanced"), (0.0, "sparse only")]:
    results = index.as_retriever(
        similarity_top_k=3,
        vector_store_kwargs={"dense_rrf_weight": weight},
    ).retrieve(query)
    print(f"dense_rrf_weight={weight} ({label}):")
    for i, node in enumerate(results, 1):
        print(f" {i}. score={node.get_score():.4f} {node.text[:75]}\n")
Query Tuning
Adjust search parameters for different performance and recall tradeoffs.
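One way to pick a value for a recall-related parameter such as ef is to compare the IDs returned at each setting against a trusted baseline, for example the result of an exhaustive search, and compute recall@k. The helper below is a hypothetical sketch of that measurement. The ID lists are made up purely to exercise the function and are not measurements from Endee.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    return len(exact_top & set(approx_ids[:k])) / len(exact_top)


# Made-up example: a baseline ranking and results at three ef settings.
exact = ["d1", "d2", "d3", "d4"]
runs = {
    64: ["d1", "d5", "d2", "d9"],
    128: ["d1", "d2", "d3", "d7"],
    256: ["d1", "d2", "d3", "d4"],
}
recalls = {ef: recall_at_k(ids, exact, k=4) for ef, ids in runs.items()}
for ef, value in recalls.items():
    print(f"ef={ef:4d} recall@4={value:.2f}")
```

Plotting recall against query latency for each setting makes the tradeoff explicit before a value is fixed.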
# Tune ef (higher = better recall, slower)
query_text = "Endee features"
print(f"Query: '{query_text}'\n")
for ef_val in [64, 128, 256]:
    results = index.as_retriever(
        similarity_top_k=2,
        vector_store_kwargs={"ef": ef_val},
    ).retrieve(query_text)
    if not results:
        # Nothing retrieved; skip instead of calling methods on None
        print(f" ef={ef_val:4d} no results\n")
        continue
    top = results[0]
    print(f" ef={ef_val:4d} score={top.get_score():.4f} {top.text[:70]}\n")
# Tune prefilter threshold with metadata filter
db_filters = MetadataFilters(
    filters=[MetadataFilter(key="category", value="database", operator=FilterOperator.EQ)]
)
results = index.as_retriever(
    similarity_top_k=2,
    filters=db_filters,
    vector_store_kwargs={
        "ef": 200,
        "prefilter_cardinality_threshold": 1000,
        "filter_boost_percentage": 20,
    },
).retrieve("vector database search")
print("Prefilter tuning (category == database):")
for i, node in enumerate(results, 1):
    print(f" {i}. {node.text[:80]}")
Vector Operations
Work directly with Endee to inspect, update, and delete vectors.
# Fetch a vector by ID
sample_nodes = index.as_retriever(similarity_top_k=1).retrieve("vector database")
sample_id = sample_nodes[0].node.id_
fetched = vector_store.fetch([sample_id])
vec = fetched[0]
print(f"Fetched: {sample_id}")
print(f" embedding dim: {len(vec.get('vector', []))}")
print(f" metadata: {vec.get('filter', {})}")
print(f" has sparse data: {'sparse_indices' in vec}\n")
# Update metadata without re-embedding
print(f"Before: {vec.get('filter', {})}")
vector_store.update_filters([
    {"id": sample_id, "filter": {"category": "database", "status": "reviewed", "priority": "high"}}
])
updated = vector_store.fetch([sample_id])
print(f"After : {updated[0].get('filter', {})}\n")
# Delete the vector
vector_store.delete_vector(sample_id)
print(f"Deleted: {sample_id}")
info = vector_store.describe()
print(f"Remaining vectors: {info.get('count')}")
Cleanup
Delete the index.
vector_store.clear()
print(f"Deleted: {INDEX_NAME}")
Key Methods
Each method handles a specific operation in the vector search workflow.
| Method | What it does |
|---|---|
| VectorStoreIndex.from_documents() | Embed and store documents |
| as_retriever() | Create a retriever for queries |
| retrieve(query) | Find similar documents |
| fetch(ids) | Get full vector data |
| update_filters() | Change metadata without re-embedding |
| delete_vector(id) | Delete a single vector |
| describe() | View index stats |
Quick Tips
Simple guidelines for common scenarios.
- Settings.embed_model - Set globally so all operations use it automatically
- sparse_model="endee_bm25" - Enables hybrid search automatically
- similarity_top_k=2 or 3 - Usually enough for RAG context
- MetadataFilters - Use EQ and IN operators to restrict results
- dense_rrf_weight - 1.0=dense only, 0.5=balanced, 0.0=sparse only
- ef parameter - Higher values improve recall but are slower
- prefilter_cardinality_threshold - Switch point between HNSW and exact search