Endee + CrewAI Integration
This walkthrough shows how to replace CrewAI’s default ChromaDB memory with Endee — giving agents persistent vector memory with metadata filtering and hybrid search, no OpenAI key needed.
Prerequisites: To run locally, clone and start Endee from the GitHub repo. Otherwise, use a token from app.endee.io. CrewAI docs: docs.crewai.com
How it fits together
CrewAI stores agent memory through a RAGStorage interface that defaults to ChromaDB. EndeeVectorStore is a drop-in replacement for that layer:
ShortTermMemory / EntityMemory ← CrewAI memory classes
│
RAGStorage ← abstract storage interface
│
ChromaDB (default) ← replaced by EndeeVectorStore

ShortTermMemory / EntityMemory
│
EndeeVectorStore
│
Endee (cloud or local)
Install
pip install crewai-endee sentence-transformers==3.0.0
pip install numpy==2.0.0
Pin sentence-transformers to 3.0.0 to avoid numpy compatibility issues. pip will report a version conflict between the endee and sentence-transformers packages; this is expected and harmless.
Imports and Token
import os
import time
from getpass import getpass
from crewai_endee import EndeeVectorStore
API_TOKEN = "" # your Endee token
BASE_URL = ""  # empty = default, or set a custom URL
Endee Cloud: Set API_TOKEN to your token from app.endee.io. Leave BASE_URL empty.
Local server: Leave API_TOKEN empty. Set BASE_URL only if you're on a non-default port (e.g. http://127.0.0.1:8081/api/v1). If not set, the SDK defaults to http://127.0.0.1:8080/api/v1.
When API_TOKEN is set, BASE_URL is ignored.
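The token and URL rules above can be sketched as a small decision function. This is only an illustration of the precedence described here: `resolve_endpoint` is a hypothetical helper, not part of crewai-endee, and the cloud URL is left as `None` because this guide does not state it.

```python
from typing import Optional, Tuple

# Default local endpoint quoted in this guide.
DEFAULT_LOCAL_URL = "http://127.0.0.1:8080/api/v1"


def resolve_endpoint(api_token: str, base_url: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the rules above; not an SDK function."""
    if api_token:
        # Cloud mode: the token wins and any BASE_URL is ignored.
        return ("cloud", None)
    # Local mode: use the override if given, otherwise the default port.
    return ("local", base_url or DEFAULT_LOCAL_URL)


print(resolve_endpoint("", ""))
print(resolve_endpoint("", "http://127.0.0.1:8081/api/v1"))
print(resolve_endpoint("my-token", "http://127.0.0.1:8081/api/v1"))
```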
Connect to Endee
See Quick Start for server setup details.
embedder_config = {
    "provider": "sentence-transformer",
    "config": {
        "model_name": "all-MiniLM-L6-v2",
        "device": "cpu",
    },
}
store = EndeeVectorStore(
    type="demo_dense",
    embedder_config=embedder_config,
    api_token=API_TOKEN,
    space_type="cosine",
    precision="int8",
    base_url=BASE_URL,
)
store.reset()
store.ensure_index()
all-MiniLM-L6-v2 runs fully locally and produces 384-dimensional embeddings. Change device to "cuda" or "mps" to run on a GPU.
store.reset() deletes the index and clears the SDK’s internal LRU cache. Without clearing that cache, a get_index call after deletion returns the stale cached object — which causes “Required files missing” errors on upsert. store.ensure_index() then creates a fresh index.
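The stale-handle problem can be reproduced with a toy model. The sketch below uses Python's `functools.lru_cache` and a dictionary standing in for the server; none of these names come from the Endee SDK, and the SDK's actual cache may work differently.

```python
from functools import lru_cache

# Toy stand-in for a server: index name -> generation counter.
_server = {"demo_dense": 1}


@lru_cache(maxsize=8)
def get_index(name: str) -> dict:
    """Return a client-side handle, cached the way an SDK-level LRU cache might."""
    return {"name": name, "generation": _server[name]}


def handle_is_stale(handle: dict) -> bool:
    """A handle is stale if the server no longer holds that generation."""
    return _server.get(handle["name"]) != handle["generation"]


get_index("demo_dense")              # populate the cache

# Delete and recreate the index on the "server".
del _server["demo_dense"]
_server["demo_dense"] = 2

stale = get_index("demo_dense")      # still served from the cache
print(handle_is_stale(stale))        # True

get_index.cache_clear()              # the step reset() is described as performing
fresh = get_index("demo_dense")
print(handle_is_stale(fresh))        # False
```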
Parameters:
| Parameter | What it does |
|---|---|
| type | Index name. Must be unique per project. |
| embedder_config | Dense embedding model config. |
| api_token | Cloud token. Omit for local. |
| space_type | Distance metric: "cosine", "l2", or "ip". |
| precision | Quantisation: "float32", "float16", "int16", "int8", "binary". |
| ef_con | HNSW ef_construction: trades build time for index quality. |
| sparse_model_name | Pass "endee/bm25" to enable hybrid mode. |
| base_url | Override the server URL. Ignored in cloud mode. |
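For reference, the three space_type options correspond to standard vector comparisons. The sketch below gives the textbook definitions in plain Python; how Endee converts them into the score it returns is not covered in this guide, so treat it as background rather than a description of the server.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """'ip': sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """'l2': Euclidean distance (smaller means closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """'cosine': inner product of the two vectors after length normalisation."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / norm


a, b = [1.0, 0.0], [1.0, 1.0]
print(inner_product(a, b))                 # 1.0
print(l2_distance(a, b))                   # 1.0
print(round(cosine_similarity(a, b), 4))   # 0.7071
```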
Insert Documents
documents = [
    (
        "Python is a high-level, interpreted language designed by Guido van Rossum in 1991. "
        "Typing: dynamic, strong. Uses: AI/ML, web, scripting.",
        {"lang": "Python", "year": 1991, "typing": "dynamic"},
    ),
    (
        "Java follows 'write once, run anywhere' on the JVM, designed by James Gosling in 1995. "
        "Typing: static, strong. Uses: enterprise, Android, backend.",
        {"lang": "Java", "year": 1995, "typing": "static"},
    ),
    # ... more documents
]
for text, meta in documents:
    store.save(value=text, metadata=meta)
save() embeds the text, assigns a UUID, and upserts into the index. Scalar metadata fields (str, int, float) are indexed as filterable fields. The full text is always stored in meta["value"] and is searchable via vector similarity.
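Because only scalar fields become filterable, it can help to check metadata before saving. The helper below is a hypothetical pre-flight check, not part of crewai-endee. It separates str/int/float values from everything else, and its treatment of bool as non-scalar is an assumption, since this guide does not say how booleans are handled.

```python
from typing import Any, Dict, Tuple


def split_metadata(meta: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split metadata into (filterable scalars, everything else)."""
    filterable: Dict[str, Any] = {}
    other: Dict[str, Any] = {}
    for key, value in meta.items():
        # bool is a subclass of int in Python; excluding it here is an assumption.
        if isinstance(value, (str, int, float)) and not isinstance(value, bool):
            filterable[key] = value
        else:
            other[key] = value
    return filterable, other


scalars, rest = split_metadata(
    {"lang": "Python", "year": 1991, "tags": ["ai", "web"]}
)
print(scalars)  # {'lang': 'Python', 'year': 1991}
print(rest)     # {'tags': ['ai', 'web']}
```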
Dense Search
time.sleep(2) # Endee indexes asynchronously
queries = [
"Who created the Go programming language?",
"Which languages use dynamic typing?",
"Languages suitable for cloud-native microservices",
]
for query in queries:
    results = store.search(query, limit=2)
    print(f"Query: '{query}'")
    for r in results:
        print(f" [{r['score']:.3f}] {r['content'][:80]}")
    print()
Each result has id, content (original text), metadata, and score (cosine similarity).
For the full list of search parameters (ef_search, include_vectors, prefilter_cardinality_threshold, etc.), see Search — Query Parameters.
Hybrid Mode (Dense + BM25)
Adding sparse_model_name="endee/bm25" enables hybrid search — dense semantic similarity combined with BM25 keyword matching. This helps recall memories with specific terms like function names or error codes that pure semantic search might miss.
hybrid_store = EndeeVectorStore(
    type="demo_hybrid",
    embedder_config=embedder_config,
    api_token=API_TOKEN,
    space_type="cosine",
    sparse_model_name="endee/bm25",
    base_url=BASE_URL,
)
hybrid_store.reset()
hybrid_store.ensure_index()
for text, meta in documents:
    hybrid_store.save(value=text, metadata=meta)
time.sleep(2)
The two ranked lists (dense and BM25) are merged using Reciprocal Rank Fusion: each list contributes 1 / (k + rank) to a document's fused score, where rank is the document's position in that list and k defaults to 60.
Side-by-side comparison:
query = "Go cloud-native microservices"
print("Dense only:")
for r in store.search(query, limit=3):
    print(f" [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")
print("\nHybrid (balanced):")
for r in hybrid_store.search(query, limit=3):
    print(f" [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")
print("\nHybrid (favour dense):")
for r in hybrid_store.search(query, limit=3, dense_rrf_weight=0.8, rrf_rank_constant=60):
    print(f" [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")
dense_rrf_weight ranges from 0.0 (BM25 only) to 1.0 (dense only); the default is 0.5.
Note: a hybrid index can’t be converted to dense-only after creation. Create a separate index for each mode.
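To make the fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion over two toy rankings. The way the weight is applied (the dense list scaled by w, the BM25 list by 1 - w) is an assumption chosen to match the 0.0 and 1.0 endpoints described above; Endee's server-side formula may differ. The `k` argument plays the role that rrf_rank_constant presumably has in the search call.

```python
from typing import Dict, List, Tuple


def rrf_fuse(
    dense: List[str],
    sparse: List[str],
    dense_weight: float = 0.5,
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse two ranked ID lists; weighting scheme is an assumption (see lead-in)."""
    scores: Dict[str, float] = {}
    for rank, doc_id in enumerate(dense, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + dense_weight / (k + rank)
    for rank, doc_id in enumerate(sparse, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - dense_weight) / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up rankings for illustration only.
dense_ranking = ["go", "rust", "java"]
sparse_ranking = ["go", "python", "rust"]

balanced = [doc for doc, _ in rrf_fuse(dense_ranking, sparse_ranking)]
favour_dense = [doc for doc, _ in rrf_fuse(dense_ranking, sparse_ranking, dense_weight=0.8)]

print(balanced)      # ['go', 'rust', 'python', 'java']
print(favour_dense)  # ['go', 'rust', 'java', 'python']
```

Raising the dense weight moves "java" (dense list only) above "python" (BM25 list only), which is the effect dense_rrf_weight=0.8 is meant to have.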
Metadata Filters
# Only static-typed languages
results = store.search(
    query="high performance language",
    limit=3,
    filter=[{"typing": {"$eq": "static"}}],
)
# Dynamic-typed languages above a minimum score
results = store.search(
    query="web development",
    limit=3,
    filter=[{"typing": {"$eq": "dynamic"}}],
    score_threshold=0.2,
)
The filter syntax is [{"field": {"$op": value}}]. Supported operators: $eq, $in, $range. Filters are applied during HNSW traversal, not post-hoc.
Use filter to restrict by metadata values; use score_threshold to cut off low-similarity results. They can be combined.
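The semantics of the filter syntax can be illustrated with a small in-memory matcher. This is a sketch of what the operators mean, not how Endee evaluates them (the server filters during traversal). It assumes clauses are AND-ed and that $range takes an inclusive [low, high] pair; the scores are made up.

```python
from typing import Any, Dict, List

Filter = List[Dict[str, Dict[str, Any]]]


def matches(meta: Dict[str, Any], filters: Filter) -> bool:
    """True if meta satisfies every clause (AND semantics are an assumption)."""
    for clause in filters:
        for field, condition in clause.items():
            value = meta.get(field)
            for op, operand in condition.items():
                if op == "$eq":
                    ok = value == operand
                elif op == "$in":
                    ok = value in operand
                elif op == "$range":
                    low, high = operand  # assumed inclusive [low, high]
                    ok = value is not None and low <= value <= high
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True


def select(candidates: List[dict], filters: Filter,
           score_threshold: float = 0.0, limit: int = 3) -> List[dict]:
    """Keep candidates that pass the filter and the score cutoff."""
    kept = [c for c in candidates
            if matches(c["metadata"], filters) and c["score"] >= score_threshold]
    return kept[:limit]


# Candidates sorted by a made-up similarity score.
candidates = [
    {"score": 0.61, "metadata": {"lang": "Python", "year": 1991, "typing": "dynamic"}},
    {"score": 0.40, "metadata": {"lang": "Java", "year": 1995, "typing": "static"}},
    {"score": 0.15, "metadata": {"lang": "Rust", "typing": "static"}},
]

hits = select(candidates, [{"typing": {"$eq": "static"}}], score_threshold=0.2)
print([h["metadata"]["lang"] for h in hits])  # ['Java']
```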
To get the raw vector alongside results:
results = store.search("interpreted language", limit=1, include_vectors=True)
vec = results[0]["vector"]
Index Operations
# Inspect the index
print("Index info:", store.describe())
# Save a document, fetch it by ID, update its metadata, delete it
store.save(
    value="Rust is a systems language focused on memory safety, designed by Mozilla.",
    metadata={"lang": "Rust", "typing": "static"},
)
time.sleep(2)
rust_results = store.search("Rust memory safety", limit=1)
if rust_results:
    vec_id = rust_results[0]["id"]
    vec_data = store.get_vector(vec_id)
    print(f"Fetched: {vec_data.get('meta', {})}")
    store.update_filters([{"id": vec_id, "filter": {"reviewed": "true"}}])
    store.delete_vector(vec_id)
# Bulk delete by filter
store.save(value="Temp doc 1", metadata={"category": "throwaway"})
store.save(value="Temp doc 2", metadata={"category": "throwaway"})
time.sleep(2)
store.delete(filter=[{"category": {"$eq": "throwaway"}}])
update_filters() changes filterable metadata without re-embedding. delete(filter=...) removes all vectors matching a filter. reset() deletes the entire index.
Cleanup
for s, name in [
    (store, "demo_dense"),
    (hybrid_store, "demo_hybrid"),
]:
    try:
        s.reset()
        print(f"Deleted: {name}")
    except Exception as e:
        print(f"Could not delete {name}: {e}")
Summary
| What | How |
|---|---|
| Connect | EndeeVectorStore(type=..., embedder_config=..., api_token=API_TOKEN) |
| Local server | Add base_url=BASE_URL (default → http://127.0.0.1:8080/api/v1) |
| Insert | store.save(value, metadata) |
| Search | store.search(query, limit) |
| Hybrid | Add sparse_model_name="endee/bm25" |
| Filters | store.search(query, filter=[{"key": {"$eq": "val"}}]) |
| Start fresh | store.reset() then store.ensure_index() |