Endee + CrewAI Integration

This walkthrough shows how to replace CrewAI’s default ChromaDB memory with Endee, giving agents persistent vector memory with metadata filtering and hybrid search. The embedding model used here runs locally, so no OpenAI key is needed.

Prerequisites: To run locally, clone and start Endee from the GitHub repo. Otherwise, use a token from app.endee.io. | CrewAI docs: docs.crewai.com

How it fits together

CrewAI stores agent memory through a RAGStorage interface that defaults to ChromaDB. EndeeVectorStore is a drop-in replacement for that layer:


ShortTermMemory / EntityMemory      ← CrewAI memory classes
        │
    RAGStorage                      ← abstract storage interface
        │
    ChromaDB (default)              ← replaced by EndeeVectorStore

With EndeeVectorStore in place, the stack becomes:

ShortTermMemory / EntityMemory
        │
    EndeeVectorStore
        │
    Endee (cloud or local)

Install


pip install crewai-endee sentence-transformers==3.0.0
pip install numpy==2.0.0

Pin sentence-transformers to 3.0.0 to avoid numpy compatibility issues. pip may report a version conflict between the endee and sentence-transformers packages; this warning is expected and does not affect the examples below.
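To confirm which versions were actually installed, a quick check along these lines can help. It is a minimal sketch that uses only the standard library; the helper name installed_versions is made up for illustration, and the package list simply mirrors the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the pip commands above.
    for name, ver in installed_versions(
        ["crewai-endee", "sentence-transformers", "numpy"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```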

Imports and Token


import os
import time
from getpass import getpass
 
from crewai_endee import EndeeVectorStore
 
API_TOKEN = ""  # your Endee token
BASE_URL = ""   # empty = default, or set custom URL

Endee Cloud: set API_TOKEN to your token from app.endee.io. Leave BASE_URL empty.

Local server: leave API_TOKEN empty. Set BASE_URL only if you’re on a non-default port (e.g. http://127.0.0.1:8081/api/v1). If not set, the SDK defaults to http://127.0.0.1:8080/api/v1.

When API_TOKEN is set, BASE_URL is ignored.
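The precedence between the two settings can be summarised in a few lines. The helper below is hypothetical and not part of the SDK; it only restates the rules described above and does not contact any server.

```python
DEFAULT_LOCAL_URL = "http://127.0.0.1:8080/api/v1"  # default stated above


def describe_connection(api_token: str, base_url: str) -> str:
    """Restate the documented precedence: token, then custom URL, then default."""
    if api_token:
        # A token selects cloud mode, so any BASE_URL value is ignored.
        return "Endee Cloud (BASE_URL ignored)"
    if base_url:
        return f"local server at {base_url}"
    return f"local server at {DEFAULT_LOCAL_URL}"


print(describe_connection("", ""))
print(describe_connection("", "http://127.0.0.1:8081/api/v1"))
print(describe_connection("my-token", "http://127.0.0.1:8081/api/v1"))
```

Printing this before constructing the store is a cheap way to catch a token that was left set by accident while testing against a local server.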

Connect to Endee

See Quick Start for server setup details.


embedder_config = {
    "provider": "sentence-transformer",
    "config": {
        "model_name": "all-MiniLM-L6-v2",
        "device": "cpu",
    },
}
 
store = EndeeVectorStore(
    type="demo_dense",
    embedder_config=embedder_config,
    api_token=API_TOKEN,
    space_type="cosine",
    precision="int8",
    base_url=BASE_URL,
)
 
store.reset()
store.ensure_index()

all-MiniLM-L6-v2 runs fully locally and produces 384-dimensional embeddings. Change device to "cuda" or "mps" to run it on a GPU.

store.reset() deletes the index, including everything stored in it, and clears the SDK’s internal LRU cache. If the cache were not cleared, a get_index call after deletion would return the stale cached object, which causes “Required files missing” errors on upsert. store.ensure_index() then creates a fresh index.
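The stale-handle problem is easier to see in a toy model. The class below is purely illustrative and makes no claim about how the Endee SDK is implemented; ToyClient and handle_is_stale are invented names. It shows only the general pattern: a cached handle outlives the object it points to unless the cache entry is dropped on delete.

```python
class ToyClient:
    """Toy stand-in for a client that caches index handles by name."""

    def __init__(self):
        self._server = {}  # index name -> live index object
        self._cache = {}   # index name -> cached handle

    def create_index(self, name):
        self._server[name] = {"name": name, "deleted": False}

    def get_index(self, name):
        # Serve from the cache when possible, like a memoised lookup.
        if name not in self._cache:
            self._cache[name] = self._server[name]
        return self._cache[name]

    def delete_index(self, name, clear_cache):
        self._server.pop(name)["deleted"] = True
        if clear_cache:
            self._cache.pop(name, None)


def handle_is_stale(clear_cache: bool) -> bool:
    """Delete and recreate an index, then report whether the handle is stale."""
    client = ToyClient()
    client.create_index("demo")
    client.get_index("demo")  # warms the cache
    client.delete_index("demo", clear_cache=clear_cache)
    client.create_index("demo")
    return client.get_index("demo")["deleted"]


print("cache kept:   ", handle_is_stale(clear_cache=False))
print("cache cleared:", handle_is_stale(clear_cache=True))
```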

Parameters:

Parameter	What it does
`type`	Index name. Must be unique per project.
`embedder_config`	Dense embedding model config.
`api_token`	Cloud token. Omit for local.
`space_type`	Distance metric: `"cosine"`, `"l2"`, or `"ip"`.
`precision`	Quantisation: `"float32"`, `"float16"`, `"int16"`, `"int8"`, `"binary"`.
`ef_con`	HNSW `ef_construction` — trades build time for index quality.
`sparse_model_name`	Pass `"endee/bm25"` to enable hybrid mode.
`base_url`	Override the server URL. Ignored in cloud mode.
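To get a feel for what the precision setting trades off, the sketch below estimates raw storage per vector. It assumes the bit widths implied by the precision names (for example 8 bits per dimension for "int8" and 1 bit for "binary"), and it counts vector components only, ignoring the HNSW graph, metadata, and any per-vector overhead. Actual on-disk sizes in Endee may differ.

```python
# Assumed bit widths, inferred from the precision names in the table above.
BITS_PER_DIMENSION = {
    "float32": 32,
    "float16": 16,
    "int16": 16,
    "int8": 8,
    "binary": 1,
}


def raw_vector_bytes(dimensions: int, precision: str) -> int:
    """Bytes for one vector's components, rounded up to whole bytes."""
    bits = dimensions * BITS_PER_DIMENSION[precision]
    return (bits + 7) // 8


# 384 is the embedding size of all-MiniLM-L6-v2.
for precision in BITS_PER_DIMENSION:
    print(f"{precision:>8}: {raw_vector_bytes(384, precision)} bytes per vector")
```

Under these assumptions a 384-dimensional vector takes 1536 bytes at float32 and 384 bytes at int8, a fourfold reduction.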

Insert Documents


documents = [
    (
        "Python is a high-level, interpreted language designed by Guido van Rossum in 1991. "
        "Typing: dynamic, strong. Uses: AI/ML, web, scripting.",
        {"lang": "Python", "year": 1991, "typing": "dynamic"},
    ),
    (
        "Java follows 'write once, run anywhere' on the JVM, designed by James Gosling in 1995. "
        "Typing: static, strong. Uses: enterprise, Android, backend.",
        {"lang": "Java", "year": 1995, "typing": "static"},
    ),
    # ... more documents
]
 
for text, meta in documents:
    store.save(value=text, metadata=meta)

save() embeds the text, assigns a UUID, and upserts into the index. Scalar metadata fields (str, int, float) are indexed as filterable fields. The full text is always stored in meta["value"] and is searchable via vector similarity.
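Because only scalar values become filterable, it can be useful to check a metadata dict before saving it. The helper below is a hypothetical pre-flight check, not an SDK function. What the store does with non-scalar values is not described above, so the sketch only reports them.

```python
def split_metadata(meta: dict) -> tuple[dict, dict]:
    """Return (scalar_fields, other_fields) for a metadata dict."""
    scalar, other = {}, {}
    for key, value in meta.items():
        # bool is a subclass of int in Python. Whether the store treats it as
        # a filterable scalar is not stated above, so it is kept separate here.
        if isinstance(value, (str, int, float)) and not isinstance(value, bool):
            scalar[key] = value
        else:
            other[key] = value
    return scalar, other


scalar, other = split_metadata(
    {"lang": "Python", "year": 1991, "tags": ["ai", "web"]}
)
print("filterable:", scalar)
print("needs a closer look:", other)
```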

Dense Search


time.sleep(2)  # Endee indexes asynchronously
 
queries = [
    "Who created the Go programming language?",
    "Which languages use dynamic typing?",
    "Languages suitable for cloud-native microservices",
]
 
for query in queries:
    results = store.search(query, limit=2)
    print(f"Query: '{query}'")
    for r in results:
        print(f"  [{r['score']:.3f}] {r['content'][:80]}")
    print()

Each result has id, content (original text), metadata, and score (cosine similarity).

For the full list of search parameters (ef_search, include_vectors, prefilter_cardinality_threshold, etc.), see Search — Query Parameters.
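The fixed time.sleep(2) works for a demo, but a small polling loop is a possible alternative when indexing time varies. The helper below is a sketch under the assumption that an empty result list means the index is not ready yet; wait_for_results is an invented name, and it accepts any callable with the search(query, **kwargs) shape used above.

```python
import time


def wait_for_results(search, query, timeout=10.0, interval=0.5, **kwargs):
    """Call search(query, **kwargs) until it returns results or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        results = search(query, **kwargs)
        if results or time.monotonic() >= deadline:
            # Either something came back or the time budget is spent.
            return results
        time.sleep(interval)
```

With the store from earlier, a call would look like wait_for_results(store.search, "Who created the Go programming language?", limit=2). A non-empty result does not guarantee that every recently saved document has been indexed, so this suits smoke tests better than correctness checks.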

Hybrid Mode (Dense + BM25)

Adding sparse_model_name="endee/bm25" enables hybrid search, which combines dense semantic similarity with BM25 keyword matching. This helps recall memories containing specific terms, such as function names or error codes, that pure semantic search might miss.


hybrid_store = EndeeVectorStore(
    type="demo_hybrid",
    embedder_config=embedder_config,
    api_token=API_TOKEN,
    space_type="cosine",
    sparse_model_name="endee/bm25",
    base_url=BASE_URL,
)
 
hybrid_store.reset()
hybrid_store.ensure_index()
 
for text, meta in documents:
    hybrid_store.save(value=text, metadata=meta)
 
time.sleep(2)

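The two ranked lists (dense and BM25) are merged using Reciprocal Rank Fusion (RRF): each list contributes 1 / (k + rank) to a document’s fused score, where rank is the document’s position in that list and k defaults to 60.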
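The fusion formula can be sketched in a few lines of plain Python. This is a generic RRF implementation, not Endee’s internal code. It assumes ranks start at 1 (the text above does not say whether Endee counts from 0 or 1), and the document ids are made up.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document ids; rank 1 is the best position."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["go", "rust", "python"]  # hypothetical dense ranking
bm25 = ["go", "java", "rust"]     # hypothetical BM25 ranking
for doc_id, score in reciprocal_rank_fusion([dense, bm25]):
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear in both lists accumulate two contributions, which is why "rust" outranks "java" here even though "java" sits higher in the BM25 list.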

Side-by-side comparison:


query = "Go cloud-native microservices"
 
print("Dense only:")
for r in store.search(query, limit=3):
    print(f"  [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")
 
print("\nHybrid (balanced):")
for r in hybrid_store.search(query, limit=3):
    print(f"  [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")
 
print("\nHybrid (favour dense):")
for r in hybrid_store.search(query, limit=3, dense_rrf_weight=0.8, rrf_rank_constant=60):
    print(f"  [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")

dense_rrf_weight ranges from 0.0 (BM25 only) to 1.0 (dense only) and defaults to 0.5. rrf_rank_constant sets k in the RRF formula.
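One way to picture the effect of dense_rrf_weight is a linearly weighted variant of RRF. How Endee applies the weight internally is not described here, so the weighting below is an assumption chosen to match the documented endpoints (0.0 is BM25 only, 1.0 is dense only). The function name and document ids are invented.

```python
def weighted_rrf(dense_ids, sparse_ids, dense_weight=0.5, k=60):
    """Weighted RRF over two ranked id lists (rank 1 = best)."""
    if not 0.0 <= dense_weight <= 1.0:
        raise ValueError("dense_weight must be between 0.0 and 1.0")
    scores = {}
    weighted_lists = ((dense_weight, dense_ids), (1.0 - dense_weight, sparse_ids))
    for weight, ranking in weighted_lists:
        if weight == 0.0:
            continue  # a zero-weight list contributes nothing
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["python", "go", "rust"]  # hypothetical dense ranking
bm25 = ["go", "java", "rust"]     # hypothetical BM25 ranking
for w in (0.0, 0.5, 1.0):
    print(w, weighted_rrf(dense, bm25, dense_weight=w))
```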

Note: a hybrid index can’t be converted to dense-only after creation. Create a separate index for each mode.

Metadata Filters


# Only static-typed languages
results = store.search(
    query="high performance language",
    limit=3,
    filter=[{"typing": {"$eq": "static"}}],
)
 
# Dynamic-typed languages above a minimum score
results = store.search(
    query="web development",
    limit=3,
    filter=[{"typing": {"$eq": "dynamic"}}],
    score_threshold=0.2,
)

The filter syntax is [{"field": {"$op": value}}]. Supported operators: $eq, $in, $range. Filters are applied during HNSW traversal, not post-hoc.
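To reason about what a filter will match before sending it, a local evaluator can be handy. The sketch below is not Endee’s filter engine and rests on three loudly labelled assumptions: multiple clauses are combined with AND, $in takes a list of allowed values, and $range takes an inclusive [low, high] pair. None of these shapes is confirmed by the text above, so check the Endee filter documentation before relying on them.

```python
def matches(meta: dict, filters: list) -> bool:
    """Return True if meta satisfies every clause in filters (assumed AND)."""
    for clause in filters:
        for field, condition in clause.items():
            if field not in meta:
                return False
            value = meta[field]
            for op, operand in condition.items():
                if op == "$eq":
                    ok = value == operand
                elif op == "$in":
                    ok = value in operand  # assumed shape: list of values
                elif op == "$range":
                    low, high = operand    # assumed shape: [low, high], inclusive
                    ok = low <= value <= high
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True


python_meta = {"lang": "Python", "year": 1991, "typing": "dynamic"}
print(matches(python_meta, [{"typing": {"$eq": "dynamic"}}]))
print(matches(python_meta, [{"lang": {"$in": ["Java", "Go"]}}]))
print(matches(python_meta, [{"year": {"$range": [1990, 1995]}}]))
```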

Use filter to restrict by metadata values; use score_threshold to cut off low-similarity results. They can be combined.

To get the raw vector alongside results:


results = store.search("interpreted language", limit=1, include_vectors=True)
vec = results[0]["vector"]

Index Operations


# Inspect the index
print("Index info:", store.describe())
 
# Save a document, fetch it by ID, update its metadata, delete it
store.save(
    value="Rust is a systems language focused on memory safety, designed by Mozilla.",
    metadata={"lang": "Rust", "typing": "static"},
)
time.sleep(2)
 
rust_results = store.search("Rust memory safety", limit=1)
if rust_results:
    vec_id = rust_results[0]["id"]
 
    vec_data = store.get_vector(vec_id)
    print(f"Fetched: {vec_data.get('meta', {})}")
 
    store.update_filters([{"id": vec_id, "filter": {"reviewed": "true"}}])
    store.delete_vector(vec_id)
 
# Bulk delete by filter
store.save(value="Temp doc 1", metadata={"category": "throwaway"})
store.save(value="Temp doc 2", metadata={"category": "throwaway"})
time.sleep(2)
store.delete(filter=[{"category": {"$eq": "throwaway"}}])

update_filters() changes filterable metadata without re-embedding. delete(filter=...) removes all vectors matching a filter. reset() deletes the entire index.

Cleanup


for s, name in [
    (store,        "demo_dense"),
    (hybrid_store, "demo_hybrid"),
]:
    try:
        s.reset()
        print(f"Deleted: {name}")
    except Exception as e:
        print(f"Could not delete {name}: {e}")

Summary

What	How
Connect	`EndeeVectorStore(type=..., embedder_config=..., api_token=API_TOKEN)`
Local server	Add `base_url=BASE_URL` (default → `http://127.0.0.1:8080/api/v1`)
Insert	`store.save(value, metadata)`
Search	`store.search(query, limit)`
Hybrid	Add `sparse_model_name="endee/bm25"`
Filters	`store.search(query, filter=[{"key": {"$eq": "val"}}])`
Start fresh	`store.reset()` then `store.ensure_index()`