
Endee + CrewAI Integration


This walkthrough shows how to replace CrewAI’s default ChromaDB memory with Endee — giving agents persistent vector memory with metadata filtering and hybrid search, no OpenAI key needed.

Prerequisites: To run locally, clone and start Endee from the GitHub repo. Otherwise, use a token from app.endee.io.

CrewAI docs: docs.crewai.com


How it fits together

CrewAI stores agent memory through a RAGStorage interface that defaults to ChromaDB. EndeeVectorStore is a drop-in replacement for that layer:

ShortTermMemory / EntityMemory   ← CrewAI memory classes
RAGStorage                       ← abstract storage interface
ChromaDB (default)               ← replaced by EndeeVectorStore

ShortTermMemory / EntityMemory
EndeeVectorStore
Endee (cloud or local)
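
The layering above can be pictured with a toy model. This is only a sketch of the drop-in idea: `MemoryStorage`, `InMemoryStorage`, and `AgentMemory` are hypothetical names invented for illustration, not CrewAI or Endee classes, and the real RAGStorage interface has its own signature.

```python
from abc import ABC, abstractmethod


class MemoryStorage(ABC):
    """Hypothetical stand-in for an abstract storage interface."""

    @abstractmethod
    def save(self, value: str, metadata: dict) -> None: ...

    @abstractmethod
    def search(self, query: str, limit: int = 3) -> list[dict]: ...


class InMemoryStorage(MemoryStorage):
    """Toy backend that ranks by keyword overlap instead of vector similarity."""

    def __init__(self) -> None:
        self._items: list[dict] = []

    def save(self, value: str, metadata: dict) -> None:
        self._items.append({"content": value, "metadata": metadata})

    def search(self, query: str, limit: int = 3) -> list[dict]:
        words = set(query.lower().split())
        scored = [
            (len(words & set(item["content"].lower().split())), item)
            for item in self._items
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:limit]]


class AgentMemory:
    """Hypothetical memory class that only talks to the interface."""

    def __init__(self, storage: MemoryStorage) -> None:
        self.storage = storage

    def remember(self, text: str) -> None:
        self.storage.save(text, {})

    def recall(self, query: str) -> list[str]:
        return [hit["content"] for hit in self.storage.search(query)]


memory = AgentMemory(storage=InMemoryStorage())
memory.remember("The user prefers Python for scripting")
memory.remember("The deployment target is Kubernetes")
print(memory.recall("which language does the user prefer"))
```

Because `AgentMemory` depends only on the interface, swapping the backend means passing a different storage object, which is the role EndeeVectorStore plays in place of ChromaDB.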

Install

pip install crewai-endee sentence-transformers==3.0.0
pip install numpy==2.0.0

Pin sentence-transformers to 3.0.0 to avoid numpy compatibility issues. You’ll see a version conflict warning between the endee and sentence-transformers packages — it’s expected and doesn’t affect anything.


Imports and Token

import os
import time
from getpass import getpass

from crewai_endee import EndeeVectorStore

API_TOKEN = ""  # your Endee token
BASE_URL = ""   # empty = default, or set custom URL

Endee Cloud: Set API_TOKEN to your token from app.endee.io. Leave BASE_URL empty.

Local server: Leave API_TOKEN empty. Set BASE_URL only if you’re on a non-default port (e.g. http://127.0.0.1:8081/api/v1). If not set, the SDK defaults to http://127.0.0.1:8080/api/v1.

When API_TOKEN is set, BASE_URL is ignored.
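
The precedence between the two settings can be summarised in a few lines. The helper below is a hypothetical illustration of the rules just described, not part of the SDK; `resolve_target` and `LOCAL_DEFAULT_URL` are names made up for this sketch.

```python
from typing import Optional

LOCAL_DEFAULT_URL = "http://127.0.0.1:8080/api/v1"  # local default stated in the text


def resolve_target(api_token: str, base_url: str) -> tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the documented precedence.

    Returns (mode, url). In cloud mode the URL is None here because the
    SDK chooses the cloud endpoint itself and ignores base_url.
    """
    if api_token:
        return ("cloud", None)
    return ("local", base_url or LOCAL_DEFAULT_URL)


print(resolve_target("my-token", "http://127.0.0.1:8081/api/v1"))
print(resolve_target("", ""))
print(resolve_target("", "http://127.0.0.1:8081/api/v1"))
```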


Connect to Endee

See Quick Start for server setup details.

embedder_config = {
    "provider": "sentence-transformer",
    "config": {
        "model_name": "all-MiniLM-L6-v2",
        "device": "cpu",
    },
}

store = EndeeVectorStore(
    type="demo_dense",
    embedder_config=embedder_config,
    api_token=API_TOKEN,
    space_type="cosine",
    precision="int8",
    base_url=BASE_URL,
)

store.reset()
store.ensure_index()

all-MiniLM-L6-v2 runs fully locally (384 dimensions). Change device to "cuda" or "mps" for GPU.
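
If you would rather not hard-code the device, one option is to probe PyTorch at startup. This is a minimal sketch, assuming PyTorch is the backend in use; `pick_device` is a hypothetical helper, and it falls back to "cpu" when PyTorch is not importable.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch available, nothing to probe
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
embedder_config = {
    "provider": "sentence-transformer",
    "config": {"model_name": "all-MiniLM-L6-v2", "device": device},
}
print(embedder_config["config"]["device"])
```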

store.reset() deletes the index and clears the SDK’s internal LRU cache. Without clearing that cache, a get_index call after deletion returns the stale cached object — which causes “Required files missing” errors on upsert. store.ensure_index() then creates a fresh index.

Parameters:

Parameter | What it does
type | Index name. Must be unique per project.
embedder_config | Dense embedding model config.
api_token | Cloud token. Omit for local.
space_type | Distance metric: "cosine", "l2", or "ip".
precision | Quantisation: "float32", "float16", "int16", "int8", "binary".
ef_con | HNSW ef_construction: trades build time for index quality.
sparse_model_name | Pass "endee/bm25" to enable hybrid mode.
base_url | Override the server URL. Ignored in cloud mode.
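
To get a feel for what the precision setting buys, the arithmetic below estimates the raw size of one 384-dimensional vector at each level. It assumes the bit widths implied by the precision names and counts vector components only; Endee's actual storage layout, the HNSW graph, and metadata are not modelled.

```python
DIMENSIONS = 384  # all-MiniLM-L6-v2 output size

# Assumed bits per component, taken from the precision names
BITS_PER_COMPONENT = {
    "float32": 32,
    "float16": 16,
    "int16": 16,
    "int8": 8,
    "binary": 1,
}


def raw_vector_bytes(precision: str, dims: int = DIMENSIONS) -> int:
    """Bytes for one vector's components, rounded up to whole bytes."""
    bits = BITS_PER_COMPONENT[precision] * dims
    return (bits + 7) // 8


for name in BITS_PER_COMPONENT:
    print(f"{name:8s} {raw_vector_bytes(name):5d} bytes per vector")
```

Under these assumptions, int8 takes a quarter of the space of float32 (384 bytes versus 1536), usually at some cost in ranking accuracy.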

Insert Documents

documents = [
    (
        "Python is a high-level, interpreted language designed by Guido van Rossum in 1991. "
        "Typing: dynamic, strong. Uses: AI/ML, web, scripting.",
        {"lang": "Python", "year": 1991, "typing": "dynamic"},
    ),
    (
        "Java follows 'write once, run anywhere' on the JVM, designed by James Gosling in 1995. "
        "Typing: static, strong. Uses: enterprise, Android, backend.",
        {"lang": "Java", "year": 1995, "typing": "static"},
    ),
    # ... more documents
]

for text, meta in documents:
    store.save(value=text, metadata=meta)

save() embeds the text, assigns a UUID, and upserts into the index. Scalar metadata fields (str, int, float) are indexed as filterable fields. The full text is always stored in meta["value"] and is searchable via vector similarity.
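
The record that results from a save can be approximated as follows. This is a hypothetical sketch of the steps described above, with the embedding step left out; `prepare_record` is not an SDK function, and excluding booleans from the filterable fields is an assumption made here because `bool` is a subclass of `int` in Python.

```python
import uuid


def prepare_record(value: str, metadata: dict) -> dict:
    """Hypothetical helper: build the pieces save() is described as storing."""
    filterable = {
        key: val
        for key, val in metadata.items()
        # keep scalar str/int/float values; bool is excluded as an assumption
        if isinstance(val, (str, int, float)) and not isinstance(val, bool)
    }
    meta = dict(metadata)
    meta["value"] = value  # the full text is kept under meta["value"]
    return {"id": str(uuid.uuid4()), "filter": filterable, "meta": meta}


record = prepare_record(
    "Python is a high-level, interpreted language.",
    {"lang": "Python", "year": 1991, "tags": ["ai", "web"]},
)
print(record["filter"])  # {'lang': 'Python', 'year': 1991}
print(record["meta"]["value"])
```

The list-valued `tags` field stays in the stored metadata but does not become a filterable field in this sketch.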


Search

time.sleep(2)  # Endee indexes asynchronously

queries = [
    "Who created the Go programming language?",
    "Which languages use dynamic typing?",
    "Languages suitable for cloud-native microservices",
]

for query in queries:
    results = store.search(query, limit=2)
    print(f"Query: '{query}'")
    for r in results:
        print(f"  [{r['score']:.3f}] {r['content'][:80]}")
    print()

Each result has id, content (original text), metadata, and score (cosine similarity).
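
For reference, cosine similarity between two vectors is their dot product divided by the product of their lengths. The plain-Python version below is for illustration only; the server computes scores itself, and with int8 quantisation the reported values may differ slightly from an exact calculation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 3))  # 0.0
print(round(cosine_similarity([1.0, 1.0], [1.0, 0.0]), 3))  # 0.707
```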

For the full list of search parameters (ef_search, include_vectors, prefilter_cardinality_threshold, etc.), see Search — Query Parameters.


Hybrid Mode (Dense + BM25)

Adding sparse_model_name="endee/bm25" enables hybrid search — dense semantic similarity combined with BM25 keyword matching. This helps recall memories with specific terms like function names or error codes that pure semantic search might miss.

hybrid_store = EndeeVectorStore(
    type="demo_hybrid",
    embedder_config=embedder_config,
    api_token=API_TOKEN,
    space_type="cosine",
    sparse_model_name="endee/bm25",
    base_url=BASE_URL,
)

hybrid_store.reset()
hybrid_store.ensure_index()

for text, meta in documents:
    hybrid_store.save(value=text, metadata=meta)

time.sleep(2)

The two ranked lists (dense + BM25) are merged using Reciprocal Rank Fusion: 1 / (k + rank) where k defaults to 60.
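
In Reciprocal Rank Fusion, each document's fused score is the sum of 1 / (k + rank) over every list it appears in, with ranks starting at 1. The sketch below implements that textbook form on made-up document IDs; it is not Endee's server code.

```python
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["go", "rust", "java"]    # illustrative dense ranking
bm25 = ["go", "python", "rust"]   # illustrative BM25 ranking

fused = rrf_merge([dense, bm25])
for doc_id, score in fused:
    print(f"{doc_id:7s} {score:.5f}")
```

Documents that rank well in both lists ("go", "rust") end up ahead of those that appear in only one, which is the behaviour hybrid search relies on.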

Side-by-side comparison:

query = "Go cloud-native microservices"

print("Dense only:")
for r in store.search(query, limit=3):
    print(f"  [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")

print("\nHybrid (balanced):")
for r in hybrid_store.search(query, limit=3):
    print(f"  [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")

print("\nHybrid (favour dense):")
for r in hybrid_store.search(query, limit=3, dense_rrf_weight=0.8, rrf_rank_constant=60):
    print(f"  [{r['score']:.3f}] {r['metadata'].get('lang')}: {r['content'][:60]}")

dense_rrf_weight ranges from 0.0 (BM25 only) to 1.0 (dense only), default 0.5.
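
One plausible reading of this weight is that the dense list's contribution is scaled by w and the BM25 list's by 1 - w. The sketch below works under that assumption; the exact formula Endee applies is not stated here, and `weighted_rrf` is a hypothetical function.

```python
def weighted_rrf(
    dense: list[str],
    sparse: list[str],
    dense_weight: float = 0.5,
    k: int = 60,
) -> list[str]:
    """Weighted RRF sketch: dense scaled by w, sparse by (1 - w). Assumed form."""
    if not 0.0 <= dense_weight <= 1.0:
        raise ValueError("dense_weight must be between 0.0 and 1.0")
    scores: dict[str, float] = {}
    for weight, ranking in ((dense_weight, dense), (1.0 - dense_weight, sparse)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense = ["go", "rust", "java"]     # illustrative dense ranking
sparse = ["python", "go", "rust"]  # illustrative BM25 ranking

print(weighted_rrf(dense, sparse, dense_weight=0.0)[0])  # python
print(weighted_rrf(dense, sparse, dense_weight=0.5)[0])  # go
print(weighted_rrf(dense, sparse, dense_weight=1.0)[0])  # go
```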

Note: a hybrid index can’t be converted to dense-only after creation. Create a separate index for each mode.


Metadata Filters

# Only static-typed languages
results = store.search(
    query="high performance language",
    limit=3,
    filter=[{"typing": {"$eq": "static"}}],
)

# Dynamic-typed languages above a minimum score
results = store.search(
    query="web development",
    limit=3,
    filter=[{"typing": {"$eq": "dynamic"}}],
    score_threshold=0.2,
)

The filter syntax is [{"field": {"$op": value}}]. Supported operators: $eq, $in, $range. Filters are applied during HNSW traversal, not post-hoc.
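
To make the operator semantics concrete, here is a small local evaluator for the same filter shape. It rests on two assumptions that should be checked against the Endee documentation: that multiple clauses are combined with AND, and that $range takes an inclusive [low, high] pair. `matches` is a hypothetical helper and plays no part in how the server filters.

```python
def matches(metadata: dict, filters: list[dict]) -> bool:
    """Toy evaluator: every clause must hold (AND semantics are an assumption)."""
    for clause in filters:
        for field, condition in clause.items():
            if field not in metadata:
                return False
            actual = metadata[field]
            for op, expected in condition.items():
                if op == "$eq":
                    ok = actual == expected
                elif op == "$in":
                    ok = actual in expected
                elif op == "$range":
                    low, high = expected  # assumed inclusive [low, high]
                    ok = low <= actual <= high
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True


docs = [
    {"lang": "Python", "year": 1991, "typing": "dynamic"},
    {"lang": "Java", "year": 1995, "typing": "static"},
]

static_only = [d["lang"] for d in docs if matches(d, [{"typing": {"$eq": "static"}}])]
early = [d["lang"] for d in docs if matches(d, [{"year": {"$range": [1990, 1992]}}])]
either = [d["lang"] for d in docs if matches(d, [{"lang": {"$in": ["Java", "Go"]}}])]
print(static_only, early, either)
```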

Use filter to restrict by metadata values; use score_threshold to cut off low-similarity results. They can be combined.

To get the raw vector alongside results:

results = store.search("interpreted language", limit=1, include_vectors=True)
vec = results[0]["vector"]

Index Operations

# Inspect the index
print("Index info:", store.describe())

# Save a document, fetch it by ID, update its metadata, delete it
store.save(
    value="Rust is a systems language focused on memory safety, designed by Mozilla.",
    metadata={"lang": "Rust", "typing": "static"},
)
time.sleep(2)

rust_results = store.search("Rust memory safety", limit=1)
if rust_results:
    vec_id = rust_results[0]["id"]
    vec_data = store.get_vector(vec_id)
    print(f"Fetched: {vec_data.get('meta', {})}")
    store.update_filters([{"id": vec_id, "filter": {"reviewed": "true"}}])
    store.delete_vector(vec_id)

# Bulk delete by filter
store.save(value="Temp doc 1", metadata={"category": "throwaway"})
store.save(value="Temp doc 2", metadata={"category": "throwaway"})
time.sleep(2)
store.delete(filter=[{"category": {"$eq": "throwaway"}}])

update_filters() changes filterable metadata without re-embedding. delete(filter=...) removes all vectors matching a filter. reset() deletes the entire index.
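
The examples in this walkthrough call time.sleep(2) after writes because indexing is asynchronous. If a fixed pause proves too short or too long, a polling loop is a common alternative. The helper below is a generic sketch; `wait_until` is a hypothetical name, and the demo uses a timer as a stand-in for an index becoming searchable.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Call check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for an index that becomes searchable shortly after a write
ready_at = time.monotonic() + 0.3
became_ready = wait_until(lambda: time.monotonic() >= ready_at, timeout=2.0, interval=0.05)
timed_out = wait_until(lambda: False, timeout=0.2, interval=0.05)
print(became_ready)  # True
print(timed_out)     # False
```

With a live store, the check could be something like `lambda: bool(store.search("Rust memory safety", limit=1))`, using the same search call shown above.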


Cleanup

for s, name in [
    (store, "demo_dense"),
    (hybrid_store, "demo_hybrid"),
]:
    try:
        s.reset()
        print(f"Deleted: {name}")
    except Exception as e:
        print(f"Could not delete {name}: {e}")

Summary

What | How
Connect | EndeeVectorStore(type=..., embedder_config=..., api_token=API_TOKEN)
Local server | Add base_url=BASE_URL (default → http://127.0.0.1:8080/api/v1)
Insert | store.save(value, metadata)
Search | store.search(query, limit)
Hybrid | Add sparse_model_name="endee/bm25"
Filters | store.search(query, filter=[{"key": {"$eq": "val"}}])
Start fresh | store.reset() then store.ensure_index()