CrewAI Integration with Endee
CrewAI is a framework for orchestrating role-playing AI agents that collaborate to complete tasks. By default, CrewAI stores agent memory in a local ChromaDB instance tied to an OpenAI embedder. EndeeVectorStore replaces that default with Endee — giving your agents persistent vector memory with local embeddings (no API key required), hybrid dense + BM25 search, and metadata filtering, without changing how you define agents or tasks.
How the Integration Works
CrewAI’s memory system has three layers:
```
ShortTermMemory / EntityMemory   ← CrewAI memory classes (manage agent context)
        │
RAGStorage                       ← abstract storage interface (save + search)
        │
ChromaDB (default)               ← local SQLite + OpenAI embeddings
```

EndeeVectorStore replaces the default storage layer: it extends the same RAGStorage base class but swaps ChromaDB for Endee:
```
ShortTermMemory / EntityMemory
        │
EndeeVectorStore         ← our integration layer
        │
Endee (cloud or local)   ← your embedder + Endee index
```

What this changes:
- No API keys required — embeddings run locally via `sentence-transformers`
- Persistent indexes that survive restarts (not local SQLite)
- Hybrid search (dense + BM25) for better recall
- Metadata filtering at the storage level
Install
```shell
pip install "crewai-endee==0.1.1b5" sentence-transformers==3.0.0
pip install numpy==2.0.0  # required for sentence-transformers compatibility
```

No API keys needed. Embeddings run locally via sentence-transformers.
Environment Setup
Only ENDEE_API_TOKEN is needed, and only for cloud mode. Omit it to run against a local Endee server.
```python
import os

ENDEE_API_TOKEN = os.getenv("ENDEE_API_TOKEN")  # None -> local mode
```

Connect to Endee
See Quick Start for server setup details.
EndeeVectorStore extends CrewAI’s RAGStorage. On creation it:
- Calls `build_embedder(embedder_config)` to create the dense embedding function
- Optionally loads a sparse encoder if `sparse_model_name` is set
- Lazily creates the Endee index on the first `save()` or `search()` call (or explicitly via `ensure_index()`)
The vector dimension is auto-detected by embedding a test string on first use — you don’t need to specify it.
```python
from crewai_endee import EndeeVectorStore

embedder_config = {
    "provider": "sentence-transformer",
    "config": {
        "model_name": "all-MiniLM-L6-v2",
        "device": "cpu",
    },
}

store = EndeeVectorStore(
    type="demo_dense",               # becomes the Endee index name
    embedder_config=embedder_config,
    api_token=ENDEE_API_TOKEN,       # None -> local Endee server
    space_type="cosine",
    precision="int8",
)
store.ensure_index()  # creates the index now (otherwise lazy on first save/search)
```

Cloud vs local: if `api_token` is set, the store connects to Endee Cloud; if `None`, it connects to a local server (default `localhost:6070`).
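The dimension auto-detection mentioned above is simple to picture: embed one probe string and read off the vector length. A toy sketch under that assumption — `detect_dimension`, `toy_embedder`, and the probe string are hypothetical names, not part of the SDK:

```python
# Sketch of dimension auto-detection: embed a probe string once and take
# the length of the returned vector. Illustration only, not the SDK's code.
def detect_dimension(embed_fn, probe="dimension probe"):
    return len(embed_fn(probe))

# Toy embedder standing in for the sentence-transformers model;
# it always returns a 3-dimensional vector.
def toy_embedder(text):
    return [float(len(text)), 0.0, 1.0]

dim = detect_dimension(toy_embedder)
print(dim)  # 3
```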
Insert Documents
`save(value, metadata)` does four things:
- Truncates the text to 8192 UTF-8 bytes
- Embeds the text using the configured embedder → dense vector
- Builds filterable fields — extracts all scalar (`str`/`int`/`float`) metadata values into a separate `filter` dict
- Upserts into the Endee index with a UUID as the vector ID
```python
store.save(
    value="Go combines simplicity with high performance and native concurrency.",
    metadata={"lang": "Go", "year": 2009, "typing": "static"},
)
# Internally stores:
#   meta:   {"lang": "Go", "year": 2009, "typing": "static", "value": "<the text>"}
#   filter: {"lang": "Go", "year": 2009, "typing": "static"}
```

In hybrid mode, `save()` also computes sparse vectors via the configured sparse encoder and includes `sparse_indices`/`sparse_values` in the upsert.
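Putting the four `save()` steps together, here is a minimal sketch of the pipeline against an in-memory dict. This is an illustration of the described behavior, not the library's implementation; the toy embedder and the `errors="ignore"` decode are assumptions:

```python
import uuid

def toy_embed(text):
    # Stand-in for the sentence-transformers embedder.
    return [float(len(text) % 7), 1.0]

def save(index, value, metadata):
    # 1. Truncate to 8192 UTF-8 bytes (dropping any split trailing character).
    text = value.encode("utf-8")[:8192].decode("utf-8", errors="ignore")
    # 2. Embed the (possibly truncated) text into a dense vector.
    dense = toy_embed(text)
    # 3. Keep only scalar metadata values as filterable fields.
    filters = {k: v for k, v in metadata.items() if isinstance(v, (str, int, float))}
    # 4. Upsert with a fresh UUID as the vector ID.
    vec_id = str(uuid.uuid4())
    index[vec_id] = {"vector": dense, "meta": {**metadata, "value": text}, "filter": filters}
    return vec_id

index = {}
vid = save(index, "Go combines simplicity with concurrency.",
           {"lang": "Go", "year": 2009, "tags": ["compiled"]})
print(index[vid]["filter"])  # {'lang': 'Go', 'year': 2009} -- non-scalar "tags" dropped
```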
Dense Search
`search(query, limit)` embeds the query, queries the Endee index, and returns CrewAI-compatible dicts with `id`, `content`, `metadata`, and `score`.

```python
results = store.search("concurrency", limit=3)
# Each result: {"id": "abc123", "content": "Go combines...", "metadata": {...}, "score": 0.87}
```

When CrewAI calls `search()` internally before a task, it passes a `score_threshold` (default 0.6) to filter out low-relevance results.
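The threshold filtering CrewAI applies can be pictured as a simple cutoff over the result list. A sketch assuming the result-dict shape shown above (`apply_score_threshold` is a hypothetical helper, not a library function):

```python
def apply_score_threshold(results, score_threshold=0.6):
    # Keep only results whose similarity score clears the threshold,
    # mirroring the low-relevance filter CrewAI applies before a task.
    return [r for r in results if r["score"] >= score_threshold]

results = [
    {"id": "a", "content": "Go concurrency notes", "score": 0.87},
    {"id": "b", "content": "Unrelated reminder", "score": 0.41},
]
print(apply_score_threshold(results))  # only the 0.87 result survives
```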
Hybrid Mode (Dense + BM25)
Add `sparse_model_name` to enable hybrid search — dense similarity + BM25 keyword matching fused via RRF.

```python
hybrid_store = EndeeVectorStore(
    type="demo_hybrid",
    embedder_config=embedder_config,
    api_token=ENDEE_API_TOKEN,
    sparse_model_name="endee/bm25",
)

# Favour dense similarity
results = hybrid_store.search("Go microservices", limit=3, dense_rrf_weight=0.8)
```

See Search for RRF tuning (`dense_rrf_weight`, `rrf_rank_constant`).
Why hybrid matters for agents: Pure semantic search can miss memories containing specific terms (error codes, function names, exact phrases). BM25 ensures keyword matches surface even when semantic meaning is tangential.
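Reciprocal-rank fusion itself is easy to sketch. The weighting below is an assumption about how a parameter like `dense_rrf_weight` might blend the two rankings, not Endee's exact formula:

```python
def rrf_fuse(dense_ranking, sparse_ranking, dense_weight=0.5, k=60):
    # score(id) = w * 1/(k + dense_rank) + (1 - w) * 1/(k + sparse_rank)
    # k is the RRF rank constant; higher k flattens the rank contribution.
    scores = {}
    for rank, doc_id in enumerate(dense_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + dense_weight / (k + rank)
    for rank, doc_id in enumerate(sparse_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - dense_weight) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" wins on BM25 alone, but with dense_weight=0.8 the dense winner "a" stays first.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "a"], dense_weight=0.8))
```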
Search with Metadata Filters
Filters narrow search to specific metadata values. See Filtering for supported operators ($eq, $in, $range).
```python
# Only statically typed languages
results = store.search(
    "high performance language",
    limit=3,
    filter=[{"typing": {"$eq": "static"}}],
)
```

Metadata fields set during `save()` are automatically made filterable — Endee applies them during HNSW graph traversal, not as post-filtering.
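The operator semantics can be sketched as a predicate over the filterable fields. This is an illustration of `$eq`/`$in`/`$range` matching, not Endee's engine; in particular the `gte`/`lte` bound keys for `$range` are an assumption:

```python
def matches(filters, fields):
    # Each clause is {key: {op: operand}}; all clauses must hold (AND).
    for clause in filters:
        for key, cond in clause.items():
            value = fields.get(key)
            for op, operand in cond.items():
                if op == "$eq" and value != operand:
                    return False
                if op == "$in" and value not in operand:
                    return False
                if op == "$range":  # assumed bound keys: gte / lte
                    lo = operand.get("gte", float("-inf"))
                    hi = operand.get("lte", float("inf"))
                    if not (lo <= value <= hi):
                        return False
    return True

fields = {"lang": "Go", "year": 2009, "typing": "static"}
print(matches([{"typing": {"$eq": "static"}},
               {"year": {"$range": {"gte": 2000, "lte": 2015}}}], fields))  # True
```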
Index Operations
These methods call the Endee SDK directly:
| Method | What it does |
|---|---|
| `store.describe()` | Index metadata (count, dimension, precision) |
| `store.get_vector(id)` | Full vector data including meta/filter |
| `store.update_filters(updates)` | Update filter metadata without re-embedding |
| `store.delete_vector(id)` | Remove a single vector |
| `store.delete(filter)` | Bulk delete by filter |
| `store.reset()` | Delete the entire index |
```python
info = store.describe()
store.update_filters([{"id": vec_id, "filter": {"lang": "Rust", "status": "reviewed"}}])
store.delete_vector(vec_id)
```

Supported Sparse Models
```python
from crewai_endee import list_supported_models

for name, config in list_supported_models().items():
    print(f"  {name} -- {config['description']}")
```

| Model name | Encoder | Install |
|---|---|---|
| `"endee/bm25"` | BM25 via endee_model | included |
| `"splade_pp"` | SPLADE++ via fastembed | `pip install fastembed` |
Multi-Agent Crew with Endee Memory
Wire EndeeVectorStore into a Crew by passing it to ShortTermMemory and EntityMemory. This bypasses ChromaDB entirely — agents read/write to Endee indexes instead.
```python
from crewai import LLM, Agent, Crew, Process, Task
from crewai.memory.short_term.short_term_memory import ShortTermMemory
from crewai.memory.entity.entity_memory import EntityMemory

stm_store = EndeeVectorStore(type="crew_short_term", embedder_config=embedder_config, api_token=ENDEE_API_TOKEN)
entity_store = EndeeVectorStore(type="crew_entity", embedder_config=embedder_config, api_token=ENDEE_API_TOKEN)

short_term_memory = ShortTermMemory(storage=stm_store)
entity_memory = EntityMemory(storage=entity_store)

crew = Crew(
    agents=[...],
    tasks=[...],
    memory=True,                 # enables the memory system
    short_term_memory=short_term_memory,
    entity_memory=entity_memory,
    embedder=embedder_config,    # used by CrewAI for other memory operations
    verbose=True,
)
result = crew.kickoff()
```

`memory=True` is required — without it, `short_term_memory` and `entity_memory` are ignored. `embedder=embedder_config` prevents CrewAI from falling back to OpenAI for its own internal operations.
Execution Flow
When `crew.kickoff()` runs:
- Task starts — CrewAI calls `short_term_memory.search(query=<task description>)` → `EndeeVectorStore.search()` → Endee HNSW index
- Agent reasons — processes the task with any recalled context
- Task completes — CrewAI calls `short_term_memory.save(value=<agent output>, metadata={...})` → `EndeeVectorStore.save()` → upserted into Endee
- Entity extraction — `entity_memory.save()` stores extracted entities
- Next task — repeats from the first step, with all previous outputs now searchable
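The save/search round-trip in that flow can be simulated with an in-memory stand-in for the store. A toy sketch that scores by naive keyword overlap instead of real embeddings (`ToyMemoryStore` is hypothetical, not part of the integration):

```python
class ToyMemoryStore:
    # In-memory stand-in mimicking the save/search round-trip of the
    # execution flow above; scores by keyword overlap, not embeddings.
    def __init__(self):
        self.items = []

    def save(self, value, metadata=None):
        self.items.append({"content": value, "metadata": metadata or {}})

    def search(self, query, limit=3):
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(item["content"].lower().split())), item)
            for item in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for score, item in scored[:limit] if score > 0]

store = ToyMemoryStore()
# Task 1 completes: its output is saved to memory.
store.save("research notes: go excels at concurrency", {"task": "research"})
# Task 2 starts: prior outputs are searchable as context.
print(store.search("go concurrency patterns"))
```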
Summary
| What | How |
|---|---|
| Connect | `EndeeVectorStore(type=..., embedder_config=..., api_token=...)` |
| Insert | `store.save(value, metadata)` |
| Search | `store.search(query, limit)` |
| Hybrid | Add `sparse_model_name="endee/bm25"` |
| Filters | `store.search(query, filter=[{"key": {"$eq": "val"}}])` |
| CrewAI memory | `ShortTermMemory(storage=store)` → `Crew(short_term_memory=...)` |