CrewAI Integration with Endee

CrewAI is a framework for orchestrating role-playing AI agents that collaborate to complete tasks. By default, CrewAI stores agent memory in a local ChromaDB instance tied to an OpenAI embedder. EndeeVectorStore replaces that default with Endee — giving your agents persistent vector memory with local embeddings (no API key required), hybrid dense + BM25 search, and metadata filtering, without changing how you define agents or tasks.

How the Integration Works

CrewAI’s memory system has three layers:

ShortTermMemory / EntityMemory   ← CrewAI memory classes (manage agent context)
RAGStorage                       ← abstract storage interface (save + search)
ChromaDB (default)               ← local SQLite + OpenAI embeddings

EndeeVectorStore is a drop-in replacement at the storage layer. It extends the same RAGStorage base class but swaps ChromaDB for Endee:

ShortTermMemory / EntityMemory
EndeeVectorStore                 ← our integration layer
Endee (cloud or local)           ← your embedder + Endee index

What this changes:

  • No API keys required — embeddings run locally via sentence-transformers
  • Persistent indexes that survive restarts (not local SQLite)
  • Hybrid search (dense + BM25) for better recall
  • Metadata filtering at the storage level

Install

pip install "crewai-endee==0.1.1b5" sentence-transformers==3.0.0 pip install numpy==2.0.0 # required for sentence-transformers compatibility

No API keys needed. Embeddings run locally via sentence-transformers.


Environment Setup

Only ENDEE_API_TOKEN is needed, and only for cloud mode. Omit it to run against a local Endee server.

import os

ENDEE_API_TOKEN = os.getenv("ENDEE_API_TOKEN")  # None -> local mode

Connect to Endee

See Quick Start for server setup details.

EndeeVectorStore extends CrewAI’s RAGStorage. On creation it:

  1. Calls build_embedder(embedder_config) to create the dense embedding function
  2. Optionally loads a sparse encoder if sparse_model_name is set
  3. Lazily creates the Endee index on first save() or search() call (or explicitly via ensure_index())

The vector dimension is auto-detected by embedding a test string on first use — you don’t need to specify it.
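
For reference, you can run the same probe by hand with sentence-transformers. This is illustrative only, not the library's internal code:

from sentence_transformers import SentenceTransformer

# Probe the embedding dimension the way the store does on first use (illustrative)
model = SentenceTransformer("all-MiniLM-L6-v2")
dim = model.encode(["probe"]).shape[1]
print(dim)  # 384 for all-MiniLM-L6-v2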

from crewai_endee import EndeeVectorStore

embedder_config = {
    "provider": "sentence-transformer",
    "config": {
        "model_name": "all-MiniLM-L6-v2",
        "device": "cpu",
    },
}

store = EndeeVectorStore(
    type="demo_dense",              # becomes the Endee index name
    embedder_config=embedder_config,
    api_token=ENDEE_API_TOKEN,      # None -> local Endee server
    space_type="cosine",
    precision="int8",
)

store.ensure_index()  # creates index now (otherwise lazy on first save/search)

Cloud vs local: if api_token is set, the store connects to Endee Cloud; if it is None, it connects to a local server (default localhost:6070).
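
A purely local store then looks like this (sketch; assumes an Endee server is already running on localhost:6070, and the index name is invented):

# Local mode: no API token; talks to the default local server
local_store = EndeeVectorStore(
    type="demo_local",               # hypothetical index name
    embedder_config=embedder_config,
    api_token=None,
)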


Insert Documents

save(value, metadata) does four things:

  1. Truncates the text to 8192 UTF-8 bytes
  2. Embeds the text using the configured embedder → dense vector
  3. Builds filterable fields — extracts all scalar (str/int/float) metadata values into a separate filter dict
  4. Upserts into the Endee index with a UUID as the vector ID

store.save(
    value="Go combines simplicity with high performance and native concurrency.",
    metadata={"lang": "Go", "year": 2009, "typing": "static"},
)
# Internally stores:
#   meta:   {"lang": "Go", "year": 2009, "typing": "static", "value": "<the text>"}
#   filter: {"lang": "Go", "year": 2009, "typing": "static"}

In hybrid mode, save() also computes sparse vectors via the configured sparse encoder and includes sparse_indices/sparse_values in the upsert.


Search

search(query, limit) embeds the query, queries the Endee index, and returns CrewAI-compatible dicts with id, content, metadata, and score.

results = store.search("concurrency", limit=3) # Each result: {"id": "abc123", "content": "Go combines...", "metadata": {...}, "score": 0.87}

When CrewAI calls search() internally before a task, it passes a score_threshold (default 0.6) to filter out low-relevance results.
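
You can apply the same cutoff in your own calls. A sketch, assuming search() accepts the score_threshold parameter from the RAGStorage interface it extends:

# Mirror CrewAI's internal pre-task recall
# (score_threshold assumed supported, per the RAGStorage interface)
results = store.search("concurrency", limit=3, score_threshold=0.6)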


Hybrid Mode (Dense + BM25)

Add sparse_model_name to enable hybrid search — dense similarity + BM25 keyword matching fused via RRF.

hybrid_store = EndeeVectorStore(
    type="demo_hybrid",
    embedder_config=embedder_config,
    api_token=ENDEE_API_TOKEN,
    sparse_model_name="endee/bm25",
)

# Favour dense similarity
results = hybrid_store.search("Go microservices", limit=3, dense_rrf_weight=0.8)

See Search for RRF tuning (dense_rrf_weight, rrf_rank_constant).

Why hybrid matters for agents: Pure semantic search can miss memories containing specific terms (error codes, function names, exact phrases). BM25 ensures keyword matches surface even when semantic meaning is tangential.
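
On the insert side, saves through a hybrid store take the sparse path described earlier. A minimal sketch using the hybrid_store defined above (example text and metadata are invented):

hybrid_store.save(
    value="Rust guarantees memory safety without a garbage collector.",
    metadata={"lang": "Rust", "year": 2015},
)
# The upsert now also carries sparse_indices/sparse_values from the BM25 encoder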


Search with Metadata Filters

Filters narrow search to specific metadata values. See Filtering for supported operators ($eq, $in, $range).

# Only statically typed languages
results = store.search(
    "high performance language",
    limit=3,
    filter=[{"typing": {"$eq": "static"}}],
)

Metadata fields set during save() are automatically made filterable — Endee applies them during HNSW graph traversal, not as post-filtering.
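
The other operators listed in Filtering follow the same shape. A sketch using $in, assuming it takes a list of allowed values (field names are from the earlier save() example):

# Match either static or dynamic typing
results = store.search(
    "high performance language",
    limit=3,
    filter=[{"typing": {"$in": ["static", "dynamic"]}}],
)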


Index Operations

These methods call the Endee SDK directly:

Method                           What it does
store.describe()                 Index metadata (count, dimension, precision)
store.get_vector(id)             Full vector data including meta/filter
store.update_filters(updates)    Update filter metadata without re-embedding
store.delete_vector(id)          Remove a single vector
store.delete(filter)             Bulk delete by filter
store.reset()                    Delete the entire index

info = store.describe()
store.update_filters([{"id": vec_id, "filter": {"lang": "Rust", "status": "reviewed"}}])
store.delete_vector(vec_id)
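
The remaining methods follow the same pattern. A sketch of bulk deletion, assuming delete(filter) accepts the same filter syntax as search():

# Remove every vector tagged lang=Go, then drop the whole index
store.delete(filter=[{"lang": {"$eq": "Go"}}])
store.reset()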

Supported Sparse Models

from crewai_endee import list_supported_models

for name, config in list_supported_models().items():
    print(f"  {name} -- {config['description']}")

Model name      Encoder                   Install
"endee/bm25"    BM25 via endee_model      included
"splade_pp"     SPLADE++ via fastembed    pip install fastembed

Multi-Agent Crew with Endee Memory

Wire EndeeVectorStore into a Crew by passing it to ShortTermMemory and EntityMemory. This bypasses ChromaDB entirely — agents read/write to Endee indexes instead.

from crewai import LLM, Agent, Crew, Process, Task
from crewai.memory.short_term.short_term_memory import ShortTermMemory
from crewai.memory.entity.entity_memory import EntityMemory

stm_store = EndeeVectorStore(
    type="crew_short_term",
    embedder_config=embedder_config,
    api_token=ENDEE_API_TOKEN,
)
entity_store = EndeeVectorStore(
    type="crew_entity",
    embedder_config=embedder_config,
    api_token=ENDEE_API_TOKEN,
)

short_term_memory = ShortTermMemory(storage=stm_store)
entity_memory = EntityMemory(storage=entity_store)

crew = Crew(
    agents=[...],
    tasks=[...],
    memory=True,                          # enables the memory system
    short_term_memory=short_term_memory,
    entity_memory=entity_memory,
    embedder=embedder_config,             # used by CrewAI for other memory operations
    verbose=True,
)

result = crew.kickoff()

memory=True is required — without it, short_term_memory and entity_memory are ignored. embedder=embedder_config prevents CrewAI from falling back to OpenAI for its own internal operations.

Execution Flow

When crew.kickoff() runs (steps 1 and 3 are sketched after the list):

  1. Task starts — CrewAI calls short_term_memory.search(query=<task description>) → EndeeVectorStore.search() → Endee HNSW index
  2. Agent reasons — processes the task with any recalled context
  3. Task completes — CrewAI calls short_term_memory.save(value=<agent output>, metadata={...}) → EndeeVectorStore.save() → upserted into Endee
  4. Entity extraction — entity_memory.save() stores extracted entities
  5. Next task — repeats from step 1, with all previous outputs now searchable
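
Steps 1 and 3 reduce to direct storage calls. A minimal sketch of what CrewAI does internally during kickoff (the query string and metadata values are invented for illustration):

# Step 1: recall context before the task runs
recalled = stm_store.search(query="Research Go concurrency patterns", limit=3)

# ... the agent reasons with `recalled` as extra context ...

# Step 3: persist the agent's output after the task completes
stm_store.save(
    value="Goroutines and channels give Go CSP-style concurrency.",
    metadata={"agent": "researcher", "task": "research"},
)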

Summary

What             How
Connect          EndeeVectorStore(type=..., embedder_config=..., api_token=...)
Insert           store.save(value, metadata)
Search           store.search(query, limit)
Hybrid           Add sparse_model_name="endee/bm25"
Filters          store.search(query, filter=[{"key": {"$eq": "val"}}])
CrewAI memory    ShortTermMemory(storage=store) + Crew(short_term_memory=...)