Hybrid search with BM25

Combine keyword matching (BM25) with semantic search for better retrieval.

Hybrid search merges two ranking signals: BM25 catches exact keyword matches, and dense vectors catch synonyms and paraphrases. Together they handle a wider range of queries than either approach does alone.

What you’ll build

In this notebook, you will:

Create BM25 sparse embeddings for both documents and queries
Build a hybrid Endee Collection that stores both keyword and semantic vectors
Run hybrid queries that combine both signals and compare results

BM25 sparse model (endee-model)

endee-model is a Python library that generates BM25 sparse embeddings. It tokenizes text, removes stopwords, and computes BM25 term-frequency weights, producing a sparse vector where only the tokens that appear in the text have non-zero values.

Two methods, two purposes:

.embed() for documents: full BM25 with length normalization
.query_embed() for queries: IDF-only, no length penalty
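
To make the difference between the two methods concrete, here is a minimal sketch of the standard Okapi BM25 term-frequency component. The function name bm25_tf_weight, the constants k1=1.2 and b=0.75, and the average document length are illustrative assumptions; the source does not state which values endee_bm25 uses internally.

```python
def bm25_tf_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 term-frequency component (hypothetical helper, not the endee-model API)."""
    # Length normalization: documents longer than average are penalized when b > 0.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)

# Same term count, different document lengths: the shorter document scores higher.
short_doc = bm25_tf_weight(tf=2, doc_len=50, avg_doc_len=100)
long_doc = bm25_tf_weight(tf=2, doc_len=400, avg_doc_len=100)

# Setting b=0 switches the length penalty off, so document length no longer matters.
no_penalty_short = bm25_tf_weight(tf=2, doc_len=50, avg_doc_len=100, b=0.0)
no_penalty_long = bm25_tf_weight(tf=2, doc_len=400, avg_doc_len=100, b=0.0)

print(short_doc, long_doc, no_penalty_short, no_penalty_long)
```

The b=0 case is one way to picture weighting with no length penalty: a term contributes the same weight whether the text is short or long.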

Install with pip install endee-model and initialize with SparseModel(model_name="endee_bm25").

Why use endee_bm25?

Most BM25 implementations need corpus-wide statistics, usually gathered by building a full inverted index over your corpus, before they can weight a single document. endee_bm25 offloads this to the server:

No corpus needed at embed time: embed and index documents one by one as they arrive
IDF stays accurate automatically: Endee updates IDF weights as your collection grows
Lighter client: you send compact TF-only vectors; IDF computation happens server-side
Drop-in for hybrid search: calibrated hybrid scores out of the box

The sparse_model parameter

When creating a collection, sparse_model controls how sparse scores are handled:

sparse_model="endee_bm25": use with endee_bm25. Endee holds IDF weights server-side and multiplies them with the TF weights you send.
sparse_model="default": use for SPLADE or any other model. Endee treats the values you send as final scores.
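
The two modes can be pictured with a small scoring sketch. Everything here is an assumption made for illustration: the function names, the toy document frequencies, and the Lucene-style IDF formula. The source says only that Endee multiplies server-side IDF weights with the TF weights you send, not which formula it uses.

```python
import math

def idf(doc_freq, num_docs):
    """Lucene-style BM25 IDF (an assumed formula; always non-negative)."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def score_with_server_idf(query, doc, doc_freqs, num_docs):
    """Sketch of the endee_bm25 mode: stored TF weights are multiplied by IDF at query time."""
    return sum(qw * idf(doc_freqs[t], num_docs) * doc[t]
               for t, qw in query.items() if t in doc)

def score_as_final(query, doc):
    """Sketch of the default mode: the stored values are used as they are."""
    return sum(qw * doc[t] for t, qw in query.items() if t in doc)

doc = {7: 1.6, 42: 1.0}            # token ID -> TF weight sent by the client
query = {7: 1.0, 99: 1.0}          # token 99 does not occur in the document
doc_freqs = {7: 1, 42: 5, 99: 2}   # toy counts of documents containing each token

bm25_score = score_with_server_idf(query, doc, doc_freqs, num_docs=10)
default_score = score_as_final(query, doc)
print(bm25_score, default_score)

# As the collection grows, a rare token's IDF rises without re-embedding any document.
print(idf(1, 10), idf(1, 1000))
```

Because IDF is applied at query time in this picture, the stored vectors never need recomputing as the collection grows.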

Installation


pip install --upgrade endee-model endee sentence-transformers

Imports


from endee_model import SparseModel
from sentence_transformers import SentenceTransformer
from endee import Endee, rerank

Authentication

Create a token at app.endee.io and pass it to the client:


client = Endee("your-serverless-token")

Creating a hybrid Collection (Dense + Sparse)

Create a collection with two fields:

embedding: dense vector field (dimension 384, cosine similarity) for all-MiniLM-L6-v2 embeddings
keywords: a sparse field using endee_bm25, where Endee handles IDF server-side so you only send TF weights


COLLECTION_NAME = "example_hybrid"
 
client.create_collection(
    name=COLLECTION_NAME,
    fields=[
        {
            "name": "embedding",
            "type": "vector",
            "params": {"dimension": 384, "space_type": "cosine"},
        },
        {
            "name": "keywords",
            "type": "sparse",
            "sparse_model": "endee_bm25",
        },
    ],
)
collection = client.get_collection(COLLECTION_NAME)
print(f"Collection '{COLLECTION_NAME}' ready")

Preparing the dataset


CORPUS = [
    {
        "id": "1",
        "title": "Vitamin D and Cancer Risk",
        "text": "Vitamin D supplementation has been shown to reduce the risk of certain cancers. Studies suggest that adequate vitamin D levels in the blood are associated with lower rates of colon and breast cancer."
    },
    ...
]
 
QUERIES = [
    {"id": "q1", "text": "does vitamin D help prevent cancer"},
    ...
]
 
print(f"{len(CORPUS)} documents, {len(QUERIES)} queries ready")

Generating sparse embeddings

Both .embed() and .query_embed() yield SparseEmbedding objects, each carrying two arrays:

Attribute	Type	Meaning
`.indices`	`ndarray[int]`	Vocabulary token IDs with non-zero BM25 weight
`.values`	`ndarray[float]`	BM25 weight for each token ID

Only tokens that appear in the text get non-zero entries. A typical abstract produces about 90 non-zero tokens; a short query produces about 9.
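
As a rough picture of the indices/values layout, the toy sketch below builds a sparse vector by hand. The stopword list, the on-the-fly vocabulary, and the use of raw counts in place of BM25 weights are all simplifying assumptions; the real tokenizer and vocabulary in endee-model will differ.

```python
from collections import Counter

STOPWORDS = {"does", "the", "of", "a", "to", "in"}  # toy list, for illustration only

def toy_sparse(text, vocab):
    """Return parallel (indices, values) lists for the non-stopword tokens in text."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    counts = Counter(tokens)
    # Assign each unseen token the next free ID, then sort the pairs by token ID.
    pairs = sorted((vocab.setdefault(tok, len(vocab)), float(n))
                   for tok, n in counts.items())
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return indices, values

vocab = {}
indices, values = toy_sparse("does vitamin D help prevent cancer", vocab)
print(indices, values)
```

Only the five tokens that survive stopword removal get an entry; every other vocabulary ID is implicitly zero and is never stored.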


sparse_model = SparseModel(model_name="endee_bm25")
 
doc_texts   = [doc["text"] for doc in CORPUS]
query_texts = [q["text"]   for q in QUERIES]
 
# Documents use .embed() -- full BM25 with length normalization
corpus_sparse = list(sparse_model.embed(doc_texts))
 
# Queries use .query_embed() -- IDF only, no length penalty
query_sparse  = [next(sparse_model.query_embed(text)) for text in query_texts]
 
print(f"Sparse vectors generated -- {len(corpus_sparse)} docs, {len(query_sparse)} queries")
 
# Quick look at the first document's sparse vector
sv = corpus_sparse[0]
print(f"\nSample -- '{CORPUS[0]['title']}'")
print(f"  non-zero tokens : {len(sv.indices)}")
print(f"  first 5 indices : {sv.indices[:5].tolist()}")
print(f"  first 5 values  : {[round(v, 3) for v in sv.values[:5].tolist()]}")

Never mix .embed() and .query_embed(): documents go through .embed() and queries go through .query_embed(). Swapping them produces incorrect BM25 scores.

Generating dense embeddings

all-MiniLM-L6-v2 converts each document into a 384-dimensional vector that captures its meaning:


dense_model = SentenceTransformer("all-MiniLM-L6-v2")
 
corpus_dense = dense_model.encode(doc_texts)
 
print(f"Dense vectors generated -- shape: {corpus_dense.shape}")
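
The collection above was created with space_type "cosine", so dense matches are ranked by cosine similarity. A minimal pure-Python sketch of that measure follows; the function name is illustrative, and Endee computes this server-side.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])  # magnitude is ignored
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # nothing in common
print(same_direction, orthogonal)
```

Cosine similarity compares direction only, which is why two texts of different lengths can still match closely.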

Indexing documents

Bundle each document’s dense and sparse vectors together under the fields key, then send them to Endee in a single call:


objects = [
    {
        "id":   doc["id"],
        "meta": {"title": doc["title"], "text": doc["text"]},
        "fields": {
            "embedding": corpus_dense[i].tolist(),
            "keywords": {
                "indices": corpus_sparse[i].indices.tolist(),
                "values":  corpus_sparse[i].values.tolist(),
            },
        },
    }
    for i, doc in enumerate(CORPUS)
]
 
collection.upsert(objects)
print(f"{len(objects)} documents indexed")
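
Before upserting a large batch, it can help to check each payload locally. The helper below is a hypothetical sketch, not part of the endee client. It assumes the field names used in this notebook and checks only two things: the dense dimension, and that the sparse arrays line up.

```python
def validate_object(obj, dense_dim=384):
    """Return a list of problems found in one upsert payload (empty if none)."""
    problems = []
    fields = obj.get("fields", {})

    dense = fields.get("embedding", [])
    if len(dense) != dense_dim:
        problems.append(f"embedding has {len(dense)} values, expected {dense_dim}")

    sparse = fields.get("keywords", {})
    indices = sparse.get("indices", [])
    values = sparse.get("values", [])
    if len(indices) != len(values):
        problems.append("keywords.indices and keywords.values differ in length")

    return problems

good = {"id": "1", "fields": {"embedding": [0.0] * 384,
                              "keywords": {"indices": [3, 17], "values": [1.6, 0.7]}}}
bad = {"id": "2", "fields": {"embedding": [0.0] * 10,
                             "keywords": {"indices": [3], "values": []}}}

print(validate_object(good))
print(validate_object(bad))
```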

Running hybrid queries

For each query, search both the dense and sparse fields, then fuse the per-field results into a single ranked list using rerank(). This combines BM25 keyword matching with semantic similarity to find the best matches.


TOP_K = 3
 
for i, q in enumerate(QUERIES):
    query_dense_vec = dense_model.encode(q["text"]).tolist()
    sv = query_sparse[i]
 
    search_results = collection.search(
        fields={
            "embedding": {"query": query_dense_vec, "limit": TOP_K},
            "keywords":  {"query": {"indices": sv.indices.tolist(), "values": sv.values.tolist()}, "limit": TOP_K},
        },
    )
    hits = rerank(search_results, limit=TOP_K)["results"]
 
    print(f"Query: {q['text']}")
    for rank, h in enumerate(hits, 1):
        print(f"  {rank}. score={h['similarity']:.4f}  {h['meta']['title']}\n")
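
The source does not say which fusion method rerank() applies. For intuition only, here is a sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists. The function name, the constant k=60, and the toy hit lists are assumptions, and rerank() may work differently.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, limit=3):
    """Merge ranked lists of document IDs; each appearance adds 1 / (k + rank)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

dense_hits = ["1", "3", "2"]   # toy result order from the dense field
sparse_hits = ["1", "2", "4"]  # toy result order from the sparse field

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that both fields rank well rise to the top, while a document found by only one field can still make the list.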

Cleanup


client.delete_collection(COLLECTION_NAME)
print(f"Deleted: {COLLECTION_NAME}")

Key takeaways

.embed() is for documents: full BM25 scoring with word frequency and length adjustment
.query_embed() is for queries: simplified BM25 with no length penalty, so short queries get fair scores
Never swap them: mixing up the two functions produces wrong BM25 scores
Sparse means most values are zero: only words that actually appear in the text get a score
Use the fields dict to bundle dense and sparse vectors together when upserting and searching
Use rerank() to fuse per-field search results into a single ranked list