Hybrid search with BM25

Combine keyword matching (BM25) with semantic search for better retrieval.

Hybrid search merges two ranking signals: BM25 catches exact keyword matches, and dense vectors catch synonyms and paraphrases. Together they handle a wider range of queries than either approach alone.
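To make "merging two ranking signals" concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to combine a keyword ranking with a semantic ranking. This page does not say which fusion method Endee uses, so treat RRF, the constant k=60, and the document IDs below as illustrative assumptions rather than a description of Endee's internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are then sorted by the summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["d1", "d3", "d2"]   # hypothetical keyword ranking
dense_ranking = ["d2", "d1", "d4"]  # hypothetical semantic ranking

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])
```

Documents that rank well in both lists (d1 and d2 here) rise above documents that only one signal found.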


What you’ll build

In this notebook, you will:

  • Create BM25 sparse embeddings for both documents and queries
  • Build a hybrid Endee collection that stores both keyword and semantic vectors
  • Run hybrid queries that combine both signals and compare results

BM25 sparse model (endee-model)

endee-model is a Python library that generates BM25 sparse embeddings. It tokenizes text, removes stopwords, and computes BM25 term-frequency weights - producing a sparse vector where only the tokens that appear in the text have non-zero values.

Two methods, two purposes:

  • .embed() for documents - full BM25 with length normalization
  • .query_embed() for queries - IDF-only, no length penalty
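The length normalization mentioned above can be sketched with the textbook BM25 term-frequency component. The parameters k1=1.2 and b=0.75 are conventional defaults and avg_doc_len is a made-up value; none of them are confirmed settings of endee-model, so read this as an illustration of the formula, not the library's implementation.

```python
def bm25_tf_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component with length normalization.

    k1 and b are conventional defaults (an assumption, not endee-model's
    confirmed settings).
    """
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


# Same term count, different document lengths
short_doc = bm25_tf_weight(tf=2, doc_len=50, avg_doc_len=100)
long_doc = bm25_tf_weight(tf=2, doc_len=200, avg_doc_len=100)
print(f"short doc: {short_doc:.3f}, long doc: {long_doc:.3f}")
```

A term that appears twice in a short document gets a higher weight than the same term appearing twice in a long one. A query-side weight with no length penalty would simply skip the norm term.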

Install with pip install endee-model and initialize with SparseModel(model_name="endee_bm25").

Why use endee_bm25?

Most BM25 implementations require building a full inverted index over your corpus. endee_bm25 offloads this to the server:

  • No corpus needed at embed time - embed and index documents one by one as they arrive
  • IDF stays accurate automatically - Endee updates IDF weights as your collection grows
  • Lighter client - you send compact TF-only vectors; IDF computation happens server-side
  • Drop-in for hybrid search - calibrated hybrid scores out of the box
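To illustrate what "IDF stays accurate as the collection grows" could look like, the sketch below keeps running document-frequency counts and derives IDF on demand. The RunningIdf class is hypothetical, and the IDF formula is the widely used BM25 variant ln(1 + (N - df + 0.5) / (df + 0.5)); how Endee actually stores and updates these statistics is not described on this page.

```python
import math
from collections import Counter


class RunningIdf:
    """Hypothetical helper: tracks document frequencies as documents arrive."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def add_document(self, token_ids):
        self.num_docs += 1
        # Count each token at most once per document
        self.doc_freq.update(set(token_ids))

    def idf(self, token_id):
        df = self.doc_freq[token_id]
        return math.log(1.0 + (self.num_docs - df + 0.5) / (df + 0.5))


stats = RunningIdf()
stats.add_document([1, 2, 3])
stats.add_document([1, 4])
stats.add_document([1, 5, 5])

print(f"common token 1: {stats.idf(1):.3f}")
print(f"rare token 4:   {stats.idf(4):.3f}")
```

Because IDF depends only on the running counts, no document has to be re-embedded when new ones are added, which is why the client can send TF-only vectors.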

The sparse_model parameter

For sparse_model you have two options depending on which sparse model you use:

  • sparse_model="endee_bm25" - use this when your vectors come from endee/bm25. Endee holds the IDF weights on its server and applies them automatically, so you only need to send the TF weights from your client.
  • sparse_model="default" - use this for SPLADE models or any other BM25 model. In this case Endee treats the values you send as final scores and does no further calculation. If you are using another BM25 model (not endee/bm25), you must compute the full IDF scores yourself on the client before sending them.

Installation

    pip install --upgrade endee-model endee sentence-transformers
    pip install numpy==2.0.0

Imports

    from endee_model import SparseModel
    from sentence_transformers import SentenceTransformer
    from endee import Endee

Authentication

Local Server

If NDD_AUTH_TOKEN is set on the server, pass the same token:

    client = Endee("ndd-auth-token")
    client.set_base_url("http://0.0.0.0:8080/api/v1")

Endee Cloud

Create a token at app.endee.io:

client = Endee("your-serverless-token")

Creating a hybrid collection

Three parameters are needed:

  • dimension=384 - size of the dense vectors produced by all-MiniLM-L6-v2
  • space_type="cosine" - how Endee measures similarity between dense vectors
  • sparse_model="endee_bm25" - Endee handles IDF server-side so you only send TF weights

    COLLECTION_NAME = "example_hybrid"

    client.create_collection(
        name=COLLECTION_NAME,
        dimension=384,
        space_type="cosine",
        sparse_model="endee_bm25",
    )

    collection = client.get_collection(COLLECTION_NAME)
    print(f"Collection '{COLLECTION_NAME}' ready")
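For readers unfamiliar with the cosine space type, the following is a minimal pure-Python sketch of cosine similarity between two dense vectors. It shows the standard formula only; the function name is illustrative and says nothing about how Endee computes or rescales its scores internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
print(same_direction, orthogonal)
```

Cosine similarity ignores vector length and compares direction only, which suits sentence embeddings where magnitude carries little meaning.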

Preparing the dataset

    CORPUS = [
        {
            "id": "1",
            "title": "Vitamin D and Cancer Risk",
            "text": "Vitamin D supplementation has been shown to reduce the risk of certain cancers. Studies suggest that adequate vitamin D levels in the blood are associated with lower rates of colon and breast cancer."
        },
        ...
    ]

    QUERIES = [
        {"id": "q1", "text": "does vitamin D help prevent cancer"},
        ...
    ]

    print(f"{len(CORPUS)} documents, {len(QUERIES)} queries ready")

Generating sparse embeddings

Both .embed() and .query_embed() return a SparseEmbedding with two arrays:

    Attribute   Type             Meaning
    .indices    ndarray[int]     Vocabulary token IDs with non-zero BM25 weight
    .values     ndarray[float]   BM25 weight for each token ID

Only tokens that appear in the text get non-zero entries - everything else is omitted. A typical abstract produces about 90 non-zero tokens; a short query produces about 9.

    sparse_model = SparseModel(model_name="endee_bm25")

    doc_texts = [doc["text"] for doc in CORPUS]
    query_texts = [q["text"] for q in QUERIES]

    # Documents use .embed() -- full BM25 with length normalisation
    corpus_sparse = list(sparse_model.embed(doc_texts))

    # Queries use .query_embed() -- IDF only, no length penalty
    query_sparse = [next(sparse_model.query_embed(text)) for text in query_texts]

    print(f"Sparse vectors generated -- {len(corpus_sparse)} docs, {len(query_sparse)} queries")

    # Quick look at the first document's sparse vector
    # (the slices below show the first five entries, not the five largest)
    sv = corpus_sparse[0]
    print(f"\nSample -- '{CORPUS[0]['title']}'")
    print(f" non-zero tokens : {len(sv.indices)}")
    print(f" first 5 indices : {sv.indices[:5].tolist()}")
    print(f" first 5 values  : {[round(v, 3) for v in sv.values[:5].tolist()]}")

Never mix .embed() and .query_embed() - using the wrong function produces incorrect BM25 scores.
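To show how a pair of index/value arrays turns into a keyword score, here is a small sketch of a sparse dot product: only token IDs present in both the query and the document contribute. The function name, token IDs, and weights are made up for illustration, and the server-side IDF step that Endee applies for endee_bm25 is deliberately left out.

```python
def sparse_dot(q_indices, q_values, d_indices, d_values):
    """Dot product of two sparse vectors given as parallel index/value lists."""
    doc_weights = dict(zip(d_indices, d_values))
    # Tokens missing from the document contribute nothing
    return sum(w * doc_weights.get(i, 0.0) for i, w in zip(q_indices, q_values))


# Toy vectors with made-up token IDs and weights
score = sparse_dot([7, 42], [1.0, 2.0], [3, 7, 42], [0.5, 1.5, 0.25])
print(score)
```

Since every shared token multiplies a query weight by a document weight, swapping the document and query encoders changes both factors, which is why the two methods are not interchangeable.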


Generating dense embeddings

all-MiniLM-L6-v2 converts each document into a 384-dimensional vector that captures its meaning:

    dense_model = SentenceTransformer("all-MiniLM-L6-v2")
    corpus_dense = dense_model.encode(doc_texts)

    print(f"Dense vectors generated -- shape: {corpus_dense.shape}")

Indexing documents

Bundle each document’s dense and sparse vectors together, then send them to Endee in a single call:

    points = [
        {
            "id": doc["id"],
            "vector": corpus_dense[i].tolist(),
            "sparse_indices": corpus_sparse[i].indices.tolist(),
            "sparse_values": corpus_sparse[i].values.tolist(),
            "meta": {"title": doc["title"], "text": doc["text"]},
        }
        for i, doc in enumerate(CORPUS)
    ]

    collection.upsert(points)
    print(f"{len(points)} documents indexed")
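Before sending points, it can help to check each one on the client. The sketch below is a hypothetical validate_point helper, not part of the endee library; it assumes the point layout shown above and the 384-dimension collection created earlier. Whether Endee itself rejects malformed points is not stated on this page.

```python
def validate_point(point, dimension=384):
    """Return a list of problems found in one point dict (empty if none).

    Hypothetical client-side check; not part of the endee library.
    """
    problems = []
    if len(point["vector"]) != dimension:
        problems.append(
            f"dense vector has {len(point['vector'])} dims, expected {dimension}"
        )
    if len(point["sparse_indices"]) != len(point["sparse_values"]):
        problems.append("sparse_indices and sparse_values differ in length")
    if len(set(point["sparse_indices"])) != len(point["sparse_indices"]):
        problems.append("sparse_indices contains duplicates")
    return problems


good = {"id": "1", "vector": [0.0] * 384,
        "sparse_indices": [3, 7], "sparse_values": [0.4, 1.1]}
bad = {"id": "2", "vector": [0.0] * 10,
       "sparse_indices": [3, 3], "sparse_values": [0.4]}

print(validate_point(good))
print(validate_point(bad))
```

Running the check over the whole list before calling collection.upsert() surfaces a mismatched dimension or misaligned sparse arrays early, instead of at request time.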

Running hybrid queries

For each query, pass both the dense vector and sparse vector to collection.query(). Endee combines BM25 and dense scores to find the best matches.

    TOP_K = 3

    for i, q in enumerate(QUERIES):
        query_dense_vec = dense_model.encode(q["text"]).tolist()
        sv = query_sparse[i]

        hits = collection.query(
            vector=query_dense_vec,
            sparse_indices=sv.indices.tolist(),
            sparse_values=sv.values.tolist(),
            top_k=TOP_K,
        )

        print(f"Query: {q['text']}")
        for rank, h in enumerate(hits, 1):
            print(f" {rank}. score={h['similarity']:.4f} {h['meta']['title']}\n")

Cleanup

    client.delete_collection(COLLECTION_NAME)
    print(f"Deleted: {COLLECTION_NAME}")

Key takeaways

  • .embed() is for documents - full BM25 scoring with word frequency and length adjustment
  • .query_embed() is for queries - simplified BM25 with no length penalty, so short queries get fair scores
  • Never swap them - mixing up the two functions produces wrong BM25 scores
  • Sparse means most values are zero - only words that actually appear in the text get a score
  • The points format is ready to index - load them directly into Endee with collection.upsert()