Hybrid Search with BM25 and Dense Vectors
Combine keyword matching (BM25) with semantic search for better retrieval, illustrated on a small corpus of SciFact-style scientific abstracts.
Overview
In this notebook, you will:
- Build a small example corpus of scientific abstracts and test queries, modelled on the SciFact dataset (approximately 5,000 abstracts and 1,100 queries)
- Create BM25 sparse embeddings for both documents and queries
- Build a hybrid Endee index that stores both keyword and semantic vectors
- Run hybrid queries that combine both signals and compare results
What is BM25? A classic keyword ranking algorithm. It scores documents by how often your search terms appear - and down-weights common words like “the” in favour of rare, specific ones.
What is hybrid search? Combining BM25 (exact keyword matches) with dense vectors (semantic meaning). BM25 catches exact term matches; dense vectors catch synonyms and paraphrases. Together they handle more queries well.
The hybrid query flow:
Query ──► [Dense Embed: all-MiniLM-L6-v2] ──► 384-dim vector ──┐
                                                               ├──► Endee Hybrid ReRanking ──► Top-K
Query ──► [Sparse Embed: BM25] ──► sparse vector ──────────────┘

Prerequisites: Endee running locally on http://127.0.0.1:8080
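Endee performs the re-ranking step server-side, so its internals are opaque to the client. As a conceptual illustration only, reciprocal rank fusion (RRF) is one common way two ranked lists can be merged into a single hybrid result; the document IDs and rankings below are made up:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking (RRF)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); k damps the head of the list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc7"]   # hypothetical keyword-based order
dense_ranking = ["doc1", "doc5", "doc3"]  # hypothetical semantic order
fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)  # documents ranked high in both lists rise to the top
```

A document that appears near the top of both lists beats one that dominates only a single list, which is exactly the behaviour hybrid search is after. This sketch is not what Endee necessarily implements; it only shows the general idea of combining the two signals.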
Why Two Separate Embedding Functions?
BM25 treats documents and queries differently. The full document formula depends on term frequency and document length, which makes sense for a long article but not for a 6-word query; applying the same formula to both would distort the scores of short queries.
| Function | Use it for | What it applies |
|---|---|---|
| `SparseModel.embed(documents)` | Corpus / documents | Full BM25: word frequency x word rarity, adjusted for document length |
| `SparseModel.query_embed(query)` | Search queries | Word rarity only - no length penalty |

Rule: Always use `.embed()` for documents and `.query_embed()` for queries. Mixing them produces incorrect BM25 scores.
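The split can be sketched with the standard Okapi BM25 term-weight formula. This is a minimal illustration only: the constants k1=1.2 and b=0.75 are conventional defaults, the IDF value is invented, and the exact weights `endee/bm25` computes internally may differ.

```python
# Illustrative sketch of why documents and queries are weighted differently.
# k1, b, and the idf value below are assumptions, not taken from endee/bm25.
def doc_weight(tf, idf, doc_len, avg_len, k1=1.2, b=0.75):
    # Full BM25 term weight: term-frequency saturation plus length normalisation.
    length_norm = 1 - b + b * doc_len / avg_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

def query_weight(idf):
    # Query-side weight: word rarity (IDF) only - no frequency or length terms.
    return idf

idf = 2.0  # hypothetical rarity score for one term
w_long = doc_weight(tf=3, idf=idf, doc_len=180, avg_len=120)
w_short = doc_weight(tf=3, idf=idf, doc_len=60, avg_len=120)
print(w_long, w_short)      # same term, same tf - weight shifts with length
print(query_weight(idf))    # query weight stays constant
```

The document weight moves with document length while the query weight does not, which is why the two texts need two different embedding functions.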
Install
Required packages:
- `endee-model` - provides the `SparseModel` class that generates BM25 sparse vectors
- `endee` - the client library to connect to the Endee vector database
- `sentence-transformers` - provides the dense embedding model `all-MiniLM-L6-v2`
- `numpy==2.0.0` - a specific numpy version required to avoid compatibility issues
pip install --upgrade endee-model endee sentence-transformers
pip install numpy==2.0.0

Import
from getpass import getpass
from endee_model import SparseModel
from sentence_transformers import SentenceTransformer
from endee import Endee

Authentication & Create Index
Before connecting, configure your API settings:
Three parameters are needed at index creation:
- `dimension=384` - size of the dense vectors produced by `all-MiniLM-L6-v2`
- `space_type="cosine"` - how Endee measures similarity between dense vectors
- `sparse_model` - controls how Endee handles the sparse side
For sparse_model you have two options depending on which sparse model you use:
- `sparse_model="endee_bm25"` - use this when your sparse vectors come from `endee/bm25`. Endee holds the IDF weights on its server and applies them automatically, so you only need to send the TF weights from your client.
- `sparse_model="default"` - use this for SPLADE models or any other BM25 model. In this case Endee treats the values you send as final scores and does no further calculation. If you are using another BM25 model (not `endee/bm25`), you must compute the full IDF scores yourself on the client before sending them.
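For the `"default"` path, the client-side IDF piece looks like the following minimal sketch. It uses the standard BM25 IDF formula; the corpus statistics `N` and `df` are hypothetical, and your particular sparse model may define IDF slightly differently.

```python
import math

# Standard BM25 IDF: N = number of documents in the corpus,
# df = number of documents that contain the term.
def bm25_idf(N, df):
    return math.log((N - df + 0.5) / (df + 0.5) + 1)

# In a hypothetical 1,000-document corpus, a rare term far outweighs a common one.
rare_term = bm25_idf(N=1000, df=5)
common_term = bm25_idf(N=1000, df=900)
print(rare_term, common_term)
```

With `sparse_model="default"`, weights like these would be multiplied into the values you send; with `sparse_model="endee_bm25"`, Endee applies its server-held IDF for you.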
Connection options:
Choose your connection method: local server or serverless cloud.
Local Server: If your server has NDD_AUTH_TOKEN set, pass the same token when initializing:
client = Endee("ndd-auth-token")
client.set_base_url("http://127.0.0.1:8080/api/v1")

Endee Serverless: Go to https://app.endee.io, create a token, then pass it here:
client = Endee("your-serverless-token")

Then create the index:
INDEX_NAME = "example_hybrid"
# Delete if already exists so we start fresh
try:
    client.delete_index(INDEX_NAME)
except Exception:
    pass

client.create_index(
    name=INDEX_NAME,
    dimension=384,
    space_type="cosine",
    sparse_model="endee_bm25",
)
index = client.get_index(INDEX_NAME)
print(f"Index '{INDEX_NAME}' ready")

Prepare Example Data
Create a simple corpus and set of test queries:
CORPUS = [
    {
        "id": "1",
        "title": "Vitamin D and Cancer Risk",
        "text": "Vitamin D supplementation has been shown to reduce the risk of certain cancers. Studies suggest that adequate vitamin D levels in the blood are associated with lower rates of colon and breast cancer."
    },
    ...
]
QUERIES = [
    {"id": "q1", "text": "does vitamin D help prevent cancer"},
    ...
]
print(f"{len(CORPUS)} documents, {len(QUERIES)} queries ready")

Generate Sparse Embeddings
What a Sparse Embedding Looks Like
Both .embed() and .query_embed() return a SparseEmbedding with two arrays:
| Attribute | Type | Meaning |
|---|---|---|
.indices | ndarray[int] | Vocabulary token IDs with non-zero BM25 weight |
.values | ndarray[float] | BM25 weight for each token ID |
Only the tokens that actually appear in the text get non-zero entries - everything else is zero and omitted. This is what makes it sparse. A typical abstract produces about 90 non-zero tokens; a short query produces about 9.
Example sparse vector structure:
{
  "id": "4983",
  "sparse_vector": {
    "indices": [412, 8901, 23445],
    "values": [0.82, 1.41, 0.67]
  },
  "meta": {"text": "...", "title": "..."}
}
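To make the sparsity concrete, here is a toy sketch of how a text maps to `indices`/`values` pairs. The vocabulary and token IDs are invented, and raw term counts stand in for real BM25 weights:

```python
from collections import Counter

# Hypothetical vocabulary: token -> token ID. A real model's vocabulary
# has tens of thousands of entries; almost all get no entry for a given text.
vocab = {"vitamin": 412, "d": 8901, "cancer": 23445, "the": 17, "risk": 5210}

def to_sparse(text):
    # Count only tokens the vocabulary knows; everything else stays zero
    # and is simply omitted - that omission is what makes the vector sparse.
    counts = Counter(w for w in text.lower().split() if w in vocab)
    indices = [vocab[w] for w in counts]
    values = [float(c) for c in counts.values()]  # TF stand-in for BM25 weight
    return indices, values

indices, values = to_sparse("vitamin d and cancer risk")
print(indices, values)  # only the 4 matched tokens appear
```

The longer the text, the more non-zero entries it tends to have, which is why documents produce more tokens than queries.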
BM25 scores words differently for documents and queries. The library provides two separate methods:
- `.embed()` for documents - looks at word frequency, word rarity, and adjusts for document length
- `.query_embed()` for queries - only looks at word rarity and skips the length adjustment so short queries are not penalized
Both methods return a sparse vector, which is just a list of words that appeared in the text along with their BM25 scores.
sparse_model = SparseModel(model_name="endee/bm25")
doc_texts = [doc["text"] for doc in CORPUS]
query_texts = [q["text"] for q in QUERIES]
# Documents use .embed() -- full BM25 with length normalisation
corpus_sparse = list(sparse_model.embed(doc_texts))
# Queries use .query_embed() -- IDF only, no length penalty
query_sparse = [next(sparse_model.query_embed(text)) for text in query_texts]
print(f"Sparse vectors generated -- {len(corpus_sparse)} docs, {len(query_sparse)} queries")
# Quick look at the first document's sparse vector
sv = corpus_sparse[0]
print(f"\nSample -- '{CORPUS[0]['title']}'")
print(f" non-zero tokens : {len(sv.indices)}")
print(f" top-5 indices : {sv.indices[:5].tolist()}")
print(f" top-5 values : {[round(v, 3) for v in sv.values[:5].tolist()]}")

Generate Dense Embeddings
all-MiniLM-L6-v2 converts each document into a 384-number vector that captures its meaning. These are used alongside the BM25 sparse vectors so the search can match on meaning, not just exact words.
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_dense = dense_model.encode(doc_texts)
print(f"Dense vectors generated -- shape: {corpus_dense.shape}")

Index the Corpus
Each document needs both its dense and sparse vectors bundled together before it can be stored in Endee. Build a list called points where each item holds:
- The document id
- Its dense vector
- Its sparse vector (indices and values)
- The original title and text under meta so they show up in search results
Once the list is ready, index.upsert() sends all 6 documents to Endee in a single call and stores them in the index.
points = [
    {
        "id": doc["id"],
        "vector": corpus_dense[i].tolist(),
        "sparse_indices": corpus_sparse[i].indices.tolist(),
        "sparse_values": corpus_sparse[i].values.tolist(),
        "meta": {"title": doc["title"], "text": doc["text"]},
    }
    for i, doc in enumerate(CORPUS)
]
index.upsert(points)
print(f"{len(points)} documents indexed")

Run Hybrid Queries
For each query:
- Convert the query text into a dense vector using the dense model
- Pick up its pre-computed sparse vector
- Pass both vectors to `index.query()` together - Endee combines the BM25 and dense scores to find the best matches
- `top_k=3` returns only the top 3 results
TOP_K = 3
for i, q in enumerate(QUERIES):
    query_dense_vec = dense_model.encode(q["text"]).tolist()
    sv = query_sparse[i]
    hits = index.query(
        vector=query_dense_vec,
        sparse_indices=sv.indices.tolist(),
        sparse_values=sv.values.tolist(),
        top_k=TOP_K,
    )
    print(f"Query: {q['text']}")
    for rank, h in enumerate(hits, 1):
        print(f" {rank}. score={h['similarity']:.4f} {h['meta']['title']}\n")

Cleanup
Delete the index.
client.delete_index(INDEX_NAME)
print(f"Deleted: {INDEX_NAME}")

Key Takeaways
- `.embed()` is for documents - full BM25 scoring with word frequency and length adjustment
- `.query_embed()` is for queries - simplified BM25 with no length penalty, so short queries get fair scores
- Never swap them - mixing up the two functions produces wrong BM25 scores
- Sparse means most values are zero - only words that actually appear in the text get a score. Documents have more non-zero tokens than queries because they’re longer
- The points format is ready to index - load them directly into Endee with `index.upsert()`