How Endee Decides Which Search Strategy to Use
You added a filter to your vector search. The query runs. But have you ever wondered what actually happens inside? Does Endee walk the HNSW graph with the filter applied? Does it scan the matched documents directly? The answer is: it depends — and you control it.
Two parameters govern this decision: prefilter_cardinality_threshold and filter_boost_percentage. They are not quality knobs or cosmetic options. They change which search algorithm runs.
The Core Problem With Filtered Vector Search
HNSW is a graph. To find the nearest neighbors of a query, it starts at an entry point and greedily follows edges toward closer nodes, layer by layer, until it converges on the best candidates.
This traversal assumes every node the graph leads you to is a usable candidate. Filtering breaks that assumption.
Imagine your index has 100,000 documents and your filter matches only 200 of them. HNSW starts traversing the graph, finds a promising node — and it fails the filter. It follows another edge, finds another candidate — fails the filter again. The graph was built without any knowledge of your filter. The paths it follows lead to semantically close nodes, but most of those nodes are not in your filtered subset. HNSW keeps traversing, spending its exploration budget on nodes it has to discard, and may run out of budget before finding enough valid candidates.
What this looks like concretely: You ask for top_k=10. The HNSW budget allows visiting 100 nodes before it exits. Your filter matches 0.2% of the index, so on average 99 out of every 100 nodes visited are discarded. HNSW burns its entire budget on rejects and returns 1 result instead of 10.
The opposite problem exists too. If your filter matches 90,000 out of 100,000 documents, nearly every candidate HNSW encounters is valid. Running a brute-force scan over 90,000 vectors would be wasteful.
What that looks like concretely: You ask for top_k=10. HNSW visits 12 nodes, finds 10 valid matches immediately, and exits. Brute-force over 90,000 vectors would score every single one — 7,500× more work for the same answer.
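The arithmetic behind both scenarios is just budget × selectivity. A quick back-of-the-envelope check in plain Python, using the numbers from the two examples above (the helper name is illustrative):

```python
def expected_valid(budget_nodes: int, match_fraction: float) -> float:
    """Expected number of filter-passing candidates HNSW encounters
    if it visits `budget_nodes` nodes and `match_fraction` of the
    index passes the filter."""
    return budget_nodes * match_fraction

# Narrow filter: 100-node budget, 0.2% of the index matches.
print(expected_valid(100, 0.002))   # ≈ 0.2 — far short of top_k=10

# Broad filter: 90% matches; ~12 visits already yield ~10 valid hits.
print(expected_valid(12, 0.9))      # ≈ 10.8
```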
The right strategy depends on how many documents match your filter. Endee makes this decision automatically — and lets you tune the threshold.
Strategy A: Brute-Force on the Filtered Subset
When the number of matching documents is small, Endee skips the HNSW graph entirely.
It extracts all matching IDs from the filter bitmap, fetches their vectors directly from storage, and runs an exhaustive scan — computing the distance from the query to every matching vector and returning the top results.
Filtered bitmap → [id_4, id_17, id_92, ... 200 IDs total]
↓
Fetch all 200 vectors
↓
Exhaustive distance computation
↓
Return top-k results

With a small matching set, this is both faster and exact. There is no graph overhead, no traversal, no edges to follow. Every matching vector is scored. The results are guaranteed to be the true nearest neighbors within the filtered subset.
Example — rare product category search:
A user searches for “titanium wedding band” on a jewelry store. The index has 500,000 products. The filter material = "titanium" matches 180 products. Endee extracts those 180 IDs, fetches their vectors, computes distances to the query vector, and returns the 10 closest. Every titanium product is compared. Nothing is missed.
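Conceptually, Strategy A is nothing more than an exhaustive scan over the matched IDs. A minimal pure-Python sketch — illustrative only, not Endee's actual internals:

```python
import heapq
import math

def brute_force_topk(query, vectors, matched_ids, k):
    """Score every vector whose ID passed the filter; return the k
    closest by Euclidean distance. Exact within the filtered set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scored = [(dist(query, vectors[i]), i) for i in matched_ids]
    return heapq.nsmallest(k, scored)

# Toy index: 6 vectors, filter bitmap matched IDs {0, 3, 5}.
vectors = {
    0: [0.0, 0.0],
    1: [9.0, 9.0],   # not in filter — never fetched, never scored
    2: [0.1, 0.1],   # semantically closest, but not in filter
    3: [1.0, 1.0],
    4: [5.0, 5.0],   # not in filter
    5: [2.0, 2.0],
}
top2 = brute_force_topk([0.0, 0.0], vectors, [0, 3, 5], k=2)
print([i for _, i in top2])  # [0, 3]
```

Note that id 2 is the closest vector overall but is never scored — the scan only ever touches vectors in the filtered subset.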
Strategy B: Filtered HNSW Graph Search
When the matching set is large, HNSW’s logarithmic traversal advantage kicks back in.
Endee wraps the filter bitmap in a functor and passes it to searchKnn. During traversal, every candidate node is checked against the bitmap before being added to the result set. Non-matching nodes are skipped. The graph still guides the search toward semantically relevant regions — it just ignores nodes outside the filtered subset as it goes.
Query vector → Enter HNSW graph at entry point
↓
Follow edges toward closest candidates
(skip nodes not in the filter bitmap)
↓
Collect top-k valid candidates

Example — broad language filter:
A search engine indexes 10M articles in 20 languages. A user filters to English only: lang = "en". That matches 6M articles (60% of the index). HNSW traverses the graph — nearly every node it visits passes the filter. It finds 10 valid candidates after visiting roughly 15 nodes and exits. Running brute-force over 6M vectors would be hundreds of thousands of times slower for the same result.
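The traversal can be sketched as a greedy best-first search that consults the bitmap before accepting a node. This toy version uses a plain adjacency dict and a Python set standing in for the filter bitmap — real HNSW is layered and far more involved, but the skip-but-still-navigate behaviour is the same:

```python
import heapq
import math

def filtered_greedy_search(graph, vectors, query, entry, allowed, k, budget):
    """Best-first walk over `graph`. Only nodes in `allowed` may enter
    the result set, but every node still guides navigation. Stops once
    `budget` nodes have been touched."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    visited = {entry}
    frontier = [(dist(query, vectors[entry]), entry)]
    results = []
    while frontier and len(visited) <= budget:
        d, node = heapq.heappop(frontier)
        if node in allowed:              # the bitmap check
            results.append((d, node))
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                heapq.heappush(frontier, (dist(query, vectors[nbr]), nbr))
    return [n for _, n in sorted(results)[:k]]

vectors = {i: [float(i)] for i in range(6)}   # points on a line
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
# Filter allows only even IDs; the query sits near node 0.
print(filtered_greedy_search(graph, vectors, [0.2], entry=5,
                             allowed={0, 2, 4}, k=2, budget=10))  # [0, 2]
```

Nodes 5, 3, and 1 are visited and discarded, but they still carry the search toward the query region — exactly the trade the filtered-HNSW path makes.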
prefilter_cardinality_threshold — The Switch
After computing the filter bitmap, Endee checks its size and picks a strategy:
cardinality < threshold → Strategy A (brute-force)
cardinality ≥ threshold → Strategy B (filtered HNSW)

The default threshold is 10,000.
results = index.query(
vector=model.encode("AI in healthcare").tolist(),
top_k=10,
filter=[{"category": {"$eq": "rare_topic"}}],
prefilter_cardinality_threshold=10_000, # default
)

If "rare_topic" matches 800 documents, Endee runs brute-force on those 800. If it matches 50,000 documents, Endee runs filtered HNSW across the graph.
Filter matches 800 → 800 < 10,000 → brute-force → all 800 scored, exact results
Filter matches 9,999 → 9,999 < 10,000 → brute-force → all 9,999 scored, exact results
Filter matches 10,000 → 10,000 ≥ 10,000 → HNSW → graph traversal, approximate
Filter matches 50,000 → 50,000 ≥ 10,000 → HNSW → graph traversal, fast

The threshold is the exact point where Endee switches from “score everything” to “navigate the graph”.
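The switch itself is a one-line comparison. A sketch of the documented rule (the function name is illustrative — Endee makes this choice internally):

```python
def choose_strategy(filter_cardinality: int,
                    prefilter_cardinality_threshold: int = 10_000) -> str:
    """Strict less-than picks brute-force; the boundary goes to HNSW."""
    if filter_cardinality < prefilter_cardinality_threshold:
        return "brute_force"     # exact scan over the matched IDs
    return "filtered_hnsw"       # graph traversal with bitmap check

print(choose_strategy(800))      # brute_force
print(choose_strategy(9_999))    # brute_force
print(choose_strategy(10_000))   # filtered_hnsw — boundary is ≥ threshold
```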
When to Change It
Lower the threshold when your index is large and your filters are typically broad. Lowering it keeps HNSW in play more often — at scale, graph traversal is faster than scanning tens of thousands of vectors directly.
Raise the threshold when your filters are very selective and you need exact results. Raising it means brute-force kicks in for larger match sets, which guarantees every matching vector is scored.
# Very selective filter on a 10M vector index — keep HNSW unless match set is tiny
results = index.query(
vector=query_vec,
filter=[{"tenant_id": {"$eq": org_id}}],
prefilter_cardinality_threshold=1_000,
)
# Small index, highly selective filter — prefer exact brute-force up to 50k matches
results = index.query(
vector=query_vec,
filter=[{"category": {"$eq": "health"}}],
prefilter_cardinality_threshold=50_000,
)

Real-World Examples
E-commerce: broad brand filter
# Index: 2M products. Filter: brand = "Nike" → ~80,000 matches (4% of index).
# 80,000 ≥ 10,000 → HNSW runs. Correct — brute-force over 80k would be slow.
results = index.query(
vector=model.encode("running shoes lightweight").tolist(),
top_k=10,
filter=[{"brand": {"$eq": "Nike"}}],
# default threshold=10_000 is fine here
)

Multi-tenant SaaS: per-org filter
# Index: 500K documents across 1,000 orgs.
# Filter: org_id = "acme" → ~500 matches (0.1% of index).
# 500 < 10,000 → brute-force runs. Exact results on a tiny subset.
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"org_id": {"$eq": "acme"}}],
# default works perfectly here
)

Medical records: rare condition filter
# Index: 1M patient notes. Filter: diagnosis = "lupus" → ~2,000 matches.
# 2,000 < 10,000 → brute-force runs. Exact results guaranteed.
results = index.query(
vector=model.encode("joint inflammation fatigue").tolist(),
top_k=5,
filter=[{"diagnosis": {"$eq": "lupus"}}],
# default threshold=10_000 is correct — small set, brute-force is fast and exact
)

RAG pipeline: per-tenant document scoping
# Index: 2M chunks across 500 companies.
# Filter: company_id = "acme" AND doc_type = "contract" → ~1,200 matches.
# Brute-force on 1,200 chunks is the right call — miss nothing.
chunks = index.query(
vector=model.encode(user_question).tolist(),
top_k=5,
filter=[
{"company_id": {"$eq": "acme"}},
{"doc_type": {"$eq": "contract"}},
],
# 1,200 < 10,000 → brute-force runs automatically
)
# Safer RAG config: force brute-force for up to 50k chunks per tenant
chunks = index.query(
vector=model.encode(user_question).tolist(),
top_k=5,
filter=[
{"company_id": {"$eq": company_id}},
{"doc_type": {"$in": ["contract", "policy", "sow"]}},
],
prefilter_cardinality_threshold=50_000, # brute-force for up to 50k — exact recall
)

Valid range: 1,000 to 1,000,000.
filter_boost_percentage — Fixing the HNSW Attrition Problem
This parameter only matters for Strategy B. Here is the specific problem it solves.
Even when HNSW is the right strategy — because many documents match the filter — there is still an attrition problem. Say your filter matches 40% of your index. During graph traversal, HNSW checks each candidate against the bitmap and discards the 60% that fail. The internal exploration budget is consumed by both valid candidates and the rejected ones. If the budget runs out before enough valid candidates are found, the query returns fewer than top_k results.
filter_boost_percentage expands the exploration budget before the search begins:
new_budget = base_budget × (100 + filter_boost_percentage) / 100

Setting filter_boost_percentage=25 means the graph explores 25% more nodes — spending that extra budget finding valid candidates that would otherwise have been missed.
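In code, the boost is simple integer scaling. A sketch of the documented formula (the base budget of 200 is an invented example value — the real base budget is internal to Endee):

```python
def boosted_budget(base_budget: int, filter_boost_percentage: int) -> int:
    """new_budget = base_budget × (100 + boost) / 100"""
    return base_budget * (100 + filter_boost_percentage) // 100

print(boosted_budget(200, 0))    # 200 — default, no boost
print(boosted_budget(200, 25))   # 250 — 25% more nodes explored
print(boosted_budget(200, 100))  # 400 — doubled, the maximum
```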
How to tell if you need this parameter: check whether your filtered queries return fewer results than top_k.
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
)
# If this prints less than 10, the HNSW budget ran dry — you need filter_boost_percentage
print(len(results))  # → 6 ← problem: only 6 out of 10 requested

results = index.query(
vector=model.encode("machine learning trends").tolist(),
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
filter_boost_percentage=25, # explore 25% more to compensate for filtered-out nodes
)

| Value | What happens |
|---|---|
| 0 | Default — no boost, standard exploration budget |
| 20 | 20% more graph traversal before early exit |
| 50 | 50% more — useful for moderately selective filters |
| 100 | Double the exploration budget — maximum |
A useful starting point: estimate what percentage of index documents the filter discards, then set filter_boost_percentage to roughly half that number.
| Filter keeps | Filter discards | Suggested starting boost |
|---|---|---|
| 80% of index | 20% discarded | 10 |
| 50% of index | 50% discarded | 25 |
| 20% of index | 80% discarded | 40 |
| 10% of index | 90% discarded | 45 |
| 5% of index | 95% discarded | 50 (consider brute-force instead) |
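The “half the discard rate” rule of thumb from the table can be captured in a tiny helper. Illustrative only — the cap at 50 mirrors the advice that more selective filters belong on the brute-force path, not a documented limit:

```python
def suggested_boost(discard_pct: float) -> int:
    """Start at roughly half the discard percentage, capped at 50."""
    return min(50, round(discard_pct / 2))

for discard in (20, 50, 80, 90):
    print(discard, "->", suggested_boost(discard))  # 10, 25, 40, 45
```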
Real-World Examples
Without boost — recall degradation:
# Index: 500K documents. Filter matches 40% = 200K docs.
# top_k=10, but HNSW only returns 6. Budget exhausted before finding 10 valid candidates.
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
filter_boost_percentage=0, # returns only 6 results — recall degradation
)

With boost — fixed:
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
filter_boost_percentage=30, # explores 30% more nodes → finds all 10
)

Compound filter — more candidates discarded, needs higher boost:
results = index.query(
vector=model.encode("mental health tips").tolist(),
top_k=10,
filter=[
{"category": {"$eq": "health"}},
{"year": {"$eq": 2024}},
{"lang": {"$eq": "en"}},
],
filter_boost_percentage=60, # compound filter — need more exploration budget
)

Support ticket routing:
# Index: 300K support tickets. Filter: priority = "critical" → ~30K matches (10%).
# HNSW runs. But 90% of graph candidates get discarded → recall degrades without boost.
results = index.query(
vector=model.encode("payment gateway timeout checkout error").tolist(),
top_k=5,
filter=[{"priority": {"$eq": "critical"}}],
filter_boost_percentage=40, # 90% discard rate → aggressive boost needed
)

Dialing in the right boost value:
# Scenario: index has 200K documents. Filter matches 20% = 40K.
# 80% of graph candidates get discarded.
# Too low — still missing results:
results = index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=10)
# → returns 7 results (still degraded)
# About right:
results = index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=35)
# → returns 10 results (full top_k recovered)
# Overkill — wastes compute, no quality gain:
results = index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=100)
# → returns 10 results, but takes 2× longer than the 35 setting

Start at 20–30, measure whether you get top_k results back, and increase only if you still see shortfalls. Going above 50 is rarely necessary unless your filter is extremely selective.
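That dial-in process can be automated: run the query, check the shortfall, bump the boost, repeat. This sketch uses a stub standing in for index.query so the loop logic is visible — the stub’s recall curve is invented for the demo:

```python
def tune_boost(run_query, top_k, start=20, step=10, max_boost=100):
    """Increase filter_boost_percentage until the query returns a
    full top_k, or the 100 ceiling is reached."""
    boost = start
    while boost <= max_boost:
        results = run_query(boost)
        if len(results) >= top_k:
            return boost, results
        boost += step
    return max_boost, run_query(max_boost)

# Stub: pretend recall improves with boost (invented curve, not Endee).
def fake_query(boost):
    return list(range(min(10, 6 + boost // 10)))

boost, results = tune_boost(fake_query, top_k=10)
print(boost, len(results))  # 40 10
```

In production the `run_query` callable would wrap `index.query(..., filter_boost_percentage=boost)`; run the tuning offline on representative queries rather than per request.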
How They Work Together
The two parameters are independent:
prefilter_cardinality_threshold — decides which algorithm runs.
filter_boost_percentage — improves how the HNSW algorithm performs when it runs.
filter_boost_percentage has no effect when brute-force is active — brute-force already scans every matching vector, so there is nothing to boost.
results = index.query(
vector=model.encode("AI applications in healthcare").tolist(),
top_k=10,
filter=[
{"category": {"$eq": "health"}},
{"premium": {"$eq": True}},
],
prefilter_cardinality_threshold=5_000, # brute-force if fewer than 5k match
filter_boost_percentage=25, # expand graph budget for the HNSW path
)

In this example: if the combined filter matches fewer than 5,000 documents, brute-force runs and filter_boost_percentage is ignored. If it matches more, HNSW runs with a 25% larger exploration budget.
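Both decisions together amount to a single planning step. A sketch mirroring the documented behaviour (names and the base budget are illustrative, not Endee’s source):

```python
def plan_query(cardinality, threshold=10_000, boost=0, base_budget=200):
    """Pick the strategy and the effective HNSW budget.
    The boost is ignored on the brute-force path."""
    if cardinality < threshold:
        return ("brute_force", None)           # exact scan; no budget concept
    return ("filtered_hnsw", base_budget * (100 + boost) // 100)

print(plan_query(3_000, threshold=5_000, boost=25))   # ('brute_force', None)
print(plan_query(80_000, threshold=5_000, boost=25))  # ('filtered_hnsw', 250)
```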
Real Workload Walkthrough
A single product catalog index with 1M items. Different queries hit different paths:
index = client.get_index("product-catalog") # 1M vectors
# Query 1: Broad filter (500K match) — HNSW, no boost needed.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[{"in_stock": {"$eq": True}}],
)
# Query 2: Medium filter (25K match) — HNSW, moderate attrition.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[{"brand": {"$eq": "Sony"}}],
filter_boost_percentage=25,
)
# Query 3: Narrow compound filter (800 match) — brute-force, exact.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[
{"brand": {"$eq": "Sony"}},
{"category": {"$eq": "premium"}},
],
)
# Query 4: Very large index variant — push threshold down to keep HNSW active.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[{"brand": {"$eq": "Sony"}}],
prefilter_cardinality_threshold=1_000,
filter_boost_percentage=30,
)

Choosing Parameters by Workload
Internal knowledge base
# Filter: team = "engineering" → ~3,000 docs out of 200K total.
# Small match set → brute-force by default → exact results. No tuning needed.
results = index.query(
vector=model.encode("how to deploy to staging").tolist(),
top_k=5,
filter=[{"team": {"$eq": "engineering"}}],
# 3,000 < 10,000 → brute-force runs automatically
)

E-commerce search with facets
# Combined filter → ~8,000 matches.
results = index.query(
vector=model.encode("noise cancelling over ear headphones").tolist(),
top_k=20,
filter=[
{"category": {"$eq": "headphones"}},
{"price_usd": {"$lte": 300}},
{"rating": {"$gte": 4.0}},
],
prefilter_cardinality_threshold=10_000,
# 8,000 < 10,000 → brute-force runs → all 8,000 products scored exactly
)

Real-time personalization feed
# Index: 20M articles. Filter: lang="en" AND topic IN ["tech","science"] → ~4M matches.
# Latency matters more than exhaustive recall.
results = index.query(
vector=user_interest_vector,
top_k=10,
filter=[
{"lang": {"$eq": "en"}},
{"topic": {"$in": ["tech", "science"]}},
],
prefilter_cardinality_threshold=5_000, # keep HNSW active — 4M match set is huge
filter_boost_percentage=15, # light boost; feed tolerates minor recall loss
)

Fraud detection: high-stakes exact search
# Filter: merchant_category = "crypto_exchange" AND amount_usd > 10000 → ~500 matches.
# Recall must be perfect — a missed similar transaction is a missed fraud signal.
results = index.query(
vector=transaction_embedding,
top_k=10,
filter=[
{"merchant_category": {"$eq": "crypto_exchange"}},
{"amount_usd": {"$gt": 10000}},
],
prefilter_cardinality_threshold=100_000, # force brute-force — miss nothing
)

The Default Behaviour
If you pass no tuning parameters:
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"category": {"$eq": "health"}}],
)

Endee uses prefilter_cardinality_threshold=10_000 and filter_boost_percentage=0. Filters matching fewer than 10,000 documents get an exact brute-force scan. Larger match sets go through filtered HNSW with no exploration boost.
For most workloads, these defaults are the right starting point. You only need to tune when you observe recall degradation on highly selective filters, or when you need to push latency lower on large indexes where brute-force is kicking in too often.
Quick Reference
Decision guide — start here:
1. How many documents does my filter match?
└─ Don't know? Run index.query(...) and check len(results) first.
2. Is len(results) < top_k?
└─ Yes → HNSW ran out of budget → add filter_boost_percentage (start at 25–40)
└─ No → recall is fine, no boost needed
3. Is latency too high?
└─ Yes, and filter matches many docs → lower prefilter_cardinality_threshold
(keep HNSW active for more queries)
└─ Yes, and filter matches few docs → nothing to do; brute-force on small sets is fast
4. Am I missing results I know should appear?
└─ Yes → raise prefilter_cardinality_threshold to force brute-force on your match set

Code patterns:
# Default — works for most cases
index.query(vector=q, filter=[...])
# Large index, broad filters — keep HNSW in play more often
index.query(vector=q, filter=[...], prefilter_cardinality_threshold=1_000)
# Any index, selective filter — prefer exact brute-force
index.query(vector=q, filter=[...], prefilter_cardinality_threshold=100_000)
# HNSW path returning fewer than top_k — boost the exploration budget
index.query(vector=q, filter=[...], filter_boost_percentage=30)
# Compound filter — more discard → higher boost
index.query(vector=q, filter=[..., ..., ...], filter_boost_percentage=50)
# Combined — tune the switch point and the HNSW budget independently
index.query(
vector=q,
filter=[...],
prefilter_cardinality_threshold=5_000,
filter_boost_percentage=25,
)

At-a-glance: which parameter does what
| Symptom | Parameter to adjust | Direction |
|---|---|---|
| Queries returning fewer results than top_k | filter_boost_percentage | Increase |
| Brute-force too slow on large match sets | prefilter_cardinality_threshold | Decrease |
| Missing known relevant results | prefilter_cardinality_threshold | Increase |
| Compound filter degrading recall | filter_boost_percentage | Increase |
| Index is huge, filters are broad | prefilter_cardinality_threshold | Decrease |
prefilter_cardinality_threshold range: 1,000–1,000,000. filter_boost_percentage range: 0–100.