Filtered Vector Search: Two Simple Parameters
The Basic Idea
When you add a filter to a vector search, Endee picks a strategy:
- Scan small sets directly - If only 500 documents match, check all 500 one by one
- Navigate the graph - If 100,000 documents match, use the search graph to find results fast
Which strategy to use depends on how many documents your filter matches.
Strategy 1: Direct Scan (Exact, Small Sets)
When your filter matches a small number of documents, Endee scans them all.
```
Your filter → Find all matching documents (e.g., 500)
            → Score all 500 with your query
            → Return top 10
```
Pros:
- Guaranteed exact results
- Fast for small sets
Cons:
- Gets slower as the number of matches grows, because every match is scored
Example: A jewelry store filters for “titanium wedding bands” and finds 150 matches. Endee scores all 150.
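A direct scan is brute-force scoring restricted to the filter's matches. The following is a minimal local sketch of that idea in plain Python, not Endee's implementation: the `(doc_id, vector, metadata)` layout, the `direct_scan` and `cosine` helpers, and the choice of cosine similarity are all assumptions made for illustration.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def direct_scan(docs, query, predicate, top_k=10):
    """Exact filtered search: score every matching document, keep the best top_k.

    docs: iterable of (doc_id, vector, metadata) tuples (hypothetical layout).
    predicate: metadata -> bool, standing in for the filter.
    """
    scored = (
        (cosine(query, vec), doc_id)
        for doc_id, vec, meta in docs
        if predicate(meta)  # only filter matches are scored
    )
    # nlargest keeps top_k items in memory; total work grows with the match count
    return [(doc_id, score) for score, doc_id in heapq.nlargest(top_k, scored)]

# Usage: three rings, two of them titanium
docs = [
    ("ring-1", [1.0, 0.0], {"material": "titanium"}),
    ("ring-2", [0.6, 0.8], {"material": "titanium"}),
    ("ring-3", [1.0, 0.1], {"material": "gold"}),
]
hits = direct_scan(docs, [1.0, 0.0], lambda m: m["material"] == "titanium", top_k=2)
print([doc_id for doc_id, _ in hits])  # ['ring-1', 'ring-2']
```

Because every match is scored, the result is exact, and the cost is proportional to how many documents the filter lets through.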
Strategy 2: Graph Navigation (Fast, Large Sets)
When your filter matches many documents, Endee navigates the search graph.
```
Your filter → Use the search graph to navigate
            → Skip non-matching documents
            → Collect the top results
```
Pros:
- Very fast, even with 100,000+ matches
Cons:
- Results may be approximate (might miss some good matches)
Example: Filter for “English language articles” and find 6 million matches. Endee uses the graph to find relevant results quickly.
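To make "skip non-matching documents" and "might miss some good matches" concrete, here is a toy filtered best-first search with a fixed expansion budget. It is a deliberate simplification, not Endee's graph algorithm; `graph_search`, the adjacency-dict graph, and the `budget` parameter are hypothetical.

```python
import heapq

def graph_search(graph, vectors, query, entry, predicate, top_k=10, budget=50):
    """Toy filtered best-first search over a proximity graph (not Endee's algorithm).

    graph: dict node -> list of neighbouring nodes (hypothetical structure).
    vectors: dict node -> tuple of floats.
    budget: maximum number of nodes expanded, a stand-in for a search-effort limit.
    Non-matching nodes are still expanded, because they connect parts of the
    graph, but only matching nodes are kept as results.
    """
    def dist(node):
        # Squared Euclidean distance: smaller means closer to the query
        return sum((a - b) ** 2 for a, b in zip(vectors[node], query))

    visited = {entry}
    frontier = [(dist(entry), entry)]  # min-heap: closest unexpanded node first
    matches = []
    expanded = 0
    while frontier and expanded < budget:
        d, node = heapq.heappop(frontier)
        expanded += 1
        if predicate(node):
            matches.append((d, node))
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    matches.sort()
    return [node for _, node in matches[:top_k]]

# Usage: ten nodes on a line, each linked to its neighbours; the filter keeps even nodes
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
vectors = {i: (float(i),) for i in range(10)}

def is_even(node):
    return node % 2 == 0

print(graph_search(graph, vectors, (9.0,), 0, is_even, top_k=3, budget=4))   # [2, 0]
print(graph_search(graph, vectors, (9.0,), 0, is_even, top_k=3, budget=10))  # [8, 6, 4]
```

With a budget of 4 the search returns only two matches and misses node 8, the true nearest match; with a budget of 10 it has expanded every node and the answer is exact. This effort-versus-recall trade-off is what the tuning parameters below are about.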
The Two Parameters
1. prefilter_cardinality_threshold
This is the switch point between direct scan and graph navigation.
Default: 10,000
```python
results = index.query(
    vector=query_vector,
    filter=[{"category": {"$eq": "health"}}],
    prefilter_cardinality_threshold=10_000,  # the default
)
```
How it works:
- Matches < 10,000 → Use direct scan (exact results)
- Matches ≥ 10,000 → Use graph navigation (fast)
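The switch can be modeled as a single comparison. This sketch only mirrors the rule stated above with the default of 10,000; `choose_strategy` is a hypothetical helper, and how Endee counts matches internally is not covered in this guide.

```python
DEFAULT_THRESHOLD = 10_000  # default stated in this guide

def choose_strategy(match_count, threshold=DEFAULT_THRESHOLD):
    """Model of the documented switch: scan below the threshold, graph at or above it."""
    if match_count < 0:
        raise ValueError("match_count must be non-negative")
    return "direct_scan" if match_count < threshold else "graph"

# Match counts taken from the examples in this guide
for label, count in [("acme tenant", 500), ("headphones", 8_000),
                     ("at the threshold", 10_000), ("news since 2024", 3_000_000)]:
    print(f"{label}: {choose_strategy(count)}")

# Raising the threshold only moves match counts below the new value to direct scan
print(choose_strategy(20_000, threshold=50_000))     # direct_scan
print(choose_strategy(3_000_000, threshold=50_000))  # graph
```

Note the boundary: a filter matching exactly 10,000 documents already goes to graph navigation.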
When to change it:
- Increase it (e.g., to 50,000) if you want exact results for larger match sets, at the cost of scanning more documents
- Decrease it (e.g., to 5,000) if you want the faster graph navigation to take over sooner
```python
# Want exact results? Keep direct scan for more matches
index.query(
    vector=query_vector,
    filter=[{"category": {"$eq": "health"}}],
    prefilter_cardinality_threshold=50_000,  # direct scan up to 50k
)

# Need speed? Switch to graph sooner
index.query(
    vector=query_vector,
    filter=[{"category": {"$eq": "health"}}],
    prefilter_cardinality_threshold=5_000,  # graph navigation kicks in sooner
)
```
2. filter_boost_percentage
This controls how hard Endee searches when using the graph.
Default: 0 (no boost)
```python
results = index.query(
    vector=query_vector,
    filter=[{"category": {"$eq": "health"}}],
    filter_boost_percentage=0,  # the default
)
```
The problem it solves:
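Sometimes graph navigation doesn’t find enough results. Example: you ask for 10 results but only get 3. This happens when your filter is very selective: most of the documents the graph visits are skipped by the filter, so the search stops exploring before it has collected enough matches.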
The solution:
Increase filter_boost_percentage to make Endee explore more.
```python
# Getting fewer results than expected?
index.query(
    vector=query_vector,
    filter=[{"category": {"$eq": "rare_disease"}}],
    filter_boost_percentage=30,  # boost graph exploration by 30%
)
```
Range: 0 to 100. Higher = more exploration: slower queries, but better recall.
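Rough arithmetic shows why a selective filter can come back short. Assume, purely for illustration, that a graph search examines a fixed number of candidates and that matching documents are spread evenly, so the expected number of matches is candidates × selectivity. The helper names and numbers are hypothetical, and this guide does not specify how `filter_boost_percentage` translates into extra candidates.

```python
import math

def expected_hits(candidates_examined, selectivity):
    """Rough number of filter matches among the candidates a graph search examines,
    assuming matching documents are spread evenly through the graph."""
    return candidates_examined * selectivity

def candidates_needed(top_k, selectivity):
    """Candidates a search would have to examine, on average, to see top_k matches."""
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    return math.ceil(top_k / selectivity)

# Hypothetical numbers: the search examines 100 candidates, the filter matches 3% of documents
print(expected_hits(100, 0.03))      # about 3 matches, even though 10 were requested
print(candidates_needed(10, 0.03))   # 334
print(candidates_needed(10, 0.50))   # 20: a broad filter needs far less exploration
```

The more selective the filter, the more exploration it takes to fill `top_k`, which is why the boost matters most for narrow filters that still land above the scan threshold.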
Simple Decision Guide
Start here:
- Run your query without tuning parameters
- Count the results you get
- Use this guide:
| Symptom | What to do |
|---|---|
| Getting fewer results than you asked for | Increase filter_boost_percentage (try 25–40) |
| Results are too slow | Decrease prefilter_cardinality_threshold (try 5,000); this only affects filters matching 5,000 to 9,999 documents |
| Missing results you know exist | Increase prefilter_cardinality_threshold (try 50,000) if your filter matches fewer documents than that; otherwise increase filter_boost_percentage |
| Everything looks good | Leave defaults alone |
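The first row of the table can be automated on the client side by retrying with a larger boost only when a query comes back short. This is a sketch under assumptions: `query_until_filled` is a hypothetical helper, and it assumes `index.query` returns a list-like result whose length is the number of hits, which this guide does not state.

```python
from typing import Callable, List, Sequence

def query_until_filled(run_query: Callable[[int], List], top_k: int,
                       boosts: Sequence[int] = (0, 25, 40)) -> List:
    """Retry a filtered query with increasing filter_boost_percentage values.

    run_query is a wrapper you supply (hypothetical): it receives a boost value and
    returns the result list, for example
        lambda b: index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=b)
    Returns the first result list with at least top_k items, or the last one tried.
    A short final list can simply mean the filter matches fewer than top_k documents.
    """
    results: List = []
    for boost in boosts:
        results = run_query(boost)
        if len(results) >= top_k:
            break
    return results

# Usage with a stand-in for index.query that fills up once the boost reaches 25
calls = []

def fake_query(boost):
    calls.append(boost)
    return list(range(3 if boost < 25 else 10))

out = query_until_filled(fake_query, top_k=10)
print(len(out), calls)  # 10 [0, 25]
```

Each retry costs a full query, so the escalation stops as soon as a result list is full.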
Real-World Examples
Example 1: Multi-Tenant App
```python
# 500,000 documents across 1,000 companies
# Filter: org_id = "acme" → 500 matches
results = index.query(
    vector=query_vector,
    filter=[{"org_id": {"$eq": "acme"}}],
    # 500 < 10,000 → uses direct scan automatically
    # Result: exact, fast
)
```
No tuning needed. Direct scan is perfect for 500 documents.
Example 2: E-Commerce with Many Products
```python
# 3 million products
# Filter: category = "headphones" AND price < $300 AND rating >= 4.0
# → 8,000 matches
results = index.query(
    vector=model.encode("noise cancelling over ear").tolist(),
    top_k=20,
    filter=[
        {"category": {"$eq": "headphones"}},
        {"price_usd": {"$lt": 300}},
        {"rating": {"$gte": 4.0}},
    ],
    # 8,000 < 10,000 → uses direct scan
    # Result: all 8,000 matches scored, exact top 20 returned
)
```
Direct scan is the right choice here too.
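The comment in this example reads the filter list as AND. Endee evaluates filters server-side; the sketch below is only a local model of those semantics, covering just the four operators used in this guide (`$eq`, `$lt`, `$gt`, `$gte`). The `matches` helper and the treatment of missing fields are assumptions. Counting how many documents pass a filter this way is also a quick offline estimate of which side of `prefilter_cardinality_threshold` a query will land on.

```python
import operator

# The comparison operators used in this guide's examples
OPS = {
    "$eq": operator.eq,
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$gte": operator.ge,
}

def matches(metadata, filters):
    """True if metadata satisfies every condition in the filter list (AND semantics).

    A document missing a filtered field is treated as not matching (an assumption).
    """
    for condition in filters:
        for field, spec in condition.items():
            if field not in metadata:
                return False
            for op, value in spec.items():
                if not OPS[op](metadata[field], value):
                    return False
    return True

product_filter = [
    {"category": {"$eq": "headphones"}},
    {"price_usd": {"$lt": 300}},
    {"rating": {"$gte": 4.0}},
]
products = [
    {"category": "headphones", "price_usd": 249, "rating": 4.5},  # matches
    {"category": "headphones", "price_usd": 349, "rating": 4.8},  # too expensive
    {"category": "speakers", "price_usd": 99, "rating": 4.2},     # wrong category
    {"category": "headphones", "price_usd": 129, "rating": 3.9},  # rating too low
]
print(sum(matches(p, product_filter) for p in products))  # 1
```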
Example 3: Large News Site with Broad Filter
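```python
# 10 million articles
# Filter: published_date > "2024-01-01" → 3 million matches
# Want the best recall we can get on recall-critical content
results = index.query(
    vector=model.encode("climate policy").tolist(),
    top_k=10,
    filter=[{"published_date": {"$gt": "2024-01-01"}}],
    filter_boost_percentage=25,
    # 3 million ≥ 10,000 → graph navigation is used; raising
    # prefilter_cardinality_threshold (even to 50,000) would not change that
    # The boost makes the graph search explore more, trading speed for recall
)
```
Tuning makes sense here because of the size, but the lever is filter_boost_percentage: with 3 million matches, no practical threshold would switch this query to direct scan.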
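To check whether graph navigation is actually dropping relevant articles, one option (not part of this guide's API) is to compute exact results offline for a sample of queries by scoring every matching document, then measure recall@k of the graph results against them. If recall is too low, raise `filter_boost_percentage` and measure again. `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 1.0  # nothing to find, so nothing was missed
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)

# Hypothetical IDs: graph navigation returned "x" instead of the true third-best "c"
print(recall_at_k(["a", "b", "x"], ["a", "b", "c"], k=3))  # 0.6666666666666666
```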
Example 4: RAG Pipeline (LLM + Search)
```python
# Document store: 2 million chunks
# Filter: company_id = "acme" AND doc_type = "contract" → 1,200 matches
# Missing a relevant contract = LLM gives incomplete answer
chunks = index.query(
    vector=model.encode(user_question).tolist(),
    top_k=5,
    filter=[
        {"company_id": {"$eq": "acme"}},
        {"doc_type": {"$eq": "contract"}},
    ],
    # 1,200 < 10,000 → direct scan runs
    # Result: all relevant chunks found, LLM gets complete context
)
```
The default threshold works perfectly. Direct scan on 1,200 documents is both fast and exact.
Code Patterns
```python
# Pattern 1: Default (works for most cases)
index.query(vector=q, filter=[...])

# Pattern 2: Need exact results? Prefer direct scan
index.query(
    vector=q,
    filter=[...],
    prefilter_cardinality_threshold=50_000,
)

# Pattern 3: Not getting enough results?
index.query(
    vector=q,
    filter=[...],
    filter_boost_percentage=30,
)

# Pattern 4: Both adjustments
index.query(
    vector=q,
    filter=[...],
    prefilter_cardinality_threshold=20_000,
    filter_boost_percentage=25,
)
```
Key Takeaways
- Default threshold is 10,000 - switches between direct scan and graph navigation
- Direct scan = exact - all matching documents scored, guaranteed best results
- Graph navigation = fast - works well when many documents match
- Start with defaults - only tune if you see problems
- prefilter_cardinality_threshold - control the switch point
- filter_boost_percentage - explore harder when getting too few results
- For small match sets - nothing to tune, direct scan is already fast and exact
- For large match sets - experiment with parameters if speed or recall matters