Filtered Vector Search: Two Simple Parameters

Time: 15 min | Level: Intermediate

The Basic Idea

When you add a filter to a vector search, Endee picks a strategy:

  1. Scan small sets directly - If only 500 documents match, check all 500 one by one
  2. Navigate the graph - If 100,000 documents match, use the search graph to find results fast

Which strategy to use depends on how many documents your filter matches.


Strategy 1: Direct Scan (Exact, Small Sets)

When your filter matches a small number of documents, Endee scans them all.

Your filter → Find all matching documents (e.g., 500) → Score all 500 with your query → Return top 10
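
As a rough illustration of the flow above, here is a minimal direct-scan sketch in plain Python. It is not Endee's implementation: the `direct_scan` function, the dictionary layout, and the use of Euclidean distance as the score are all assumptions made for the example.

```python
import math

def direct_scan(vectors, matching_ids, query, top_k):
    """Score every document that passed the filter and return the closest top_k ids."""
    # Assumption: smaller Euclidean distance means a better match.
    scored = [(math.dist(vectors[doc_id], query), doc_id) for doc_id in matching_ids]
    scored.sort()  # exact ranking, because every match was scored
    return [doc_id for _, doc_id in scored[:top_k]]

# Toy data: document 0 is closest to the query but does not pass the filter.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0), 3: (3.0, 3.0)}
matching_ids = [1, 2, 3]
top = direct_scan(vectors, matching_ids, query=(0.0, 0.0), top_k=2)
print(top)
```

The work is proportional to the number of matching documents, which is why this approach suits small match sets.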

Pros:

  • Guaranteed exact results
  • Fast for small sets

Cons:

  • Cost grows with the number of matches, so it becomes slow for very large sets

Example: A jewelry store filters for “titanium wedding bands” and finds 150 matches. Endee scores all 150.


Strategy 2: Graph Navigation (Fast, Large Sets)

When your filter matches many documents, Endee navigates the search graph.

Your filter → Use search graph to navigate → Skip non-matching documents → Collect top results
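
To make the idea concrete, here is a toy best-first walk over a neighbor graph that only collects documents passing the filter. This is a sketch under stated assumptions, not Endee's algorithm: the function name, the `budget` parameter (a cap on how many nodes are visited), and the Euclidean distance are all hypothetical.

```python
import heapq
import math

def filtered_graph_search(vectors, neighbors, matches, query, entry, top_k, budget):
    """Visit nodes closest-first, collect only those in `matches`, stop after `budget` visits."""
    visited = {entry}
    frontier = [(math.dist(vectors[entry], query), entry)]  # min-heap keyed by distance
    results = []
    steps = 0
    while frontier and steps < budget:
        d, node = heapq.heappop(frontier)
        steps += 1
        if node in matches:  # non-matching nodes are walked through but not returned
            results.append((d, node))
        for nb in neighbors[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (math.dist(vectors[nb], query), nb))
    results.sort()
    return [node for _, node in results[:top_k]]

# Toy graph: ten points on a line, each linked to its immediate neighbors.
vectors = {i: (float(i),) for i in range(10)}
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
matches = {0, 2, 4, 6, 8}  # documents that pass the filter

wide = filtered_graph_search(vectors, neighbors, matches, (7.2,), entry=0, top_k=2, budget=10)
narrow = filtered_graph_search(vectors, neighbors, matches, (7.2,), entry=0, top_k=2, budget=3)
print(wide, narrow)
```

With the larger budget the walk reaches the true nearest matches; with the smaller one it stops early and returns whatever matches it saw on the way. That is the sense in which graph results can be approximate.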

Pros:

  • Very fast, even with 100,000+ matches

Cons:

  • Results may be approximate (might miss some good matches)

Example: Filter for “English language articles” and find 6 million matches. Endee uses the graph to find relevant results quickly.


The Two Parameters

1. prefilter_cardinality_threshold

This is the switch point between direct scan and graph navigation.

Default: 10,000

    results = index.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        prefilter_cardinality_threshold=10_000,  # the default
    )

How it works:

  • Matches < 10,000 → Use direct scan (exact results)
  • Matches ≥ 10,000 → Use graph navigation (fast)
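
The switch rule above can be written as a small function. This is only a model of the rule as described here; `choose_strategy` and its return strings are hypothetical names, not part of the Endee API.

```python
def choose_strategy(match_count, prefilter_cardinality_threshold=10_000):
    """Model of the documented rule: below the threshold scan directly, otherwise use the graph."""
    if match_count < prefilter_cardinality_threshold:
        return "direct_scan"
    return "graph_navigation"

print(choose_strategy(500))               # small match set, default threshold
print(choose_strategy(10_000))            # exactly at the threshold
print(choose_strategy(20_000, 50_000))    # raised threshold
```

Note that a match count exactly equal to the threshold falls on the graph side, following the "≥" in the rule.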

When to change it:

  • Increase it (e.g., to 50,000) if you want more exact results even with more matches
  • Decrease it (e.g., to 5,000) if you want to use the fast graph more often

    # Want exact results? Keep direct scan for more matches
    index.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        prefilter_cardinality_threshold=50_000,  # direct scan up to 50k
    )

    # Need speed? Switch to graph sooner
    index.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        prefilter_cardinality_threshold=5_000,  # graph navigation kicks in sooner
    )

2. filter_boost_percentage

This controls how hard Endee searches when using the graph.

Default: 0 (no boost)

    results = index.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        filter_boost_percentage=0,  # the default
    )

The problem it solves:

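Sometimes graph navigation returns fewer results than you asked for. Example: you ask for 10 results but only get 3. This can happen when the filter matches enough documents to trigger graph navigation but still only a small fraction of the index, so most of the documents the graph visits are skipped.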

The solution:

Increase filter_boost_percentage to make Endee explore more.

    # Getting fewer results than expected?
    index.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "rare_disease"}}],
        filter_boost_percentage=30,  # search 30% harder
    )

Range: 0 to 100. Higher = more exploration, slower but better recall.
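
One way to apply this in practice is to raise the boost only when a query comes back short. The helper below is a sketch, not an Endee feature: `query_with_boost_retry`, the `run_query` callable, and the escalation steps are assumptions, and `fake_query` is a stand-in that merely imitates "more boost, more results" so the example runs without a live index.

```python
def query_with_boost_retry(run_query, top_k, boosts=(0, 25, 50, 100)):
    """Call run_query(boost) with increasing boost until at least top_k results come back."""
    results, used = [], boosts[0]
    for boost in boosts:  # every value stays inside the documented 0-100 range
        results, used = run_query(boost), boost
        if len(results) >= top_k:
            break
    return results, used

# Against a real index, run_query might wrap the call shown above, for example:
# run_query = lambda boost: index.query(vector=query_vector, filter=[...],
#                                       top_k=10, filter_boost_percentage=boost)

def fake_query(boost):
    # Stand-in only: pretends that a higher boost finds more matches, capped at 10.
    return list(range(min(10, 3 + boost // 5)))

results, used_boost = query_with_boost_retry(fake_query, top_k=10)
print(len(results), used_boost)
```

Each retry costs an extra query, so once a workload's typical boost is known it is usually simpler to set that value directly.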


Simple Decision Guide

Start here:

  1. Run your query without tuning parameters
  2. Count the results you get
  3. Use this guide:

Symptom                                  | What to do
---------------------------------------- | -----------------------------------------------------
Getting fewer results than you asked for | Increase filter_boost_percentage (try 25–40)
Results are too slow                     | Decrease prefilter_cardinality_threshold (try 5,000)
Missing results you know exist           | Increase prefilter_cardinality_threshold (try 50,000)
Everything looks good                    | Leave defaults alone
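
The guide can also be encoded as a small helper that turns observations about a query into keyword overrides. This is a hypothetical convenience function; its name, arguments, and the specific starting values (30, 5,000, 50,000, taken from the suggestions in the guide) are assumptions, not library behavior.

```python
def suggest_overrides(requested, returned, latency_ms, latency_budget_ms, missing_known=False):
    """Map the symptoms in the decision guide to suggested query parameters."""
    overrides = {}
    if returned < requested:
        # Fewer results than asked for: explore the graph harder.
        overrides["filter_boost_percentage"] = 30
    if missing_known:
        # Known-good results are missing: keep more filters on the exact path.
        overrides["prefilter_cardinality_threshold"] = 50_000
    elif latency_ms > latency_budget_ms:
        # Too slow: hand over to graph navigation sooner.
        overrides["prefilter_cardinality_threshold"] = 5_000
    return overrides

print(suggest_overrides(requested=10, returned=3, latency_ms=20, latency_budget_ms=50))
print(suggest_overrides(requested=10, returned=10, latency_ms=20, latency_budget_ms=50))
```

An empty dictionary means "leave defaults alone". The two threshold suggestions pull in opposite directions, so the sketch lets the recall symptom win when both apply; that priority is a design choice, not something the guide prescribes.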

Real-World Examples

Example 1: Multi-Tenant App

    # 500,000 documents across 1,000 companies
    # Filter: org_id = "acme" → 500 matches
    results = index.query(
        vector=query_vector,
        filter=[{"org_id": {"$eq": "acme"}}],
        # 500 < 10,000 → uses direct scan automatically
        # Result: exact, fast
    )

No tuning needed. Direct scan is perfect for 500 documents.

Example 2: E-Commerce with Many Products

    # 3 million products
    # Filter: category = "headphones" AND price < $300 AND rating >= 4.0
    # → 8,000 matches
    results = index.query(
        vector=model.encode("noise cancelling over ear").tolist(),
        top_k=20,
        filter=[
            {"category": {"$eq": "headphones"}},
            {"price_usd": {"$lt": 300}},
            {"rating": {"$gte": 4.0}},
        ],
        # 8,000 < 10,000 → uses direct scan
        # Result: all 8,000 matches scored, so the top 20 are exact
    )

Direct scan is the right choice here too.

Example 3: Large News Site with Broad Filter

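    # 10 million articles
    # Filter: published_date > "2024-01-01" → 3 million matches
    # Want exact results on narrower, recall-critical filters
    results = index.query(
        vector=model.encode("climate policy").tolist(),
        top_k=10,
        filter=[{"published_date": {"$gt": "2024-01-01"}}],
        prefilter_cardinality_threshold=50_000,  # direct scan for filters matching fewer than 50k
        # 3 million ≥ 50k, so this query still uses graph navigation
    )

Tuning makes sense at this scale, but note what the threshold does: this broad filter is far above 50,000 matches, so it still runs on the graph. The higher threshold only keeps narrower filters on the same index (those matching fewer than 50,000 documents) on the exact direct-scan path.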

Example 4: Document Retrieval for an LLM

    # Document store: 2 million chunks
    # Filter: company_id = "acme" AND doc_type = "contract" → 1,200 matches
    # Missing a relevant contract = LLM gives incomplete answer
    chunks = index.query(
        vector=model.encode(user_question).tolist(),
        top_k=5,
        filter=[
            {"company_id": {"$eq": "acme"}},
            {"doc_type": {"$eq": "contract"}},
        ],
        # 1,200 < 10,000 → direct scan runs
        # Result: all 1,200 matching chunks scored, so the LLM gets the exact top 5
    )

The default threshold works perfectly. Direct scan on 1,200 documents is both fast and exact.


Code Patterns

    # Pattern 1: Default (works for most cases)
    index.query(vector=q, filter=[...])

    # Pattern 2: Need exact results? Prefer direct scan
    index.query(
        vector=q,
        filter=[...],
        prefilter_cardinality_threshold=50_000,
    )

    # Pattern 3: Not getting enough results?
    index.query(
        vector=q,
        filter=[...],
        filter_boost_percentage=30,
    )

    # Pattern 4: Both adjustments
    index.query(
        vector=q,
        filter=[...],
        prefilter_cardinality_threshold=20_000,
        filter_boost_percentage=25,
    )

Key Takeaways

  • Default threshold is 10,000 - switches between direct scan and graph navigation
  • Direct scan = exact - all matching documents scored, guaranteed best results
  • Graph navigation = fast - works well when many documents match
  • Start with defaults - only tune if you see problems
  • prefilter_cardinality_threshold - control the switch point
  • filter_boost_percentage - explore harder when getting too few results
  • For small match sets - nothing to tune, direct scan is already fast and exact
  • For large match sets - experiment with parameters if speed or recall matters