How Endee Decides Which Search Strategy to Use
You added a filter to your vector search. The query runs. But have you ever wondered what actually happens inside? Does Endee walk the HNSW graph with the filter applied? Does it scan the matched documents directly? The answer is: it depends — and you control it.
Two parameters govern this decision: prefilter_cardinality_threshold and filter_boost_percentage. They are not quality knobs or cosmetic options. They change which search algorithm runs.
The Core Problem With Filtered Vector Search
HNSW is a graph. To find the nearest neighbors of a query, it starts at an entry point and greedily follows edges toward closer nodes, layer by layer, until it converges on the best candidates.
This traversal assumes every node the graph leads you to is a usable candidate. Filtering breaks that assumption.
Imagine your index has 100,000 documents and your filter matches only 200 of them. HNSW starts traversing the graph, finds a promising node — and it fails the filter. It follows another edge, finds another candidate — fails the filter again. The graph was built without any knowledge of your filter. The paths it follows lead to semantically close nodes, but most of those nodes are not in your filtered subset. HNSW keeps traversing, spending its exploration budget on nodes it has to discard, and may run out of budget before finding enough valid candidates.
What this looks like concretely: You ask for top_k=10. The HNSW budget allows visiting 100 nodes before it exits. Your filter matches 0.2% of the index, so on average 99 out of every 100 nodes visited are discarded. HNSW burns its entire budget on rejects and returns 1 result instead of 10.
The opposite problem exists too. If your filter matches 90,000 out of 100,000 documents, nearly every candidate HNSW encounters is valid. Running a brute-force scan over 90,000 vectors would be wasteful.
What that looks like concretely: You ask for top_k=10. HNSW visits 12 nodes, finds 10 valid matches immediately, and exits. Brute-force over 90,000 vectors would score every single one — 7,500× more work for the same answer.
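The arithmetic behind both scenarios is just budget × selectivity. A quick back-of-the-envelope check in plain Python, using the numbers from the two examples above (the helper name is illustrative):

```python
def expected_valid(budget_nodes: int, match_fraction: float) -> float:
    """Expected number of filter-passing candidates HNSW encounters
    if it visits `budget_nodes` nodes and `match_fraction` of the
    index passes the filter."""
    return budget_nodes * match_fraction

# Narrow filter: 100-node budget, 0.2% of the index matches.
print(expected_valid(100, 0.002))   # ≈ 0.2 — far short of top_k=10

# Broad filter: 90% matches; ~12 visits already yield ~10 valid hits.
print(expected_valid(12, 0.9))      # ≈ 10.8
```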
The right strategy depends on how many documents match your filter. Endee makes this decision automatically — and lets you tune the threshold.
Strategy A: Brute-Force on the Filtered Subset
When the number of matching documents is small, Endee skips the HNSW graph entirely.
It extracts all matching IDs from the filter bitmap, fetches their vectors directly from storage, and runs an exhaustive scan — computing the distance from the query to every matching vector and returning the top results.
Filtered bitmap → [id_4, id_17, id_92, ... 200 IDs total]
↓
Fetch all 200 vectors
↓
Exhaustive distance computation
↓
Return top-k results

With a small matching set, this is both faster and exact. There is no graph overhead, no traversal, no edges to follow. Every matching vector is scored. The results are guaranteed to be the true nearest neighbors within the filtered subset.
Example — rare product category search:
A user searches for “titanium wedding band” on a jewelry store. The index has 500,000 products. The filter material = "titanium" matches 180 products. Endee extracts those 180 IDs, fetches their vectors, computes distances to the query vector, and returns the 10 closest. Every titanium product is compared. Nothing is missed.
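Conceptually, Strategy A is nothing more than an exhaustive scan over the matched IDs. A minimal pure-Python sketch — illustrative only, not Endee's actual internals:

```python
import heapq
import math

def brute_force_topk(query, vectors, matched_ids, k):
    """Score every vector whose ID passed the filter; return the k
    closest by Euclidean distance. Exact within the filtered set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scored = [(dist(query, vectors[i]), i) for i in matched_ids]
    return heapq.nsmallest(k, scored)

# Toy index: 6 vectors, filter bitmap matched IDs {0, 3, 5}.
vectors = {
    0: [0.0, 0.0],
    1: [9.0, 9.0],   # not in filter — never fetched, never scored
    2: [0.1, 0.1],   # semantically closest, but not in filter
    3: [1.0, 1.0],
    4: [5.0, 5.0],   # not in filter
    5: [2.0, 2.0],
}
top2 = brute_force_topk([0.0, 0.0], vectors, [0, 3, 5], k=2)
print([i for _, i in top2])  # [0, 3]
```

Note that id 2 is the closest vector overall but is never scored — the scan only ever touches vectors in the filtered subset.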
Strategy B: Filtered HNSW Graph Search
When the matching set is large, HNSW’s logarithmic traversal advantage kicks back in.
Endee wraps the filter bitmap in a functor and passes it to searchKnn. During traversal, every candidate node is checked against the bitmap before being added to the result set. Non-matching nodes are skipped. The graph still guides the search toward semantically relevant regions — it just ignores nodes outside the filtered subset as it goes.
Query vector → Enter HNSW graph at entry point
↓
Follow edges toward closest candidates
(skip nodes not in the filter bitmap)
↓
Collect top-k valid candidates

Example — broad language filter:
A search engine indexes 10M articles in 20 languages. A user filters to English only: lang = "en". That matches 6M articles (60% of the index). HNSW traverses the graph — nearly every node it visits passes the filter. It finds 10 valid candidates after visiting roughly 15 nodes and exits. Running brute-force over 6M vectors would be hundreds of thousands of times slower for the same result.
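The traversal can be sketched as a greedy best-first search that consults the bitmap before accepting a node. This toy version uses a plain adjacency dict and a Python set standing in for the filter bitmap — real HNSW is layered and far more involved, but the skip-but-still-navigate behaviour is the same:

```python
import heapq
import math

def filtered_greedy_search(graph, vectors, query, entry, allowed, k, budget):
    """Best-first walk over `graph`. Only nodes in `allowed` may enter
    the result set, but every node still guides navigation. Stops once
    `budget` nodes have been touched."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    visited = {entry}
    frontier = [(dist(query, vectors[entry]), entry)]
    results = []
    while frontier and len(visited) <= budget:
        d, node = heapq.heappop(frontier)
        if node in allowed:              # the bitmap check
            results.append((d, node))
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                heapq.heappush(frontier, (dist(query, vectors[nbr]), nbr))
    return [n for _, n in sorted(results)[:k]]

vectors = {i: [float(i)] for i in range(6)}   # points on a line
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
# Filter allows only even IDs; the query sits near node 0.
print(filtered_greedy_search(graph, vectors, [0.2], entry=5,
                             allowed={0, 2, 4}, k=2, budget=10))  # [0, 2]
```

Nodes 5, 3, and 1 are visited and discarded, but they still carry the search toward the query region — exactly the trade the filtered-HNSW path makes.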
prefilter_cardinality_threshold — The Switch
After computing the filter bitmap, Endee checks its size and picks a strategy:
cardinality < threshold → Strategy A (brute-force)
cardinality ≥ threshold → Strategy B (filtered HNSW)

The default threshold is 10,000.
results = index.query(
vector=model.encode("AI in healthcare").tolist(),
top_k=10,
filter=[{"category": {"$eq": "rare_topic"}}],
prefilter_cardinality_threshold=10_000, # default
)

If "rare_topic" matches 800 documents, Endee runs brute-force on those 800. If it matches 50,000 documents, Endee runs filtered HNSW across the graph.
Filter matches 800 → 800 < 10,000 → brute-force → all 800 scored, exact results
Filter matches 9,999 → 9,999 < 10,000 → brute-force → all 9,999 scored, exact results
Filter matches 10,000 → 10,000 ≥ 10,000 → HNSW → graph traversal, approximate
Filter matches 50,000 → 50,000 ≥ 10,000 → HNSW → graph traversal, fast

The threshold is the exact point where Endee switches from “score everything” to “navigate the graph”.
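The switch itself is a one-line comparison. A sketch of the documented rule (the function name is illustrative — Endee makes this choice internally):

```python
def choose_strategy(filter_cardinality: int,
                    prefilter_cardinality_threshold: int = 10_000) -> str:
    """Strict less-than picks brute-force; the boundary goes to HNSW."""
    if filter_cardinality < prefilter_cardinality_threshold:
        return "brute_force"     # exact scan over the matched IDs
    return "filtered_hnsw"       # graph traversal with bitmap check

print(choose_strategy(800))      # brute_force
print(choose_strategy(9_999))    # brute_force
print(choose_strategy(10_000))   # filtered_hnsw — boundary is ≥ threshold
```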
When to Change It
Lower the threshold when your index is large and your filters are typically broad. Lowering it keeps HNSW in play more often — at scale, graph traversal is faster than scanning tens of thousands of vectors directly.
Raise the threshold when your filters are very selective and you need exact results. Raising it means brute-force kicks in for larger match sets, which guarantees every matching vector is scored.
# Very selective filter on a 10M vector index — keep HNSW unless match set is tiny
results = index.query(
vector=query_vec,
filter=[{"tenant_id": {"$eq": org_id}}],
prefilter_cardinality_threshold=1_000,
)
# Small index, highly selective filter — prefer exact brute-force up to 50k matches
results = index.query(
vector=query_vec,
filter=[{"category": {"$eq": "health"}}],
prefilter_cardinality_threshold=50_000,
)

Real-World Examples
E-commerce: broad brand filter
# Index: 2M products. Filter: brand = "Nike" → ~80,000 matches (4% of index).
# 80,000 ≥ 10,000 → HNSW runs. Correct — brute-force over 80k would be slow.
results = index.query(
vector=model.encode("running shoes lightweight").tolist(),
top_k=10,
filter=[{"brand": {"$eq": "Nike"}}],
# default threshold=10_000 is fine here
)

Multi-tenant SaaS: per-org filter
# Index: 500K documents across 1,000 orgs.
# Filter: org_id = "acme" → ~500 matches (0.1% of index).
# 500 < 10,000 → brute-force runs. Exact results on a tiny subset.
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"org_id": {"$eq": "acme"}}],
# default works perfectly here
)

Medical records: rare condition filter
# Index: 1M patient notes. Filter: diagnosis = "lupus" → ~2,000 matches.
# 2,000 < 10,000 → brute-force runs. Exact results guaranteed.
results = index.query(
vector=model.encode("joint inflammation fatigue").tolist(),
top_k=5,
filter=[{"diagnosis": {"$eq": "lupus"}}],
# default threshold=10_000 is correct — small set, brute-force is fast and exact
)

RAG pipeline: per-tenant document scoping
# Index: 2M chunks across 500 companies.
# Filter: company_id = "acme" AND doc_type = "contract" → ~1,200 matches.
# Brute-force on 1,200 chunks is the right call — miss nothing.
chunks = index.query(
vector=model.encode(user_question).tolist(),
top_k=5,
filter=[
{"company_id": {"$eq": "acme"}},
{"doc_type": {"$eq": "contract"}},
],
# 1,200 < 10,000 → brute-force runs automatically
)
# Safer RAG config: force brute-force for up to 50k chunks per tenant
chunks = index.query(
vector=model.encode(user_question).tolist(),
top_k=5,
filter=[
{"company_id": {"$eq": company_id}},
{"doc_type": {"$in": ["contract", "policy", "sow"]}},
],
prefilter_cardinality_threshold=50_000, # brute-force for up to 50k — exact recall
)

Valid range: 1,000 to 1,000,000.
filter_boost_percentage — Fixing the HNSW Attrition Problem
This parameter only matters for Strategy B. Here is the specific problem it solves.
Even when HNSW is the right strategy — because many documents match the filter — there is still an attrition problem. Say your filter matches 40% of your index. During graph traversal, HNSW checks each candidate against the bitmap and discards the 60% that fail. The internal exploration budget is consumed by both valid candidates and the rejected ones. If the budget runs out before enough valid candidates are found, the query returns fewer than top_k results.
filter_boost_percentage expands the exploration budget before the search begins:
new_budget = base_budget × (100 + filter_boost_percentage) / 100

Setting filter_boost_percentage=25 means the graph explores 25% more nodes — spending that extra budget finding valid candidates that would otherwise have been missed.
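In code, the boost is simple integer scaling. A sketch of the documented formula (the base budget of 200 is an invented example value — the real base budget is internal to Endee):

```python
def boosted_budget(base_budget: int, filter_boost_percentage: int) -> int:
    """new_budget = base_budget × (100 + boost) / 100"""
    return base_budget * (100 + filter_boost_percentage) // 100

print(boosted_budget(200, 0))    # 200 — default, no boost
print(boosted_budget(200, 25))   # 250 — 25% more nodes explored
print(boosted_budget(200, 100))  # 400 — doubled, the maximum
```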
How to tell if you need this parameter: check whether your filtered queries return fewer results than top_k.
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
)
# If this prints less than 10, the HNSW budget ran dry — you need filter_boost_percentage
print(len(results))  # → 6 ← problem: only 6 out of 10 requested

results = index.query(
vector=model.encode("machine learning trends").tolist(),
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
filter_boost_percentage=25, # explore 25% more to compensate for filtered-out nodes
)

| Value | What happens |
|---|---|
| 0 | Default — no boost, standard exploration budget |
| 20 | 20% more graph traversal before early exit |
| 50 | 50% more — useful for moderately selective filters |
| 100 | Double the exploration budget — maximum |
A useful starting point: estimate what percentage of index documents the filter discards, then set filter_boost_percentage to roughly half that number.
| Filter keeps | Filter discards | Suggested starting boost |
|---|---|---|
| 80% of index | 20% discarded | 10 |
| 50% of index | 50% discarded | 25 |
| 20% of index | 80% discarded | 40 |
| 10% of index | 90% discarded | 45 |
| 5% of index | 95% discarded | 50 (consider brute-force instead) |
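The “half the discard rate” rule of thumb from the table can be captured in a tiny helper. Illustrative only — the cap at 50 mirrors the advice that more selective filters belong on the brute-force path, not a documented limit:

```python
def suggested_boost(discard_pct: float) -> int:
    """Start at roughly half the discard percentage, capped at 50."""
    return min(50, round(discard_pct / 2))

for discard in (20, 50, 80, 90):
    print(discard, "->", suggested_boost(discard))  # 10, 25, 40, 45
```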
Real-World Examples
Without boost — recall degradation:
# Index: 500K documents. Filter matches 40% = 200K docs.
# top_k=10, but HNSW only returns 6. Budget exhausted before finding 10 valid candidates.
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
filter_boost_percentage=0, # returns only 6 results — recall degradation
)

With boost — fixed:
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"tier": {"$eq": "premium"}}],
filter_boost_percentage=30, # explores 30% more nodes → finds all 10
)

Compound filter — more candidates discarded, needs higher boost:
results = index.query(
vector=model.encode("mental health tips").tolist(),
top_k=10,
filter=[
{"category": {"$eq": "health"}},
{"year": {"$eq": 2024}},
{"lang": {"$eq": "en"}},
],
filter_boost_percentage=60, # compound filter — need more exploration budget
)

Support ticket routing:
# Index: 300K support tickets. Filter: priority = "critical" → ~30K matches (10%).
# HNSW runs. But 90% of graph candidates get discarded → recall degrades without boost.
results = index.query(
vector=model.encode("payment gateway timeout checkout error").tolist(),
top_k=5,
filter=[{"priority": {"$eq": "critical"}}],
filter_boost_percentage=40, # 90% discard rate → aggressive boost needed
)

Dialing in the right boost value:
# Scenario: index has 200K documents. Filter matches 20% = 40K.
# 80% of graph candidates get discarded.
# Too low — still missing results:
results = index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=10)
# → returns 7 results (still degraded)
# About right:
results = index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=35)
# → returns 10 results (full top_k recovered)
# Overkill — wastes compute, no quality gain:
results = index.query(vector=q, top_k=10, filter=f, filter_boost_percentage=100)
# → returns 10 results, but takes 2× longer than the 35 setting

Start at 20–30, measure whether you get top_k results back, and increase only if you still see shortfalls. Going above 50 is rarely necessary unless your filter is extremely selective.
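That dial-in process can be automated: run the query, check the shortfall, bump the boost, repeat. This sketch uses a stub standing in for index.query so the loop logic is visible — the stub’s recall curve is invented for the demo:

```python
def tune_boost(run_query, top_k, start=20, step=10, max_boost=100):
    """Increase filter_boost_percentage until the query returns a
    full top_k, or the 100 ceiling is reached."""
    boost = start
    while boost <= max_boost:
        results = run_query(boost)
        if len(results) >= top_k:
            return boost, results
        boost += step
    return max_boost, run_query(max_boost)

# Stub: pretend recall improves with boost (invented curve, not Endee).
def fake_query(boost):
    return list(range(min(10, 6 + boost // 10)))

boost, results = tune_boost(fake_query, top_k=10)
print(boost, len(results))  # 40 10
```

In production the `run_query` callable would wrap `index.query(..., filter_boost_percentage=boost)`; run the tuning offline on representative queries rather than per request.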
How They Work Together
The two parameters are independent:
prefilter_cardinality_threshold — decides which algorithm runs.
filter_boost_percentage — improves how the HNSW algorithm performs when it runs.
filter_boost_percentage has no effect when brute-force is active — brute-force already scans every matching vector, so there is nothing to boost.
results = index.query(
vector=model.encode("AI applications in healthcare").tolist(),
top_k=10,
filter=[
{"category": {"$eq": "health"}},
{"premium": {"$eq": True}},
],
prefilter_cardinality_threshold=5_000, # brute-force if fewer than 5k match
filter_boost_percentage=25, # expand graph budget for the HNSW path
)

In this example: if the combined filter matches fewer than 5,000 documents, brute-force runs and filter_boost_percentage is ignored. If it matches more, HNSW runs with a 25% larger exploration budget.
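Both decisions together amount to a single planning step. A sketch mirroring the documented behaviour (names and the base budget are illustrative, not Endee’s source):

```python
def plan_query(cardinality, threshold=10_000, boost=0, base_budget=200):
    """Pick the strategy and the effective HNSW budget.
    The boost is ignored on the brute-force path."""
    if cardinality < threshold:
        return ("brute_force", None)           # exact scan; no budget concept
    return ("filtered_hnsw", base_budget * (100 + boost) // 100)

print(plan_query(3_000, threshold=5_000, boost=25))   # ('brute_force', None)
print(plan_query(80_000, threshold=5_000, boost=25))  # ('filtered_hnsw', 250)
```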
Real Workload Walkthrough
A single product catalog index with 1M items. Different queries hit different paths:
index = client.get_index("product-catalog") # 1M vectors
# Query 1: Broad filter (500K match) — HNSW, no boost needed.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[{"in_stock": {"$eq": True}}],
)
# Query 2: Medium filter (25K match) — HNSW, moderate attrition.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[{"brand": {"$eq": "Sony"}}],
filter_boost_percentage=25,
)
# Query 3: Narrow compound filter (800 match) — brute-force, exact.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[
{"brand": {"$eq": "Sony"}},
{"category": {"$eq": "premium"}},
],
)
# Query 4: Very large index variant — push threshold down to keep HNSW active.
index.query(
vector=model.encode("wireless headphones").tolist(),
top_k=10,
filter=[{"brand": {"$eq": "Sony"}}],
prefilter_cardinality_threshold=1_000,
filter_boost_percentage=30,
)

Choosing Parameters by Workload
Internal knowledge base
# Filter: team = "engineering" → ~3,000 docs out of 200K total.
# Small match set → brute-force by default → exact results. No tuning needed.
results = index.query(
vector=model.encode("how to deploy to staging").tolist(),
top_k=5,
filter=[{"team": {"$eq": "engineering"}}],
# 3,000 < 10,000 → brute-force runs automatically
)

E-commerce search with facets
# Combined filter → ~8,000 matches.
results = index.query(
vector=model.encode("noise cancelling over ear headphones").tolist(),
top_k=20,
filter=[
{"category": {"$eq": "headphones"}},
{"price_usd": {"$lte": 300}},
{"rating": {"$gte": 4.0}},
],
prefilter_cardinality_threshold=10_000,
# 8,000 < 10,000 → brute-force runs → all 8,000 products scored exactly
)

Real-time personalization feed
# Index: 20M articles. Filter: lang="en" AND topic IN ["tech","science"] → ~4M matches.
# Latency matters more than exhaustive recall.
results = index.query(
vector=user_interest_vector,
top_k=10,
filter=[
{"lang": {"$eq": "en"}},
{"topic": {"$in": ["tech", "science"]}},
],
prefilter_cardinality_threshold=5_000, # keep HNSW active — 4M match set is huge
filter_boost_percentage=15, # light boost; feed tolerates minor recall loss
)

Fraud detection: high-stakes exact search
# Filter: merchant_category = "crypto_exchange" AND amount_usd > 10000 → ~500 matches.
# Recall must be perfect — a missed similar transaction is a missed fraud signal.
results = index.query(
vector=transaction_embedding,
top_k=10,
filter=[
{"merchant_category": {"$eq": "crypto_exchange"}},
{"amount_usd": {"$gt": 10000}},
],
prefilter_cardinality_threshold=100_000, # force brute-force — miss nothing
)

The Default Behaviour
If you pass no tuning parameters:
results = index.query(
vector=query_vec,
top_k=10,
filter=[{"category": {"$eq": "health"}}],
)

Endee uses prefilter_cardinality_threshold=10_000 and filter_boost_percentage=0. Filters matching fewer than 10,000 documents get an exact brute-force scan. Larger match sets go through filtered HNSW with no exploration boost.
For most workloads, these defaults are the right starting point. You only need to tune when you observe recall degradation on highly selective filters, or when you need to push latency lower on large indexes where brute-force is kicking in too often.
Quick Reference
Decision guide — start here:
1. How many documents does my filter match?
└─ Don't know? Run index.query(...) and check len(results) first.
2. Is len(results) < top_k?
└─ Yes → HNSW ran out of budget → add filter_boost_percentage (start at 25–40)
└─ No → recall is fine, no boost needed
3. Is latency too high?
└─ Yes, and filter matches many docs → lower prefilter_cardinality_threshold
(keep HNSW active for more queries)
└─ Yes, and filter matches few docs → nothing to do; brute-force on small sets is fast
4. Am I missing results I know should appear?
└─ Yes → raise prefilter_cardinality_threshold to force brute-force on your match set

Code patterns:
# Default — works for most cases
index.query(vector=q, filter=[...])
# Large index, broad filters — keep HNSW in play more often
index.query(vector=q, filter=[...], prefilter_cardinality_threshold=1_000)
# Any index, selective filter — prefer exact brute-force
index.query(vector=q, filter=[...], prefilter_cardinality_threshold=100_000)
# HNSW path returning fewer than top_k — boost the exploration budget
index.query(vector=q, filter=[...], filter_boost_percentage=30)
# Compound filter — more discard → higher boost
index.query(vector=q, filter=[..., ..., ...], filter_boost_percentage=50)
# Combined — tune the switch point and the HNSW budget independently
index.query(
vector=q,
filter=[...],
prefilter_cardinality_threshold=5_000,
filter_boost_percentage=25,
)

At-a-glance: which parameter does what
| Symptom | Parameter to adjust | Direction |
|---|---|---|
| Queries returning fewer results than top_k | filter_boost_percentage | Increase |
| Brute-force too slow on large match sets | prefilter_cardinality_threshold | Decrease |
| Missing known relevant results | prefilter_cardinality_threshold | Increase |
| Compound filter degrading recall | filter_boost_percentage | Increase |
| Index is huge, filters are broad | prefilter_cardinality_threshold | Decrease |
prefilter_cardinality_threshold range: 1,000–1,000,000. filter_boost_percentage range: 0–100.