
Filter tuning

Two parameters control how Endee handles filtered search: when to check every matching document versus navigate the search graph, and how hard to search.

When you add a filter, Endee automatically picks one of two search approaches based on how many documents match. These two parameters let you fine-tune that behaviour.


How filtered search works

Endee uses two approaches depending on the filter’s selectivity:

Direct check (small match sets)

When your filter matches a small number of documents, Endee scores every single one against your query. Results are exact - nothing is missed.

Smart navigation (large match sets)

When your filter matches a large number of documents, Endee uses its search graph to navigate quickly to the best results. This handles millions of matches in milliseconds, but it may occasionally miss a relevant result.

Endee switches between these two approaches automatically. The parameters below control exactly when and how it switches.
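To make the direct check concrete, here is a minimal sketch of exact scoring over a filtered subset. It is not Endee's implementation: the `direct_check` helper, the in-memory document layout, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def direct_check(query_vector, documents, predicate, top_k=10):
    """Score every document that passes the filter and keep the best top_k."""
    matches = [doc for doc in documents if predicate(doc)]
    scored = [
        (cosine_similarity(query_vector, doc["vector"]), doc["id"])
        for doc in matches
    ]
    # Highest similarity first; nothing is skipped, so the result is exact.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: two "health" documents and one "sports" document.
docs = [
    {"id": "a", "category": "health", "vector": [1.0, 0.0]},
    {"id": "b", "category": "health", "vector": [0.6, 0.8]},
    {"id": "c", "category": "sports", "vector": [1.0, 0.0]},
]
top = direct_check([1.0, 0.0], docs, lambda d: d["category"] == "health", top_k=2)
print([doc_id for _, doc_id in top])  # prints ['a', 'b']
```

The cost grows linearly with the number of matching documents, which is why this approach only suits small match sets.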


prefilter_cardinality_threshold

Controls when Endee switches from direct check to smart navigation.

Default: 10,000

If your filter matches fewer than 10,000 documents, Endee checks them all. If it matches 10,000 or more, it uses smart navigation.

    results = collection.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        prefilter_cardinality_threshold=10_000,  # the default
    )
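The switching rule described above comes down to a single comparison. The sketch below is illustrative only: `pick_strategy` and the two strategy labels are hypothetical names, not part of Endee's API.

```python
DEFAULT_THRESHOLD = 10_000  # default stated in this guide


def pick_strategy(match_count, prefilter_cardinality_threshold=DEFAULT_THRESHOLD):
    """Return which approach the rule in this guide would select."""
    if match_count < prefilter_cardinality_threshold:
        return "direct_check"      # fewer matches than the threshold
    return "smart_navigation"      # threshold reached or exceeded


for count in (500, 9_999, 10_000, 3_000_000):
    print(count, pick_strategy(count))
```

Note the boundary: a filter matching exactly 10,000 documents already falls on the smart navigation side.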

When to change this value

| Situation | Action |
|---|---|
| Missing results you know exist (large filter) | Raise to 50_000 - forces more exact checking |
| Search feels slow | Lower to 5_000 - switches to fast navigation sooner |
| Everything is fine | Leave the default |

    # Prioritize accuracy: keep direct checking for up to 50k matches
    collection.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        prefilter_cardinality_threshold=50_000,
    )

    # Prioritize speed: switch to fast navigation sooner
    collection.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        prefilter_cardinality_threshold=5_000,
    )

filter_boost_percentage

Controls how hard Endee searches when using smart navigation. Useful when you’re asking for 10 results but only getting 3 or 4.

Default: 0 (no extra exploration)

    results = collection.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "health"}}],
        filter_boost_percentage=0,  # the default
    )

With a selective filter (like a rare category) that still matches enough documents to trigger smart navigation, the search might not explore enough paths to find all requested results. filter_boost_percentage tells Endee to explore more paths.

    # Getting fewer results than expected?
    collection.query(
        vector=query_vector,
        filter=[{"category": {"$eq": "rare_disease"}}],
        filter_boost_percentage=30,  # explore 30% more paths
    )

Range: 0 to 100. Higher means more thorough and slightly slower.
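One way to apply this in practice is to start at the default and raise the boost only when a query comes back short. The sketch below assumes `collection.query` accepts `top_k` and returns a sequence that supports `len()`. `query_with_escalating_boost` and `StubCollection` are hypothetical helpers; the stub only imitates a collection so the example runs on its own.

```python
def query_with_escalating_boost(collection, query_vector, filters, top_k,
                                boosts=(0, 25, 40)):
    """Retry with a larger filter_boost_percentage until top_k results arrive."""
    for boost in boosts:
        if not 0 <= boost <= 100:
            raise ValueError(f"filter_boost_percentage must be 0-100, got {boost}")

    results, used_boost = [], None
    for boost in boosts:
        results = collection.query(
            vector=query_vector,
            top_k=top_k,
            filter=filters,
            filter_boost_percentage=boost,
        )
        used_boost = boost
        if len(results) >= top_k:
            break  # enough results, no need to search harder
    return results, used_boost


class StubCollection:
    """Hypothetical stand-in: returns more results as the boost grows."""

    def query(self, vector, top_k, filter, filter_boost_percentage=0):
        found = min(top_k, 3 + filter_boost_percentage // 10)
        return [{"id": i} for i in range(found)]


results, used_boost = query_with_escalating_boost(
    StubCollection(),
    query_vector=[0.1, 0.2],
    filters=[{"category": {"$eq": "rare_disease"}}],
    top_k=5,
)
print(len(results), used_boost)  # prints 5 25
```

Each retry costs an extra query, so once a boost value works reliably for a given filter it is cheaper to pass it directly.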


Quick decision guide

| Symptom | Fix |
|---|---|
| Getting fewer results than you asked for | Raise filter_boost_percentage to 25-40 |
| Search is slower than expected | Lower prefilter_cardinality_threshold to 5,000 |
| A result you know exists isn’t showing up | Raise prefilter_cardinality_threshold to 50,000 |
| Everything looks right | Leave the defaults |
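If you prefer to keep these starting points in code, the guide above can be transcribed into a small lookup. This is only a convenience sketch: `TUNING_GUIDE`, `tuning_kwargs`, and the symptom labels are hypothetical names, and 30 is simply one value picked from the suggested 25-40 range.

```python
# Suggested starting points, transcribed from the decision guide above.
TUNING_GUIDE = {
    "too_few_results": {"filter_boost_percentage": 30},
    "slow_search": {"prefilter_cardinality_threshold": 5_000},
    "missing_known_result": {"prefilter_cardinality_threshold": 50_000},
    "all_good": {},  # leave the defaults
}


def tuning_kwargs(symptom):
    """Return extra keyword arguments to pass to collection.query()."""
    try:
        return dict(TUNING_GUIDE[symptom])  # copy so callers cannot mutate the guide
    except KeyError:
        raise ValueError(f"unknown symptom: {symptom!r}") from None


print(tuning_kwargs("too_few_results"))

# Usage, assuming an existing collection and query vector q:
# collection.query(vector=q, filter=[...], **tuning_kwargs("slow_search"))
```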

Real-world examples

Multi-tenant app (small per-customer data)

500,000 total documents across 1,000 companies. Filter on org_id = "acme" matches 500 documents.

    results = collection.query(
        vector=query_vector,
        filter=[{"org_id": {"$eq": "acme"}}],
        # 500 < 10,000 → Endee checks all 500 directly
        # Result: exact and fast - no tuning needed
    )

E-commerce with layered filters

3 million products. Filter on category, price range, and rating matches 8,000 documents.

    # `model` is assumed to be an embedding model with an encode() method, defined elsewhere
    results = collection.query(
        vector=model.encode("noise cancelling over ear").tolist(),
        top_k=20,
        filter=[
            {"category": {"$eq": "headphones"}},
            {"price_usd": {"$range": [0, 300]}},
            {"rating": {"$range": [4.0, 5.0]}},
        ],
        # 8,000 < 10,000 → Endee checks all 8,000 directly
    )

Large news site

10 million articles. Filter on published date matches 3 million documents.

    results = collection.query(
        vector=model.encode("climate policy").tolist(),
        top_k=10,
        filter=[{"published_date": {"$range": ["2024-01-01", "2026-12-31"]}}],
        prefilter_cardinality_threshold=50_000,
        # 3 million >= 50,000 → Endee uses smart navigation
        # Raising the threshold from 10k to 50k does not change the strategy here;
        # if results come back short, raise filter_boost_percentage instead
    )

RAG pipeline (filtered retrieval)

2 million document chunks. Filter on company and doc type matches 1,200 documents.

    chunks = collection.query(
        vector=model.encode(user_question).tolist(),
        top_k=5,
        filter=[
            {"company_id": {"$eq": "acme"}},
            {"doc_type": {"$eq": "contract"}},
        ],
        # 1,200 < 10,000 → Endee checks all 1,200 directly
    )

Copy-paste patterns

    # Default - works for most cases
    collection.query(vector=q, filter=[...])

    # Want exact results, even with a large filter?
    collection.query(
        vector=q,
        filter=[...],
        prefilter_cardinality_threshold=50_000,
    )

    # Not getting enough results back?
    collection.query(
        vector=q,
        filter=[...],
        filter_boost_percentage=30,
    )

    # Both - more exact matching AND more exploration
    collection.query(
        vector=q,
        filter=[...],
        prefilter_cardinality_threshold=20_000,
        filter_boost_percentage=25,
    )

Key takeaways

  • You don’t need to tune anything for most cases. Defaults handle the vast majority of workloads.
  • prefilter_cardinality_threshold controls when Endee switches from exhaustive checking to fast navigation. Default: 10,000.
  • filter_boost_percentage controls how hard Endee searches when navigating. Use it when you’re not getting enough results back.
  • Small match sets (under ~10k): exact, fast, nothing to change.
  • Large match sets (over ~10k): experiment if speed or completeness matters to you.
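When you do experiment, it helps to compare settings side by side on the same query. The harness below is a rough sketch, not a benchmarking tool: it assumes `collection.query` accepts `top_k` and returns a sequence that supports `len()`. `compare_settings` and `StubCollection` are hypothetical names, and the stub exists only so the example runs without a real collection.

```python
import time


def compare_settings(collection, query_vector, filters, top_k, settings):
    """Run the same query under each settings dict; record latency and result count."""
    report = []
    for kwargs in settings:
        start = time.perf_counter()
        results = collection.query(
            vector=query_vector, top_k=top_k, filter=filters, **kwargs
        )
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        report.append(
            {"settings": kwargs, "returned": len(results), "elapsed_ms": elapsed_ms}
        )
    return report


class StubCollection:
    """Hypothetical stand-in: returns more results as the boost grows."""

    def query(self, vector, top_k, filter, **kwargs):
        boost = kwargs.get("filter_boost_percentage", 0)
        found = min(top_k, 4 + boost // 5)
        return [{"id": i} for i in range(found)]


report = compare_settings(
    StubCollection(),
    query_vector=[0.1, 0.2],
    filters=[{"category": {"$eq": "health"}}],
    top_k=10,
    settings=[
        {},                                          # defaults
        {"filter_boost_percentage": 30},             # search harder
        {"prefilter_cardinality_threshold": 50_000}, # more exact checking
    ],
)
for row in report:
    print(row["settings"], row["returned"], f"{row['elapsed_ms']:.3f} ms")
```

A single run per setting is noisy; against a real collection, repeat each query several times and compare medians before changing a default.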