
Filtered Dense Search: $eq, $in, and $range in Practice

Time: 15–20 min | Level: Beginner

In this tutorial, you will:

  • Combine semantic vector search with server-side metadata filters to build precise, production-ready retrieval pipelines.
  • Run 15 filter configurations using all three of Endee’s operators: $eq, $in, and $range.
  • Understand the upsert contract — why you must declare filterable fields at write time, not query time.

Prerequisites: Endee running locally on http://127.0.0.1:8080


The Three Operators

Endee supports three server-side filter operators. Together they cover most common metadata-filtering needs:

| Operator | What it does | Example |
| --- | --- | --- |
| $eq | Exact match (string, bool, or int) | {"category": {"$eq": "tech"}} |
| $in | List membership (OR within a single field) | {"category": {"$in": ["health", "science"]}} |
| $range | Inclusive numeric range [start, end] | {"rating": {"$range": [4.0, 5.0]}} |

All three are evaluated server-side before any vector computation begins. You never need to over-fetch and post-filter in Python.
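To make the operator semantics concrete, here is a minimal pure-Python sketch of how a list of conditions could be checked against one document's filter fields. It is a client-side illustration only: the function name `matches` is hypothetical, and this is not Endee's implementation, which evaluates filters on the server.

```python
from typing import Any


def matches(fields: dict, conditions: list) -> bool:
    """Return True if `fields` satisfies every condition (conditions are ANDed)."""
    for condition in conditions:
        for field, spec in condition.items():
            if field not in fields:
                return False  # a field that was never declared cannot match
            value: Any = fields[field]
            for op, operand in spec.items():
                if op == "$eq":
                    ok = value == operand
                elif op == "$in":
                    ok = value in operand
                elif op == "$range":
                    start, end = operand
                    ok = start <= value <= end  # both bounds inclusive
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True


doc = {"category": "tech", "year": 2023, "rating": 4.5, "premium": True}

print(matches(doc, [{"category": {"$eq": "tech"}}]))                 # True
print(matches(doc, [{"category": {"$in": ["health", "science"]}}]))  # False
print(matches(doc, [{"rating": {"$range": [4.5, 5.0]}}]))            # True
```

The last call returns True because the lower bound of $range is inclusive.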


The Upsert Contract

The most important rule: you must declare every filterable field at upsert time.

    index.upsert([{
        "id": "doc_01",
        "vector": dense_model.encode(text).tolist(),
        # stored and returned with results, but not indexed for filtering
        "meta": {
            "title": "Neural Nets & Vision",
            "category": "tech",
            # ...
        },
        "filter": {
            "category": "tech",   # declared here → can be filtered later
            "year": 2023,
            "rating": 4.5,
            "author": "alice",
            "premium": True,
        },
    }])

Fields in meta are returned with results but cannot be used in filters. Only fields declared in filter can be used in filter conditions at query time.
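One way to keep meta and filter in sync is to derive the filter dict from meta using an explicit list of filterable field names. The sketch below assumes that approach; `build_record` and `FILTERABLE_FIELDS` are hypothetical names, not part of the Endee client, and the zero vector is a placeholder for a real embedding.

```python
FILTERABLE_FIELDS = ("category", "year", "rating", "author", "premium")


def build_record(doc_id, vector, meta, filterable=FILTERABLE_FIELDS):
    """Build one upsert record, failing loudly if a filterable field is missing."""
    missing = [name for name in filterable if name not in meta]
    if missing:
        raise KeyError(f"{doc_id}: meta is missing filterable fields {missing}")
    return {
        "id": doc_id,
        "vector": list(vector),
        "meta": dict(meta),                                  # returned with results
        "filter": {name: meta[name] for name in filterable},  # usable in filters
    }


record = build_record(
    "doc_01",
    [0.0] * 384,  # placeholder; real code would pass dense_model.encode(text).tolist()
    {
        "title": "Neural Nets & Vision",
        "category": "tech",
        "year": 2023,
        "rating": 4.5,
        "author": "alice",
        "premium": True,
    },
)
print(record["filter"])
```

Because title is not listed in FILTERABLE_FIELDS, it stays in meta only and could not be used in a filter condition later.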


Install Dependencies

    pip install endee sentence-transformers

    from endee import Endee
    from sentence_transformers import SentenceTransformer

    print("Imports successful!")

Configure Your Index

    INDEX_NAME = "dense_filter_demo"
    DENSE_MODEL_NAME = "all-MiniLM-L6-v2"
    DENSE_DIM = 384
    SPACE_TYPE = "cosine"

| Parameter | Value | Why |
| --- | --- | --- |
| INDEX_NAME | dense_filter_demo | Name of the Endee index |
| DENSE_MODEL_NAME | all-MiniLM-L6-v2 | 384-dim embedding model |
| DENSE_DIM | 384 | Vector dimensionality |
| SPACE_TYPE | cosine | Similarity metric |

Define Your Document Corpus

You will work with 20 research articles across four categories. Each article has five metadata fields used as filter dimensions:

| Field | Type | Values |
| --- | --- | --- |
| category | string | tech, science, health, business |
| year | int | 2020–2024 |
| rating | float | 3.5–4.9 |
| author | string | alice, bob, carol, dave |
| premium | bool | True / False |

The full corpus of 20 documents is available in the Colab notebook — open it to see and run the complete dataset.
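The later steps read doc["id"], doc["text"], and doc["meta"] from each entry of DOCUMENTS, so each entry is assumed to be a dict with those three keys. The sketch below checks one entry against the field table above. The validator name and the sample body text are hypothetical; the real entries live in the notebook.

```python
CATEGORIES = {"tech", "science", "health", "business"}
AUTHORS = {"alice", "bob", "carol", "dave"}


def validate_document(doc: dict) -> list:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = []
    for key in ("id", "text", "meta"):
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    meta = doc.get("meta", {})
    if meta.get("category") not in CATEGORIES:
        problems.append("category must be one of the four known categories")
    year = meta.get("year")
    # bool is a subclass of int in Python, so it is excluded explicitly
    if isinstance(year, bool) or not isinstance(year, int) or not 2020 <= year <= 2024:
        problems.append("year must be an int in 2020-2024")
    rating = meta.get("rating")
    if (isinstance(rating, bool) or not isinstance(rating, (int, float))
            or not 3.5 <= rating <= 4.9):
        problems.append("rating must be a number in 3.5-4.9")
    if meta.get("author") not in AUTHORS:
        problems.append("author must be one of the four known authors")
    if not isinstance(meta.get("premium"), bool):
        problems.append("premium must be a bool")
    return problems


sample = {
    "id": "doc_01",
    "text": "Placeholder body text, for illustration only.",
    "meta": {
        "title": "Neural Nets & Vision",
        "category": "tech",
        "year": 2023,
        "rating": 4.5,
        "author": "alice",
        "premium": True,
    },
}
print(validate_document(sample))  # []
```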

Load the Embedding Model

    print(f"Loading {DENSE_MODEL_NAME} ...")
    dense_model = SentenceTransformer(DENSE_MODEL_NAME)
    print(f"Model loaded, output dim: {dense_model.get_sentence_embedding_dimension()}")

Create Your Endee Index

    print("Connecting to Endee ...")
    client = Endee()
    print("Connected!")

    try:
        result = client.create_index(
            name=INDEX_NAME,
            dimension=DENSE_DIM,
            space_type=SPACE_TYPE,
        )
        print(f"Index created: {result}")
    except Exception as e:
        print(f"  {e}  (index may already exist)")

    index = client.get_index(INDEX_NAME)

Embed and Index Your Documents

Encode each document and upsert it with two separate dicts:

  • meta: any data you want returned with results. Not indexed, not filterable.
  • filter: the fields you want to filter on at query time. A field absent here cannot be used in a filter later.

    print("Embedding and upserting documents ...\n")

    payload = []
    for doc in DOCUMENTS:
        vec = dense_model.encode(doc["text"]).tolist()
        m = doc["meta"]
        payload.append({
            "id": doc["id"],
            "vector": vec,
            "meta": m,
            # All fields we may filter on must be declared here
            "filter": {
                "category": m["category"],
                "year": m["year"],
                "rating": m["rating"],
                "author": m["author"],
                "premium": m["premium"],
            },
        })

    result = index.upsert(payload)
    print(f"\nUpsert complete: {result}")

Set Up the Query

All 15 queries below use the same base text: “AI applications in healthcare and medicine”

This query has strong overlap with health documents and partial overlap with tech (ML, NLP), which makes filter effects easy to read in the output.

    QUERY = "AI applications in healthcare and medicine"
    query_vec = dense_model.encode(QUERY).tolist()
    TOP_K = 5


    def show_results(results, label=""):
        """Print one table row per hit: rank, id, score, and the stored meta fields."""
        header = (
            f"  {'Rank':<5} {'ID':<10} {'Score':<8} {'Category':<12} {'Year':<6} "
            f"{'Rating':<7} {'Author':<8} {'Premium':<8} Title"
        )
        print(f"\n{'─'*len(header)}")
        if label:
            print(f"  {label}")
        print(header)
        print(f"  {'─'*120}")
        for i, r in enumerate(results, 1):
            m = r["meta"]
            print(
                f"  {i:<5} {r['id']:<10} {r['similarity']:<8.4f} "
                f"{m.get('category',''):<12} {m.get('year',''):<6} "
                f"{m.get('rating',''):<7} {m.get('author',''):<8} "
                f"{str(m.get('premium','')):<8} {m.get('title','')}"
            )
        print()

15 Filter Configurations

7.1 No Filter (Baseline)

Search across the entire corpus — every document is a candidate. This establishes the baseline ranking that all filtered queries below will diverge from.

    results_all = index.query(vector=query_vec, top_k=TOP_K)
    show_results(results_all, label="No filter: all 20 documents are candidates")

7.2 $eq — Restrict to a Single Category

Use $eq to limit results to health articles only. Only documents whose category field equals "health" enter the candidate pool.

    results_health = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"category": {"$eq": "health"}}],
    )
    show_results(results_health, label='Filter: category == "health" (4 candidates)')

7.3 $eq — Switch to a Different Category

Run the same query with category == "tech" to see how rankings change when the pool shifts.

    results_tech = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"category": {"$eq": "tech"}}],
    )
    show_results(results_tech, label='Filter: category == "tech" (8 candidates)')

7.4 $eq — Filter by Author

Restrict to author == "bob" to see only Bob’s five articles. They are ranked by vector similarity to the query, not by rating or category.

    results_bob = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"author": {"$eq": "bob"}}],
    )
    show_results(results_bob, label='Filter: author == "bob" (5 candidates)')

7.5 $eq — Boolean Flag (Premium Content)

Pass premium == True to surface only premium articles.

    results_premium = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"premium": {"$eq": True}}],
    )
    show_results(results_premium, label="Filter: premium == True (11 candidates)")

7.6 AND — Two $eq Conditions

Multiple filters in the list are always ANDed. Combining category == "health" AND premium == True reduces the candidate pool to only documents satisfying both conditions simultaneously.

    results_health_premium = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}],
    )
    show_results(
        results_health_premium,
        label='Filter: category == "health" AND premium == True',
    )

7.7 AND — Category + Author

Restrict to a specific author’s tech articles. This shows that conditions on different fields compose freely.

    results_alice_tech = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}],
    )
    show_results(
        results_alice_tech,
        label='Filter: category == "tech" AND author == "alice"',
    )

7.8 $range — Numeric Year Window

$range takes a two-element array [start, end] (inclusive) and evaluates entirely server-side.

    results_year_range = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"year": {"$range": [2022, 2024]}}],
    )
    show_results(results_year_range, label="Filter: year in [2022, 2024] ($range)")

7.9 $range — Rating Quality Floor

Use rating ∈ [4.0, 5.0] to enforce a minimum quality bar before ranking begins.

    results_high_rated = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"rating": {"$range": [4.0, 5.0]}}],
    )
    show_results(results_high_rated, label="Filter: rating in [4.0, 5.0] ($range)")

7.10 $in — OR Within a Field (Multiple Categories)

$in matches documents where the field value is any of the listed values.

    results_health_science = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"category": {"$in": ["health", "science"]}}],
    )
    show_results(results_health_science, label='Filter: category in ["health", "science"] ($in)')

7.11 $in — Multiple Authors

    results_alice_bob = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"author": {"$in": ["alice", "bob"]}}],
    )
    show_results(results_alice_bob, label='Filter: author in ["alice", "bob"] ($in)')

7.12 $in — Specific Year Cohorts (Numeric)

$in on numeric fields selects exact discrete values. Unlike $range [2022, 2024] which includes 2023, $in [2022, 2024] skips it entirely.

    results_selected_years = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"year": {"$in": [2022, 2024]}}],
    )
    show_results(results_selected_years, label="Filter: year in [2022, 2024] ($in numeric)")
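The difference between $in and $range on the same two numbers can be checked locally with plain Python. This is only an illustration of the semantics described above, using the corpus's year values; it does not call Endee.

```python
years = [2020, 2021, 2022, 2023, 2024]

# $range [2022, 2024]: every value between the bounds, inclusive
range_matches = [y for y in years if 2022 <= y <= 2024]

# $in [2022, 2024]: only the listed values
in_matches = [y for y in years if y in (2022, 2024)]

print(range_matches)  # [2022, 2023, 2024]
print(in_matches)     # [2022, 2024]
```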

7.13 AND: $eq + $range + $range

Combine an exact match with two numeric ranges across three fields: tech articles from 2022–2024 with rating ≥ 4.3.

    results_tech_recent_quality = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[
            {"category": {"$eq": "tech"}},
            {"year": {"$range": [2022, 2024]}},
            {"rating": {"$range": [4.3, 5.0]}},
        ],
    )
    show_results(
        results_tech_recent_quality,
        label='category=="tech" AND year in [2022,2024] AND rating in [4.3,5.0]',
    )

7.14 Three-way AND: $in + $range + $eq

Mix all three operators: health-or-science ($in), published 2021–2023 ($range), and premium only ($eq).

    results_stem_premium = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[
            {"category": {"$in": ["health", "science"]}},
            {"year": {"$range": [2021, 2023]}},
            {"premium": {"$eq": True}},
        ],
    )
    show_results(
        results_stem_premium,
        label='category in ["health","science"] AND year in [2021,2023] AND premium==True',
    )

7.15 $range — Tight Window (Top-tier Articles Only)

A narrow rating ∈ [4.5, 5.0] window acts as a precision gate.

    results_top_tier = index.query(
        vector=query_vec,
        top_k=TOP_K,
        filter=[{"rating": {"$range": [4.5, 5.0]}}],
    )
    show_results(results_top_tier, label="Filter: rating in [4.5, 5.0] (6 candidates)")

Summary

You ran 15 real queries against the same index and saw exactly how $eq, $in, and $range change the candidate pool before ranking begins.

Key things to remember:

  • Declare every filterable field in the filter dict at upsert time — you cannot query a field that was not declared.
  • $eq for exact match, $in for OR within a field, $range for inclusive numeric ranges. All evaluated server-side.
  • Multiple filters in the list are always ANDed.
  • Filters are eligibility gates, not ranking signals. A highly rated but irrelevant document that passes the filter still ranks at the bottom.
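The last point can be shown with a toy example: the filter decides which documents are eligible, and similarity alone decides their order. The 2-d vectors and ratings below are made-up stand-ins for real embeddings and metadata, and the ranking is done locally rather than by Endee.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical toy corpus
docs = [
    {"id": "a", "vector": [1.0, 0.0], "rating": 3.6},  # most relevant, fails the gate
    {"id": "b", "vector": [0.0, 1.0], "rating": 4.9},  # best rating, least relevant
    {"id": "c", "vector": [0.8, 0.6], "rating": 4.2},
]
query = [1.0, 0.0]

# Step 1: eligibility gate, equivalent to rating $range [4.0, 5.0]
eligible = [d for d in docs if 4.0 <= d["rating"] <= 5.0]

# Step 2: rank the survivors by similarity only; rating plays no further role
ranked = sorted(
    eligible,
    key=lambda d: cosine_similarity(query, d["vector"]),
    reverse=True,
)
print([d["id"] for d in ranked])  # ['c', 'b']
```

Document b has the highest rating but still ranks last, because its vector is orthogonal to the query.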

Under the Hood: How Filters Execute

Endee’s filter system uses a pre-filter + adaptive search strategy.

Execution pipeline

    1. Filter analysis
       └─ Each condition estimates its cardinality (how many IDs match)
    2. Cheapest-first execution
       └─ Conditions evaluated in ascending cardinality order
       └─ Stops early if intermediate result becomes empty
    3. Build a Roaring bitmap of matching document IDs
    4. Adaptive vector search
       ├─ < 1,000 matching IDs → bypass HNSW, compute exact distances (brute force)
       └─ ≥ 1,000 matching IDs → pass bitmap to HNSW's searchKnn via filter functor
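The pipeline above can be approximated in a few lines of Python. This is a rough sketch under simplifying assumptions, not Endee's code: plain sets stand in for Roaring bitmaps, exact set sizes stand in for cardinality estimates, and `plan_search` and the path labels are hypothetical names.

```python
BRUTE_FORCE_THRESHOLD = 1_000  # the threshold quoted in the pipeline above


def plan_search(condition_matches):
    """Intersect per-condition ID sets, smallest first, then pick a search path."""
    if not condition_matches:
        return set(), "unfiltered"
    # Cheapest-first: start from the condition that matches the fewest IDs
    ordered = sorted(condition_matches, key=len)
    result = set(ordered[0])
    for ids in ordered[1:]:
        if not result:
            break  # early exit: nothing left to intersect
        result &= ids
    if not result:
        return result, "empty"
    if len(result) < BRUTE_FORCE_THRESHOLD:
        return result, "brute_force"
    return result, "hnsw_with_filter"


# Made-up ID sets, one per filter condition
tech = set(range(0, 5000))
recent = set(range(4000, 6000))
premium = set(range(4500, 4600))

ids, path = plan_search([tech, recent, premium])
print(len(ids), path)  # 100 brute_force

ids, path = plan_search([tech, recent])
print(len(ids), path)  # 1000 hnsw_with_filter
```

Starting from the smallest set keeps every intermediate intersection small, which is the point of evaluating conditions in ascending cardinality order.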

Storage per operator

| Filter type | Storage | Lookup |
| --- | --- | --- |
| $eq on string / bool | Inverted index | Direct key lookup |
| $in on string | Same inverted index | Multiple key lookups → bitmap union |
| $eq on number | Hybrid bucket (B+ Tree) | Point range query |
| $in on number | Hybrid bucket | One point query per value → bitmap union |
| $range on number | Hybrid bucket | Cursor scan with fast path for interior buckets |

A Note on Score Values

You may notice that the numeric score for the same document differs between filtered and unfiltered results. This is expected behaviour, not a bug.

Endee uses two different search paths depending on how many documents pass the filter:

| Search path | When used | Score meaning |
| --- | --- | --- |
| HNSW | Filter pool ≥ 1,000 IDs | Cosine similarity (higher is better) |
| Brute-force | Filter pool < 1,000 IDs | Cosine distance (lower is better) |

In both cases the rank ordering is correct. The numeric values just live on opposite scales depending on which path was taken.
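The sketch below shows why the two scales give the same ordering. It assumes the common definition cosine distance = 1 − cosine similarity; the source does not state the exact formula Endee uses, so treat the numbers as illustrative. The vectors are made up.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cosine_distance(a, b):
    # Assumed definition: distance is the complement of similarity
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
candidates = {"near": [0.8, 0.6], "far": [0.0, 1.0]}

# Similarity: sort descending (higher is better)
by_similarity = sorted(
    candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True
)
# Distance: sort ascending (lower is better)
by_distance = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))

print(by_similarity)  # ['near', 'far']
print(by_distance)    # ['near', 'far']
```

The practical consequence is that raw scores from differently sized filter pools should not be compared or thresholded against each other without first checking which scale they are on.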


Real-World Patterns

| Use case | Filter pattern | Effect |
| --- | --- | --- |
| Multi-tenancy | {"tenant_id": {"$eq": user.org_id}} | Scope every query to the calling org |
| Content tiers | {"premium": {"$eq": True}} | Free and paid users get different candidate pools |
| Recency gates | {"year": {"$range": [current_year - 1, current_year]}} | Surface fresh content |
| Quality floors | {"rating": {"$range": [4.0, 5.0]}} | Enforce a minimum quality bar |
| Multi-topic view | {"category": {"$in": ["health", "science"]}} | One index serves multiple tabs |
| Team dashboards | {"author": {"$in": team_member_ids}} | Show content from a specific team |
| Fiscal year cohorts | {"year": {"$in": [2022, 2023]}} | Compare specific years |
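Because multiple conditions are ANDed, these patterns can be composed in a small helper. The sketch below combines the multi-tenancy and recency patterns; `scoped_recent_filter` is a hypothetical name, and it assumes tenant_id and year were both declared in the filter dict at upsert time.

```python
from datetime import date
from typing import Optional


def scoped_recent_filter(org_id: str, years_back: int = 1,
                         current_year: Optional[int] = None) -> list:
    """Build a filter list: restrict to one tenant AND to recent years."""
    year = current_year if current_year is not None else date.today().year
    return [
        {"tenant_id": {"$eq": org_id}},
        {"year": {"$range": [year - years_back, year]}},
    ]


# current_year is pinned here so the output is reproducible
print(scoped_recent_filter("org_42", current_year=2024))
```

The returned list can be passed as the filter argument of index.query in the same way as the hand-written lists earlier in the tutorial.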

Operator Reference

    # $eq: exact match (string, bool, int)
    filter=[{"category": {"$eq": "tech"}}]
    filter=[{"premium": {"$eq": True}}]
    filter=[{"year": {"$eq": 2023}}]

    # $in: list membership / OR within field (string or numeric)
    filter=[{"category": {"$in": ["health", "science"]}}]
    filter=[{"author": {"$in": ["alice", "bob", "carol"]}}]
    filter=[{"year": {"$in": [2022, 2023, 2024]}}]

    # $range: inclusive numeric range [start, end] (int or float)
    filter=[{"year": {"$range": [2022, 2024]}}]
    filter=[{"rating": {"$range": [4.0, 5.0]}}]

    # AND: multiple filters in the list (all must match)
    filter=[
        {"category": {"$eq": "tech"}},
        {"year": {"$range": [2022, 2024]}},
        {"rating": {"$range": [4.5, 5.0]}},
        {"premium": {"$eq": True}},
    ]

All operators are evaluated server-side before vector ranking, so there is no need to request top_k=len(DOCUMENTS) and post-filter the results in Python.


Key Takeaways

  • $eq, $in, $range cover almost every real-world use case. Exact match for categories and flags, list membership for multi-value OR, numeric range for dates, scores, prices, and ages.
  • All three operators are server-side. Endee evaluates them before any vector computation begins.
  • AND is the only multi-filter logic. OR across different fields requires separate queries and client-side merging; OR within a field is what $in is for.
  • Declare everything at upsert time. Fields missing from the filter dict at upsert cannot be queried later.
  • Filters are eligibility gates, not ranking signals. A highly-rated irrelevant document that passes the filter will still rank at the bottom.
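For OR across different fields, the separate-queries approach mentioned above needs a merge step on the client. A minimal sketch follows, assuming each hit is a dict with "id" and "similarity" keys (the keys show_results reads) and that both result lists use the same score scale with higher meaning better. `merge_results` is a hypothetical helper, and the hit lists are mock values standing in for two index.query calls.

```python
def merge_results(*result_lists, top_k=5):
    """Union several result lists by id, keep the best score per id, re-rank."""
    best = {}
    for results in result_lists:
        for hit in results:
            seen = best.get(hit["id"])
            if seen is None or hit["similarity"] > seen["similarity"]:
                best[hit["id"]] = hit
    ranked = sorted(best.values(), key=lambda h: h["similarity"], reverse=True)
    return ranked[:top_k]


# Mock hits, e.g. one query filtered on category and one filtered on premium
health_hits = [
    {"id": "doc_05", "similarity": 0.71},
    {"id": "doc_07", "similarity": 0.64},
]
premium_hits = [
    {"id": "doc_07", "similarity": 0.64},
    {"id": "doc_02", "similarity": 0.52},
]

merged = merge_results(health_hits, premium_hits, top_k=3)
print([hit["id"] for hit in merged])  # ['doc_05', 'doc_07', 'doc_02']
```

If the two queries could take different search paths, the scores may sit on opposite scales, so they would need to be converted to a common scale before merging.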

All results from Endee local mode, all-MiniLM-L6-v2 (384-dim cosine), 20-document corpus with 5 metadata fields.