Filtered Dense Search: $eq, $in, and $range in Practice
In this tutorial, you will:
- Combine semantic vector search with server-side metadata filters to build precise, production-ready retrieval pipelines.
- Run 15 filter configurations using all three of Endee’s operators: $eq, $in, and $range.
- Understand the upsert contract: why you must declare filterable fields at write time, not query time.
Prerequisites: Endee running locally on http://127.0.0.1:8080
The Three Operators
Endee supports three server-side filter operators that cover almost every real-world use case:
| Operator | What it does | Example |
|---|---|---|
| $eq | Exact match (string, bool, or int) | {"category": {"$eq": "tech"}} |
| $in | List membership (OR within a single field) | {"category": {"$in": ["health", "science"]}} |
| $range | Inclusive numeric range [start, end] | {"rating": {"$range": [4.0, 5.0]}} |
All three are evaluated server-side before any vector computation begins. You never need to over-fetch and post-filter in Python.
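To make the semantics of the table concrete, here is a minimal client-side sketch of how the three operators and the implicit AND could be evaluated against a record's filter dict. It is not Endee's implementation (Endee evaluates filters server-side). The helper names `matches_condition` and `matches_all` are hypothetical, and treating an undeclared field as a non-match is an assumption.

```python
from typing import Any

def matches_condition(fields: dict, condition: dict) -> bool:
    """Evaluate one {"field": {"$op": operand}} condition against a filter dict."""
    (field, spec), = condition.items()
    (op, operand), = spec.items()
    if field not in fields:
        return False  # assumption: an undeclared field never matches
    value: Any = fields[field]
    if op == "$eq":
        return value == operand          # exact match
    if op == "$in":
        return value in operand          # OR within a single field
    if op == "$range":
        start, end = operand
        return start <= value <= end     # inclusive on both ends
    raise ValueError(f"unsupported operator: {op}")

def matches_all(fields: dict, conditions: list) -> bool:
    """Conditions in the list are ANDed: every one must match."""
    return all(matches_condition(fields, c) for c in conditions)

doc = {"category": "tech", "year": 2023, "rating": 4.5, "premium": True}

print(matches_all(doc, [
    {"category": {"$eq": "tech"}},
    {"year": {"$range": [2022, 2024]}},
]))  # True

# $in lists discrete values, so 2023 is not matched by [2022, 2024]
print(matches_all(doc, [{"year": {"$in": [2022, 2024]}}]))  # False
```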
The Upsert Contract
The most important rule: you must declare every filterable field at upsert time.
index.upsert([{
    "id": "doc_01",
    "vector": dense_model.encode(text).tolist(),
    "meta": {"title": "Neural Nets & Vision", "category": "tech", ...},  # stored, not indexed for filtering
    "filter": {
        "category": "tech",  # declared here → can be filtered later
        "year": 2023,
        "rating": 4.5,
        "author": "alice",
        "premium": True,
    },
}])

Fields in meta are returned with results but cannot be used in filters. Only fields declared in filter can be used in filter conditions at query time.
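One way to keep the contract hard to violate is to build each record through a small helper that copies the chosen fields from meta into filter and fails loudly when one is missing. The sketch below is only an illustration: `build_record` and its `filterable_fields` parameter are hypothetical names, not part of the Endee SDK, and the vector is a placeholder rather than a real embedding.

```python
def build_record(doc_id, vector, meta, filterable_fields):
    """Build one upsert record, copying the named meta fields into "filter"."""
    missing = [f for f in filterable_fields if f not in meta]
    if missing:
        # Fail at write time instead of discovering an unfilterable field at query time
        raise KeyError(f"meta is missing filterable fields: {missing}")
    return {
        "id": doc_id,
        "vector": list(vector),
        "meta": dict(meta),                                  # returned with results
        "filter": {f: meta[f] for f in filterable_fields},   # usable in filter conditions
    }

record = build_record(
    "doc_01",
    [0.1, 0.2, 0.3],  # placeholder vector, not a real embedding
    {"title": "Neural Nets & Vision", "category": "tech", "year": 2023},
    filterable_fields=["category", "year"],
)
print(record["filter"])  # {'category': 'tech', 'year': 2023}
```

The resulting dict has the same four keys as the upsert example above, so a list of such records could be passed to index.upsert unchanged.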
Install Dependencies
pip install endee sentence-transformers

from endee import Endee
from sentence_transformers import SentenceTransformer
print("Imports successful!")

Configure Your Index
INDEX_NAME = "dense_filter_demo"
DENSE_MODEL_NAME = "all-MiniLM-L6-v2"
DENSE_DIM = 384
SPACE_TYPE = "cosine"

| Parameter | Value | Why |
|---|---|---|
| INDEX_NAME | dense_filter_demo | Name of the Endee index |
| DENSE_MODEL_NAME | all-MiniLM-L6-v2 | 384-dim embedding model |
| DENSE_DIM | 384 | Vector dimensionality |
| SPACE_TYPE | cosine | Similarity metric |
Define Your Document Corpus
You will work with 20 research articles across four categories. Each article has five metadata fields used as filter dimensions:
| Field | Type | Values |
|---|---|---|
| category | string | tech, science, health, business |
| year | int | 2020 – 2024 |
| rating | float | 3.5 – 4.9 |
| author | string | alice, bob, carol, dave |
| premium | bool | True / False |
The full corpus of 20 documents is available in the Colab notebook — open it to see and run the complete dataset.
Load the Embedding Model
print(f"Loading {DENSE_MODEL_NAME} ...")
dense_model = SentenceTransformer(DENSE_MODEL_NAME)
print(f"Model loaded, output dim: {dense_model.get_sentence_embedding_dimension()}")

Create Your Endee Index
print("Connecting to Endee ...")
client = Endee()
print("Connected!")
try:
    result = client.create_index(
        name=INDEX_NAME,
        dimension=DENSE_DIM,
        space_type=SPACE_TYPE,
    )
    print(f"Index created: {result}")
except Exception as e:
    # Creation fails if the index already exists; report and carry on
    print(f" {e} (index may already exist)")
index = client.get_index(INDEX_NAME)

Embed and Index Your Documents
Encode each document and upsert it with two separate dicts:
- meta: any data you want returned with results. Not indexed, not filterable.
- filter: the fields you want to filter on at query time. A field absent here cannot be used in a filter later.
print("Embedding and upserting documents ...\n")
payload = []
for doc in DOCUMENTS:
    vec = dense_model.encode(doc["text"]).tolist()
    m = doc["meta"]
    payload.append({
        "id": doc["id"],
        "vector": vec,
        "meta": m,
        # All fields we may filter on must be declared here
        "filter": {
            "category": m["category"],
            "year": m["year"],
            "rating": m["rating"],
            "author": m["author"],
            "premium": m["premium"],
        },
    })
result = index.upsert(payload)
print(f"\nUpsert complete: {result}")

Set Up the Query
All 15 queries below use the same base text: “AI applications in healthcare and medicine”
This query has strong overlap with health documents and partial overlap with tech (ML, NLP), which makes filter effects easy to read in the output.
QUERY = "AI applications in healthcare and medicine"
query_vec = dense_model.encode(QUERY).tolist()
TOP_K = 5
def show_results(results, label=""):
    header = f" {'Rank':<5} {'ID':<10} {'Score':<8} {'Category':<12} {'Year':<6} {'Rating':<7} {'Author':<8} {'Premium':<8} Title"
    print(f"\n{'─'*len(header)}")
    if label:
        print(f" {label}")
    print(header)
    print(f" {'─'*120}")
    for i, r in enumerate(results, 1):
        m = r["meta"]
        print(
            f" {i:<5} {r['id']:<10} {r['similarity']:<8.4f} "
            f"{m.get('category',''):<12} {m.get('year',''):<6} "
            f"{m.get('rating',''):<7} {m.get('author',''):<8} "
            f"{str(m.get('premium','')):<8} {m.get('title','')}"
        )
    print()

15 Filter Configurations
7.1 No Filter (Baseline)
Search across the entire corpus — every document is a candidate. This establishes the baseline ranking that all filtered queries below will diverge from.
results_all = index.query(vector=query_vec, top_k=TOP_K)
show_results(results_all, label="No filter: all 20 documents are candidates")

7.2 $eq: Restrict to a Single Category
Use $eq to limit results to health articles only. Only documents whose category field equals "health" enter the candidate pool.
results_health = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "health"}}],
)
show_results(results_health, label='Filter: category == "health" (4 candidates)')

7.3 $eq: Switch to a Different Category
Run the same query with category == "tech" to see how rankings change when the pool shifts.
results_tech = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "tech"}}],
)
show_results(results_tech, label='Filter: category == "tech" (8 candidates)')

7.4 $eq: Filter by Author
Restrict to author == "bob" to see only Bob’s five articles. The embedding model ranks them by semantic relevance to the query — not by rating or category.
results_bob = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"author": {"$eq": "bob"}}],
)
show_results(results_bob, label='Filter: author == "bob" (5 candidates)')

7.5 $eq: Boolean Flag (Premium Content)
Pass premium == True to surface only premium articles.
results_premium = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"premium": {"$eq": True}}],
)
show_results(results_premium, label="Filter: premium == True (11 candidates)")

7.6 AND: Two $eq Conditions
Multiple filters in the list are always ANDed. Combining category == "health" AND premium == True reduces the candidate pool to only documents satisfying both conditions simultaneously.
results_health_premium = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}],
)
show_results(
    results_health_premium,
    label='Filter: category == "health" AND premium == True'
)

7.7 AND: Category + Author
Restrict to a specific author’s tech articles. This demonstrates that filters on different fields compose freely.
results_alice_tech = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}],
)
show_results(
    results_alice_tech,
    label='Filter: category == "tech" AND author == "alice"'
)

7.8 $range: Numeric Year Window
$range takes a two-element array [start, end] (inclusive) and evaluates entirely server-side.
results_year_range = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"year": {"$range": [2022, 2024]}}],
)
show_results(results_year_range, label="Filter: year in [2022, 2024] ($range)")

7.9 $range: Rating Quality Floor
Use rating ∈ [4.0, 5.0] to enforce a minimum quality bar before ranking begins.
results_high_rated = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"rating": {"$range": [4.0, 5.0]}}],
)
show_results(results_high_rated, label="Filter: rating in [4.0, 5.0] ($range)")

7.10 $in: OR Within a Field (Multiple Categories)
$in matches documents where the field value is any of the listed values.
results_health_science = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$in": ["health", "science"]}}],
)
show_results(results_health_science, label='Filter: category in ["health", "science"] ($in)')

7.11 $in: Multiple Authors
results_alice_bob = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"author": {"$in": ["alice", "bob"]}}],
)
show_results(results_alice_bob, label='Filter: author in ["alice", "bob"] ($in)')

7.12 $in: Specific Year Cohorts (Numeric)
$in on numeric fields selects exact discrete values. Unlike $range [2022, 2024], which includes 2023, $in [2022, 2024] matches only 2022 and 2024.
results_selected_years = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"year": {"$in": [2022, 2024]}}],
)
show_results(results_selected_years, label="Filter: year in [2022, 2024] ($in numeric)")

7.13 AND: $eq + $range + $range
Combine an exact match with two numeric ranges across three fields: tech articles from 2022–2024 with rating ≥ 4.3.
results_tech_recent_quality = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[
        {"category": {"$eq": "tech"}},
        {"year": {"$range": [2022, 2024]}},
        {"rating": {"$range": [4.3, 5.0]}},
    ],
)
show_results(
    results_tech_recent_quality,
    label='category=="tech" AND year in [2022,2024] AND rating in [4.3,5.0]'
)

7.14 Three-way AND: $in + $range + $eq
Mix all three operators: health-or-science ($in), published 2021–2023 ($range), and premium only ($eq).
results_stem_premium = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[
        {"category": {"$in": ["health", "science"]}},
        {"year": {"$range": [2021, 2023]}},
        {"premium": {"$eq": True}},
    ],
)
show_results(
    results_stem_premium,
    label='category in ["health","science"] AND year in [2021,2023] AND premium==True'
)

7.15 $range: Tight Window (Top-tier Articles Only)
A narrow rating ∈ [4.5, 5.0] window acts as a precision gate.
results_top_tier = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"rating": {"$range": [4.5, 5.0]}}],
)
show_results(results_top_tier, label="Filter: rating in [4.5, 5.0] (6 candidates)")

Summary
You ran 15 real queries against the same index and saw exactly how $eq, $in, and $range change the candidate pool before ranking begins.
Key things to remember:
- Declare every filterable field in the filter dict at upsert time; you cannot query a field that was not declared.
- $eq for exact match, $in for OR within a field, $range for inclusive numeric ranges. All evaluated server-side.
- Multiple filters in the list are always ANDed.
- Filters are eligibility gates, not ranking signals. A highly rated but irrelevant document that passes the filter still ranks at the bottom.
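The last point can be seen in a small pure-Python sketch of filter-then-rank. The vectors, ids, and ratings below are invented toy data, and the ranking uses a hand-written cosine similarity rather than Endee itself; it only illustrates that the rating decides who is eligible, while the similarity alone decides the order.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 2-d vectors and ratings, invented for illustration only
docs = [
    {"id": "relevant_low", "vector": [1.0, 0.1], "rating": 4.0},
    {"id": "irrelevant_high", "vector": [0.0, 1.0], "rating": 4.9},
    {"id": "relevant_mid", "vector": [0.9, 0.3], "rating": 4.4},
    {"id": "relevant_excluded", "vector": [1.0, 0.0], "rating": 3.5},
]
query = [1.0, 0.0]

# Step 1: the filter decides eligibility (rating in [4.0, 5.0], inclusive)
eligible = [d for d in docs if 4.0 <= d["rating"] <= 5.0]

# Step 2: ranking uses vector similarity only; rating plays no further role
ranked = sorted(
    eligible,
    key=lambda d: cosine_similarity(query, d["vector"]),
    reverse=True,
)
print([d["id"] for d in ranked])
# ['relevant_low', 'relevant_mid', 'irrelevant_high']
```

The best match for the query (`relevant_excluded`) never appears because it fails the filter, and the highest-rated document lands last because it is the least similar.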
Under the Hood: How Filters Execute
Endee’s filter system uses a pre-filter + adaptive search strategy.
Execution pipeline
1. Filter analysis
   └─ Each condition estimates its cardinality (how many IDs match)
2. Cheapest-first execution
   ├─ Conditions evaluated in ascending cardinality order
   └─ Stops early if intermediate result becomes empty
3. Build a Roaring bitmap of matching document IDs
4. Adaptive vector search
   ├─ < 1,000 matching IDs → bypass HNSW, compute exact distances (brute force)
   └─ ≥ 1,000 matching IDs → pass bitmap to HNSW's searchKnn via filter functor

Storage per operator
| Filter type | Storage | Lookup |
|---|---|---|
| $eq on string / bool | Inverted index | Direct key lookup |
| $in on string | Same inverted index | Multiple key lookups → bitmap union |
| $eq on number | Hybrid bucket (B+ Tree) | Point range query |
| $in on number | Hybrid bucket | One point query per value → bitmap union |
| $range on number | Hybrid bucket | Cursor scan with fast path for interior buckets |
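The pipeline and lookup strategies above can be approximated in a few lines of plain Python. This is a simplified sketch under stated assumptions, not Endee's code: Python sets stand in for Roaring bitmaps, a single dict stands in for both the inverted index and the hybrid bucket, exact match counts replace cardinality estimates, and all function names are hypothetical. Only the 1,000-ID threshold comes from the description.

```python
from collections import defaultdict

BRUTE_FORCE_THRESHOLD = 1_000  # cutoff quoted in the pipeline description

def build_inverted_index(records):
    """Map (field, value) -> set of record ids. Sets stand in for Roaring bitmaps."""
    index = defaultdict(set)
    for rec in records:
        for field, value in rec["filter"].items():
            index[(field, value)].add(rec["id"])
    return index

def ids_for_condition(index, field, op, operand):
    """Resolve one condition to the set of matching ids."""
    if op == "$eq":
        return set(index.get((field, operand), ()))  # direct key lookup
    if op == "$in":
        # One lookup per listed value, then a union
        return set().union(*(index.get((field, v), ()) for v in operand))
    if op == "$range":
        start, end = operand
        # Linear scan over keys for brevity; the text describes a bucketed structure
        return set().union(*(ids for (f, v), ids in index.items()
                             if f == field and start <= v <= end))
    raise ValueError(f"unsupported operator: {op}")

def run_filters(index, conditions):
    """Intersect all conditions, smallest match set first (needs >= 1 condition)."""
    match_sets = []
    for cond in conditions:
        (field, spec), = cond.items()
        (op, operand), = spec.items()
        match_sets.append(ids_for_condition(index, field, op, operand))
    match_sets.sort(key=len)  # cheapest first (exact counts here, not estimates)
    result = match_sets[0]
    for ids in match_sets[1:]:
        if not result:
            break  # early exit once nothing can match
        result = result & ids
    path = "brute_force" if len(result) < BRUTE_FORCE_THRESHOLD else "hnsw_filtered"
    return result, path

records = [
    {"id": "a", "filter": {"category": "tech", "year": 2023}},
    {"id": "b", "filter": {"category": "tech", "year": 2021}},
    {"id": "c", "filter": {"category": "health", "year": 2023}},
    {"id": "d", "filter": {"category": "science", "year": 2024}},
]
index = build_inverted_index(records)
ids, path = run_filters(index, [
    {"category": {"$in": ["tech", "health"]}},
    {"year": {"$range": [2022, 2024]}},
])
print(sorted(ids), path)  # ['a', 'c'] brute_force
```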
A Note on Score Values
You may notice that the numeric score for the same document differs between filtered and unfiltered results. This is expected behaviour, not a bug.
Endee uses two different search paths depending on how many documents pass the filter:
| Search path | When used | Score meaning |
|---|---|---|
| HNSW | Filter pool ≥ 1,000 IDs | Cosine similarity — higher is better |
| Brute-force | Filter pool < 1,000 IDs | Cosine distance — lower is better |
In both cases the rank ordering is correct. The numeric values just live on opposite scales depending on which path was taken.
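A quick way to convince yourself that the two scales agree on ordering is to compute both for the same candidates. The sketch below assumes the common definition cosine distance = 1 − cosine similarity; the source does not state Endee's exact formula, so treat that relationship as an assumption. The vectors are toy values.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def cosine_distance(a, b):
    # Assumed definition: distance = 1 - similarity
    return 1.0 - cosine_similarity(a, b)

query = [1.0, 0.0]
candidates = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.2, 0.8],
    "doc_c": [0.6, 0.4],
}

# Similarity: higher is better, so sort descending
by_similarity = sorted(
    candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True
)
# Distance: lower is better, so sort ascending
by_distance = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))

print(by_similarity)                 # ['doc_a', 'doc_c', 'doc_b']
print(by_similarity == by_distance)  # True
```

The practical consequence is that raw score values should not be compared across queries unless you know both went down the same search path.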
Real-World Patterns
| Use case | Filter pattern |
|---|---|
| Multi-tenancy | {"tenant_id": {"$eq": user.org_id}} — scope every query to the calling org |
| Content tiers | {"premium": {"$eq": True}} — free and paid users get different candidate pools |
| Recency gates | {"year": {"$range": [current_year - 1, current_year]}} — surface fresh content |
| Quality floors | {"rating": {"$range": [4.0, 5.0]}} — enforce a minimum quality bar |
| Multi-topic view | {"category": {"$in": ["health", "science"]}} — one index serves multiple tabs |
| Team dashboards | {"author": {"$in": team_member_ids}} — show content from a specific team |
| Fiscal year cohorts | {"year": {"$in": [2022, 2023]}} — compare specific years |
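Patterns like these are easy to wrap in small builder functions so that application code composes conditions instead of hand-writing nested dicts. The helpers below are a hypothetical sketch: the function names and the example value "org_42" are invented, and the output is simply the list-of-conditions shape used throughout this tutorial.

```python
def tenant_filter(org_id):
    """Scope a query to one organisation."""
    return {"tenant_id": {"$eq": org_id}}

def recency_filter(current_year, years_back=1):
    """Inclusive year window ending at current_year."""
    return {"year": {"$range": [current_year - years_back, current_year]}}

def quality_floor(min_rating, max_rating=5.0):
    """Inclusive rating window; $range needs both ends."""
    return {"rating": {"$range": [min_rating, max_rating]}}

def build_filters(*conditions):
    """Drop None entries so optional conditions can be passed inline."""
    return [c for c in conditions if c is not None]

premium_only = None  # e.g. set to {"premium": {"$eq": True}} for paid users
filters = build_filters(
    tenant_filter("org_42"),   # hypothetical tenant id
    recency_filter(2024),
    premium_only,
)
print(filters)
# [{'tenant_id': {'$eq': 'org_42'}}, {'year': {'$range': [2023, 2024]}}]
```

Because every entry in the list is ANDed, adding or omitting a builder call widens or narrows the candidate pool without touching the rest of the query.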
Operator Reference
# $eq — exact match (string, bool, int)
filter=[{"category": {"$eq": "tech"}}]
filter=[{"premium": {"$eq": True}}]
filter=[{"year": {"$eq": 2023}}]
# $in — list membership / OR within field (string or numeric)
filter=[{"category": {"$in": ["health", "science"]}}]
filter=[{"author": {"$in": ["alice", "bob", "carol"]}}]
filter=[{"year": {"$in": [2022, 2023, 2024]}}]
# $range — inclusive numeric range [start, end] (int or float)
filter=[{"year": {"$range": [2022, 2024]}}]
filter=[{"rating": {"$range": [4.0, 5.0]}}]
# AND — multiple filters in the list (all must match)
filter=[
    {"category": {"$eq": "tech"}},
    {"year": {"$range": [2022, 2024]}},
    {"rating": {"$range": [4.5, 5.0]}},
    {"premium": {"$eq": True}},
]

All operators are evaluated server-side before vector ranking. No top_k=len(DOCUMENTS) + Python post-filter needed.
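Since filters are plain nested dicts, a typo in an operator name or a malformed $range is easy to miss. A client-side shape check like the following can catch such mistakes before a request is sent. It is a hypothetical helper, not part of Endee: the rules it enforces (non-empty $in list, two numeric $range bounds with start ≤ end) are assumptions about sensible input rather than documented server-side validation.

```python
from numbers import Real

SUPPORTED_OPERATORS = {"$eq", "$in", "$range"}

def _is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, Real) and not isinstance(x, bool)

def validate_filters(filters):
    """Return a list of problems; an empty list means the shape looks fine."""
    if not isinstance(filters, list):
        return ["filter must be a list of conditions"]
    problems = []
    for i, cond in enumerate(filters):
        if not (isinstance(cond, dict) and len(cond) == 1):
            problems.append(f"condition {i}: expected a dict with exactly one field")
            continue
        (field, spec), = cond.items()
        if not (isinstance(spec, dict) and len(spec) == 1):
            problems.append(f"condition {i} ({field}): expected exactly one operator")
            continue
        (op, operand), = spec.items()
        if op not in SUPPORTED_OPERATORS:
            problems.append(f"condition {i} ({field}): unknown operator {op!r}")
        elif op == "$in" and not (isinstance(operand, list) and operand):
            problems.append(f"condition {i} ({field}): $in needs a non-empty list")
        elif op == "$range" and not (
            isinstance(operand, list)
            and len(operand) == 2
            and all(_is_number(x) for x in operand)
            and operand[0] <= operand[1]
        ):
            problems.append(f"condition {i} ({field}): $range needs [start, end] with start <= end")
    return problems

print(validate_filters([{"year": {"$range": [2022, 2024]}}]))  # []
print(len(validate_filters([{"year": {"$gt": 2022}}])))        # 1
```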
Key Takeaways
- $eq, $in, $range cover almost every real-world use case. Exact match for categories and flags, list membership for multi-value OR, numeric range for dates, scores, prices, and ages.
- All three operators are server-side. Endee evaluates them before any vector computation begins.
- AND is the only multi-filter logic. OR across different fields requires separate queries and client-side merging; OR within a field is what $in is for.
- Declare everything at upsert time. Fields missing from the filter dict at upsert cannot be queried later.
- Filters are eligibility gates, not ranking signals. A highly rated irrelevant document that passes the filter will still rank at the bottom.
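For the OR-across-fields case mentioned above, the client-side merge can be as simple as de-duplicating by id and re-sorting. The sketch below uses invented result lists shaped like the results in this tutorial (dicts with "id" and "similarity"); `merge_results` is a hypothetical helper. It assumes both lists report scores on the same scale with higher meaning better, which may not hold if the two queries took different search paths.

```python
def merge_results(*result_lists, top_k=5):
    """Union several result lists by id, then re-rank by similarity (descending)."""
    merged = {}
    for results in result_lists:
        for r in results:
            merged.setdefault(r["id"], r)  # keep the first occurrence of each id
    ranked = sorted(merged.values(), key=lambda r: r["similarity"], reverse=True)
    return ranked[:top_k]

# Invented results standing in for two separate filtered queries,
# e.g. category == "health" OR author == "bob"
health_hits = [
    {"id": "h1", "similarity": 0.81},
    {"id": "h2", "similarity": 0.64},
]
bob_hits = [
    {"id": "b1", "similarity": 0.72},
    {"id": "h1", "similarity": 0.81},  # matches both filters, so it appears twice
]

combined = merge_results(health_hits, bob_hits, top_k=3)
print([r["id"] for r in combined])  # ['h1', 'b1', 'h2']
```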
All results from Endee local mode, all-MiniLM-L6-v2 (384-dim cosine), 20-document corpus with 5 metadata fields.