Filtered Dense Search: $eq, $in, and $range in Practice
In this tutorial, you will:
- Combine semantic vector search with server-side metadata filters to build precise, production-ready retrieval pipelines
- See exactly how each filter changes the ranked result set
Why Filtered Search?
Pure semantic search answers one question: Which documents are most similar to this query?
Filtered semantic search answers a different, more useful question: Which documents, among those I care about, are most similar to this query?
The distinction matters more than it looks. Without filters, a search for “AI in healthcare” across a multi-tenant product returns results from every user’s data. With a filter on tenant_id, only the calling user’s documents enter the ranking stage. The embedding model never had to learn tenant isolation — it just ranks within a pre-restricted pool.
┌──────────────────────────────────────────────────────────────┐
│                    Filtered Dense Search                     │
│                                                              │
│ Query ──► Embed ──► [ Filter Gate ] ──► HNSW Rank ──► Top-K  │
│                            │                                 │
│                   $eq / $in / $range                         │
│                (server-side, pre-vector)                     │
└──────────────────────────────────────────────────────────────┘
Key insight: filters are eligibility gates, not ranking signals. A document that fails the filter never enters the ranking stage. A document that passes the filter but is semantically irrelevant will still rank at the bottom.
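The gate-then-rank idea can be sketched in plain Python. This illustrates the semantics only and is not Endee's implementation: the toy 2-D vectors, the tenant_id values, and the filtered_search helper are all made up for the example, and a real index ranks with an approximate structure (the HNSW stage in the diagram) rather than a full sort.

```python
import math

# Toy corpus: 2-D vectors stand in for real embeddings (illustrative values only).
DOCS = [
    {"id": "a1", "tenant_id": "acme", "vector": [0.9, 0.1]},
    {"id": "a2", "tenant_id": "acme", "vector": [0.1, 0.9]},
    {"id": "b1", "tenant_id": "beta", "vector": [1.0, 0.0]},
]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def filtered_search(query_vec, tenant_id, top_k=5):
    # Step 1: eligibility gate. Documents from other tenants are dropped here.
    candidates = [d for d in DOCS if d["tenant_id"] == tenant_id]
    # Step 2: rank only the survivors by similarity to the query.
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return [d["id"] for d in ranked[:top_k]]

print(filtered_search([1.0, 0.0], tenant_id="acme"))  # ['a1', 'a2']
```

Note that b1 is the closest vector to the query overall, yet it never appears for tenant "acme" because it is removed before ranking. Conversely, a2 passes the gate and is returned even though it is a poor match; it simply ranks last.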
The Three Filter Operators
Endee supports three server-side filter operators that cover almost every real-world use case:
| Operator | What it does | Example |
|---|---|---|
| $eq | Exact match (string, bool, or int) | {"category": {"$eq": "tech"}} |
| $in | List membership (OR within a field) | {"category": {"$in": ["health", "science"]}} |
| $range | Inclusive numeric range [start, end] | {"year": {"$range": [2022, 2024]}} |
All three are evaluated server-side, before any vector ranking occurs, so there is no need to over-fetch with top_k=len(DOCUMENTS) and post-filter in Python.
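To reason about which documents each operator admits, a small local reference model of the table above can help. The matches function below is hypothetical, written only to mirror the documented semantics (clauses in the list are combined with AND); it is not part of the Endee client, and the server does not run this code.

```python
def matches(filter_list, fields):
    """Return True if `fields` satisfies every clause in `filter_list` (AND)."""
    for clause in filter_list:
        # Each clause has the shape {field_name: {operator: operand}}.
        for field, condition in clause.items():
            if field not in fields:
                return False
            value = fields[field]
            for op, operand in condition.items():
                if op == "$eq":
                    ok = value == operand
                elif op == "$in":
                    ok = value in operand
                elif op == "$range":
                    start, end = operand
                    ok = start <= value <= end  # both endpoints inclusive
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True

doc = {"category": "tech", "year": 2024, "premium": True}
print(matches([{"category": {"$eq": "tech"}}], doc))                 # True
print(matches([{"category": {"$in": ["health", "science"]}}], doc))  # False
print(matches([{"year": {"$range": [2022, 2024]}}], doc))            # True, 2024 is included
```

A model like this is also a convenient way to predict how many candidates a filter leaves before running the query against the index.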
AND is the only multi-filter logic. Multiple entries in the filter list are always ANDed:
filter=[
    {"category": {"$eq": "tech"}},       # must match
    {"year": {"$range": [2022, 2024]}},  # AND must match
    {"premium": {"$eq": True}},          # AND must match
]
OR across different fields requires separate queries and client-side merging. OR within a single field is exactly what $in is for.
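Client-side merging for an OR across fields can be sketched as follows. This is a minimal illustration under two assumptions: each result is a dict carrying an "id" key alongside the "similarity" score, and both queries used the same query vector so their scores are comparable. The merge_or helper and the sample scores are hypothetical.

```python
def merge_or(*result_lists, top_k=5):
    """Union several ranked result lists, keeping each id's best score."""
    best = {}
    for results in result_lists:
        for r in results:
            seen = best.get(r["id"])
            if seen is None or r["similarity"] > seen["similarity"]:
                best[r["id"]] = r
    # Re-rank the union by similarity, highest first.
    return sorted(best.values(), key=lambda r: r["similarity"], reverse=True)[:top_k]

# Stand-ins for two separate query responses (made-up scores):
tech_hits = [{"id": "doc_01", "similarity": 0.62}, {"id": "doc_02", "similarity": 0.41}]
bob_hits = [{"id": "doc_07", "similarity": 0.70}, {"id": "doc_01", "similarity": 0.62}]

merged = merge_or(tech_hits, bob_hits, top_k=3)
print([r["id"] for r in merged])  # ['doc_07', 'doc_01', 'doc_02']
```

Deduplicating by id matters because a document that satisfies both conditions (doc_01 here) comes back from both queries.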
Install
Required packages:
- endee - client library to connect to the Endee vector database
- sentence-transformers - provides the dense embedding model
- numpy==2.0.0 - pinned to avoid compatibility issues
pip install --upgrade endee sentence-transformers
pip install numpy==2.0.0
Imports
from getpass import getpass
from endee import Endee
from sentence_transformers import SentenceTransformer
Connect to Endee and Create the Index
Choose your connection method: local server or serverless cloud.
Local Server: If your server has NDD_AUTH_TOKEN set, pass the same token when initializing:
client = Endee("ndd-auth-token")
client.set_base_url("http://0.0.0.0:8080/api/v1")
Endee Serverless: Go to https://app.endee.io, create a token, then pass it here:
client = Endee("your-serverless-token")
INDEX_NAME = "dense_filter_demo"
try:
    client.delete_index(INDEX_NAME)
except Exception:
    pass  # the index may not exist yet; ignore the error and start fresh
client.create_index(
    name=INDEX_NAME,
    dimension=384,
    space_type="cosine",
)
index = client.get_index(INDEX_NAME)
print(f"Index '{INDEX_NAME}' ready")
Load the Embedding Model
all-MiniLM-L6-v2 is loaded once and reused for both indexing and querying. The model converts any piece of text into a 384-dimensional vector, which matches the dimension=384 set on the index.
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
Prepare Example Corpus
16 research articles across four categories. Each document has five metadata fields that we will use as filter dimensions - category, year, rating, author, and premium. Every field we want to filter on must be declared in the filter dict at upsert time. Fields missing from filter cannot be queried later.
DOCUMENTS = [
    # tech
    {"id": "doc_01", "text": "Neural networks are revolutionising image recognition and computer vision tasks",
     "meta": {"title": "Neural Nets & Vision", "category": "tech", "year": 2023, "rating": 4.5, "author": "alice", "premium": True}},
    {"id": "doc_02", "text": "Quantum computing promises exponential speedup for optimisation and cryptography",
     "meta": {"title": "Quantum Computing", "category": "tech", "year": 2024, "rating": 4.8, "author": "alice", "premium": True}},
    #...documents
]
print(f"{len(DOCUMENTS)} documents ready")
Embed and Index Documents
For each document we encode the text into a dense vector and build a payload with two separate dicts:
- meta holds any data we want returned with results, but it is not searchable
- filter declares every field we want to filter on at query time. A field not listed here cannot be used in a filter later, so think of it as a column declaration
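Because a field left out of filter cannot be queried later, it can be worth failing fast before anything is upserted. The sketch below assumes a hypothetical make_record helper and FILTER_FIELDS tuple (neither is part of the Endee client); the record shape mirrors the payload used in this tutorial.

```python
FILTER_FIELDS = ("category", "year", "rating", "author", "premium")

def make_record(doc_id, vector, meta, filter_fields=FILTER_FIELDS):
    """Build one upsert record, raising early if a filterable field is missing."""
    missing = [f for f in filter_fields if f not in meta]
    if missing:
        raise KeyError(f"{doc_id}: missing filter fields {missing}")
    return {
        "id": doc_id,
        "vector": vector,
        "meta": meta,
        # Copy only the declared fields into the filter dict.
        "filter": {f: meta[f] for f in filter_fields},
    }

meta = {"title": "Quantum Computing", "category": "tech", "year": 2024,
        "rating": 4.8, "author": "alice", "premium": True}
record = make_record("doc_02", [0.0] * 384, meta)  # placeholder vector
print(sorted(record["filter"]))  # ['author', 'category', 'premium', 'rating', 'year']
```

The title stays in meta only, so it is returned with results but is not available as a filter field.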
payload = []
for doc in DOCUMENTS:
    vec = dense_model.encode(doc["text"]).tolist()
    m = doc["meta"]
    payload.append({
        "id": doc["id"],
        "vector": vec,
        "meta": m,
        "filter": {
            "category": m["category"],
            "year": m["year"],
            "rating": m["rating"],
            "author": m["author"],
            "premium": m["premium"],
        },
    })
index.upsert(payload)
print(f"{len(payload)} documents indexed")
Query Setup
All queries in this notebook use the same text. We encode it once and reuse the vector. The show_results helper prints each result with its rank, score, and metadata so we can clearly see how different filters change the output.
QUERY = "AI applications in healthcare and medicine"
query_vec = dense_model.encode(QUERY).tolist()
TOP_K = 5
def show_results(results, label=""):
    if label:
        print(f"Filter: {label}")
    for rank, r in enumerate(results, 1):
        m = r["meta"]
        print(f" {rank}. score={r['similarity']:.4f} [{m['category']}] {m['title']} ({m['author']}, {m['year']}, rating={m['rating']}, premium={m['premium']})")
    print()
Baseline - No Filter
Running the query without any filter searches all 16 documents. This is the baseline that shows us what the dense model considers most relevant before any filtering is applied.
results = index.query(vector=query_vec, top_k=TOP_K)
show_results(results, label="none (all 16 documents are candidates)")
$eq - Exact Match
$eq restricts the search to documents where a field exactly equals a given value. Only documents that pass this check enter the ranking stage - everything else is excluded before any vector comparison happens. Here we filter to health articles only.
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "health"}}],
)
show_results(results, label='category == "health" (4 candidates)')
$range - Numeric Range
$range takes a two-value list [start, end] and both ends are inclusive. It works on any numeric field - year, rating, price, age. Here we filter to articles published between 2022 and 2024.
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"year": {"$range": [2022, 2024]}}],
)
show_results(results, label="year in [2022, 2024] (13 candidates)")
$in - Match Any Value From a List
$in is OR within a single field. A document passes if its field value matches any item in the list. This is useful when you want results from multiple categories, multiple authors, or specific year cohorts without running separate queries.
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$in": ["health", "science"]}}],
)
show_results(results, label='category in ["health", "science"] (8 candidates)')
AND - Combining Multiple Filters
Passing multiple filters in the list ANDs them - a document must satisfy every condition to enter the candidate pool. You can mix operators freely. Here we combine $in, $range, and $eq to find premium health or science articles from 2021 to 2023.
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[
        {"category": {"$in": ["health", "science"]}},
        {"year": {"$range": [2021, 2023]}},
        {"premium": {"$eq": True}},
    ],
)
show_results(results, label='category in ["health","science"] AND year in [2021,2023] AND premium == True')
Other Filter Combinations
The four queries above cover the core patterns. All other combinations work the same way - just swap in the fields and values you need. The table below shows the remaining useful combinations as ready-to-use examples:
| What you want | Filter |
|---|---|
| Only tech articles | [{"category": {"$eq": "tech"}}] |
| Only bob’s articles | [{"author": {"$eq": "bob"}}] |
| Only premium content | [{"premium": {"$eq": True}}] |
| Health articles that are premium | [{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}] |
| Alice’s tech articles only | [{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}] |
| Rating 4.0 and above | [{"rating": {"$range": [4.0, 5.0]}}] |
| Top-rated articles only (4.5+) | [{"rating": {"$range": [4.5, 5.0]}}] |
| Articles by alice or bob | [{"author": {"$in": ["alice", "bob"]}}] |
| Only 2022 and 2024 (skip 2023) | [{"year": {"$in": [2022, 2024]}}] |
| Recent high-quality tech articles | [{"category": {"$eq": "tech"}}, {"year": {"$range": [2022, 2024]}}, {"rating": {"$range": [4.3, 5.0]}}] |
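Hand-writing nested dicts like those in the table is easy to get wrong, so one option is a few tiny builder functions. The names eq, any_of, and between below are hypothetical conveniences for this tutorial, not part of the Endee client; they only produce the same dict shapes shown in the table.

```python
def eq(field, value):
    """Exact-match clause ($eq)."""
    return {field: {"$eq": value}}

def any_of(field, values):
    """List-membership clause ($in)."""
    return {field: {"$in": list(values)}}

def between(field, start, end):
    """Inclusive numeric range clause ($range)."""
    if start > end:
        raise ValueError(f"empty range for {field!r}: {start} > {end}")
    return {field: {"$range": [start, end]}}

# "Recent high-quality tech articles" from the table:
recent_quality_tech = [
    eq("category", "tech"),
    between("year", 2022, 2024),
    between("rating", 4.3, 5.0),
]
print(recent_quality_tech)
```

The resulting list can be passed as the filter argument to index.query in the same way as the literal filters used earlier.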
Cleanup
Deletes the index.
client.delete_index(INDEX_NAME)
print(f"Deleted: {INDEX_NAME}")
Key Takeaways
- Filters are eligibility gates - they restrict the candidate pool before ranking; they are not ranking signals
- $eq matches exact values - use it for categories, booleans, exact strings, or exact integers
- $range includes both endpoints - [2022, 2024] includes all years from 2022 through 2024
- $in is OR within a field - perfect for multi-value selections like “alice or bob” or “health or science”
- Multiple filters are ANDed - all conditions must be true for a document to qualify
- Declare fields in filter at upsert time - any field you want to filter later must be in the filter dict
- meta and filter are separate - meta is returned with results; filter controls what enters the ranking stage
- Combine operators freely - mix $eq, $in, and $range in a single query
- For OR across fields, use multiple queries - filter logic only ANDs, so “category=tech OR author=bob” requires two queries merged client-side