Filtered Dense Search: `$eq`, `$in`, and `$range` in Practice

Time: 15–20 min
Level: Beginner

In this tutorial, you will:

Combine semantic vector search with server-side metadata filters to build precise, production-ready retrieval pipelines
See exactly how each filter changes the ranked result set

Why Filtered Search?

Pure semantic search answers one question: Which documents are most similar to this query?

Filtered semantic search answers a different, more useful question: Which documents, among those I care about, are most similar to this query?

The distinction matters more than it looks. Without filters, a search for “AI in healthcare” across a multi-tenant product returns results from every user’s data. With a filter on tenant_id, only the calling user’s documents enter the ranking stage. The embedding model never had to learn tenant isolation — it just ranks within a pre-restricted pool.


┌──────────────────────────────────────────────────────────────┐
│                   Filtered Dense Search                      │
│                                                              │
│  Query ──► Embed ──► [ Filter Gate ] ──► HNSW Rank ──► Top-K │
│                            │                                 │
│                     $eq / $in / $range                       │
│                     (server-side, pre-vector)                │
└──────────────────────────────────────────────────────────────┘

Key insight: filters are eligibility gates, not ranking signals. A document that fails the filter never enters the ranking stage. A document that passes the filter but is semantically irrelevant will still rank at the bottom.
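The gate-then-rank idea can be sketched in a few lines of plain Python. This is a brute-force illustration on a hypothetical three-document corpus with 2-D vectors, not how Endee implements it internally (the diagram above names HNSW for the ranking step). The names `TOY_DOCS`, `cosine`, and `filtered_search` are made up for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy corpus: 2-D vectors stand in for real embeddings.
TOY_DOCS = [
    {"id": "a", "tenant_id": "t1", "vector": [1.0, 0.0]},
    {"id": "b", "tenant_id": "t2", "vector": [0.9, 0.1]},  # very similar, wrong tenant
    {"id": "c", "tenant_id": "t1", "vector": [0.0, 1.0]},  # right tenant, dissimilar
]

def filtered_search(query_vec, docs, tenant_id, top_k):
    # Gate: only documents that pass the filter become candidates.
    candidates = [d for d in docs if d["tenant_id"] == tenant_id]
    # Rank: similarity orders the candidates; it never re-admits excluded docs.
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return [d["id"] for d in ranked[:top_k]]

hits = filtered_search([1.0, 0.0], TOY_DOCS, tenant_id="t1", top_k=2)
print(hits)  # ['a', 'c']
```

Document `b` is nearly identical to the query but never appears, because it fails the gate. Document `c` passes the gate and is returned last, because passing a filter says nothing about relevance.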

The Three Filter Operators

Endee supports three server-side filter operators that cover almost every real-world use case:

Operator	What it does	Example
`$eq`	Exact match — string, bool, or int	`{"category": {"$eq": "tech"}}`
`$in`	List membership — OR within a field	`{"category": {"$in": ["health", "science"]}}`
`$range`	Inclusive numeric range [start, end]	`{"year": {"$range": [2022, 2024]}}`

All three are evaluated server-side, before any vector ranking occurs, so there is no need to fetch every document with top_k=len(DOCUMENTS) and post-filter in Python.

AND is the only multi-filter logic. Multiple entries in the filter list are always ANDed:


filter=[
    {"category": {"$eq":    "tech"}},       # must match
    {"year":     {"$range": [2022, 2024]}},  # AND must match
    {"premium":  {"$eq":    True}},          # AND must match
]
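To make the semantics concrete, here is a small client-side model of the three operators and the AND rule as this tutorial describes them. It is a reading aid only: the server does the real evaluation, and the function names `condition_matches` and `passes` are hypothetical. Treating a field that is absent from the document as a non-match is an assumption of this sketch.

```python
def condition_matches(condition, fields):
    """Evaluate one {"field": {"$op": value}} condition against a document's filter fields."""
    (field, spec), = condition.items()
    (op, expected), = spec.items()
    if field not in fields:
        return False  # assumption of this sketch: an undeclared field never matches
    actual = fields[field]
    if op == "$eq":
        return actual == expected
    if op == "$in":
        return actual in expected
    if op == "$range":
        start, end = expected
        return start <= actual <= end  # both ends inclusive
    raise ValueError(f"unsupported operator: {op}")

def passes(filter_list, fields):
    # Every condition in the list must hold (AND).
    return all(condition_matches(c, fields) for c in filter_list)

doc_fields = {"category": "tech", "year": 2024, "premium": True}
flt = [
    {"category": {"$eq": "tech"}},
    {"year": {"$range": [2022, 2024]}},
    {"premium": {"$eq": True}},
]
print(passes(flt, doc_fields))                    # True
print(passes(flt, {**doc_fields, "year": 2021}))  # False
```

The second call fails only the year condition, which is enough to exclude the document: one failed condition in an AND list removes it from the candidate pool.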

OR across different fields requires separate queries and client-side merging. OR within a single field is exactly what $in is for.
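One possible way to do the client-side merge is to union the result lists by document id and re-sort by score. The sketch below assumes each result is a dict carrying an "id" key alongside "similarity"; the "id" key and the helper name `merge_or` are assumptions made for illustration, and the sample hits are invented.

```python
def merge_or(*result_lists, top_k=5):
    """Union several ranked result lists, keeping each id once with its best score."""
    best = {}
    for results in result_lists:
        for r in results:
            seen = best.get(r["id"])
            if seen is None or r["similarity"] > seen["similarity"]:
                best[r["id"]] = r
    return sorted(best.values(), key=lambda r: r["similarity"], reverse=True)[:top_k]

# Hypothetical results standing in for two filtered queries
# ("category == tech" and "author == bob") run with the same query vector.
tech_hits = [{"id": "doc_01", "similarity": 0.62}, {"id": "doc_02", "similarity": 0.40}]
bob_hits = [{"id": "doc_09", "similarity": 0.71}, {"id": "doc_01", "similarity": 0.62}]

merged = merge_or(tech_hits, bob_hits, top_k=3)
print([r["id"] for r in merged])  # ['doc_09', 'doc_01', 'doc_02']
```

Merging by score is only meaningful when both queries use the same query vector against the same index, so that the similarity values are directly comparable.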

Install

Required packages:

endee - client library to connect to the Endee vector database
sentence-transformers - provides the dense embedding model
numpy==2.0.0 - pinned to avoid compatibility issues


pip install --upgrade endee sentence-transformers
pip install numpy==2.0.0

Imports


from getpass import getpass
from endee import Endee
from sentence_transformers import SentenceTransformer

Connect to Endee and Create the Index

Choose your connection method: local server or serverless cloud.

Local Server: If your server has NDD_AUTH_TOKEN set, pass the same token when initializing:


client = Endee("ndd-auth-token")
client.set_base_url("http://0.0.0.0:8080/api/v1")

Endee Serverless: Go to https://app.endee.io, create a token, then pass it here:


client = Endee("your-serverless-token")


INDEX_NAME = "dense_filter_demo"
try:
    client.delete_index(INDEX_NAME)
except Exception:
    pass

client.create_index(
    name=INDEX_NAME,
    dimension=384,
    space_type="cosine",
)
index = client.get_index(INDEX_NAME)
print(f"Index '{INDEX_NAME}' ready")

Load the Embedding Model

Loads all-MiniLM-L6-v2 once and reuses it for both indexing and querying. The model converts any piece of text into a 384-dimensional vector, matching the dimension=384 used when the index was created.


dense_model = SentenceTransformer("all-MiniLM-L6-v2")

Prepare Example Corpus

The corpus contains 16 research articles across four categories. Each document has five metadata fields that we will use as filter dimensions: category, year, rating, author, and premium. Every field we want to filter on must be declared in the filter dict at upsert time; fields missing from filter cannot be queried later.


DOCUMENTS = [
    # tech
    {"id": "doc_01", "text": "Neural networks are revolutionising image recognition and computer vision tasks",
     "meta": {"title": "Neural Nets & Vision",       "category": "tech",     "year": 2023, "rating": 4.5, "author": "alice", "premium": True}},
    {"id": "doc_02", "text": "Quantum computing promises exponential speedup for optimisation and cryptography",
     "meta": {"title": "Quantum Computing",          "category": "tech",     "year": 2024, "rating": 4.8, "author": "alice", "premium": True}},
    #...documents
]
 
print(f"{len(DOCUMENTS)} documents ready")

Embed and Index Documents

For each document we encode the text into a dense vector and build a payload with two separate dicts:

meta holds any data we want returned with results, but it is not searchable
filter declares every field we want to filter on at query time - a field not listed here cannot be used in a filter later, so think of it as a column declaration


payload = []
 
for doc in DOCUMENTS:
    vec = dense_model.encode(doc["text"]).tolist()
    m   = doc["meta"]
    payload.append({
        "id":     doc["id"],
        "vector": vec,
        "meta":   m,
        "filter": {
            "category": m["category"],
            "year":     m["year"],
            "rating":   m["rating"],
            "author":   m["author"],
            "premium":  m["premium"],
        },
    })
 
index.upsert(payload)
print(f"{len(payload)} documents indexed")
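Because a field left out of filter cannot be queried later, it may be worth failing early when a document lacks one of the expected fields. The helper below is a hypothetical convenience, not part of the Endee client; `FILTER_FIELDS` and `build_filter` are names invented for this sketch.

```python
FILTER_FIELDS = ("category", "year", "rating", "author", "premium")

def build_filter(meta, fields=FILTER_FIELDS):
    """Copy the filterable fields out of meta, failing loudly if one is absent."""
    missing = [f for f in fields if f not in meta]
    if missing:
        raise KeyError(f"meta is missing filter fields: {missing}")
    return {f: meta[f] for f in fields}

meta = {"title": "Quantum Computing", "category": "tech", "year": 2024,
        "rating": 4.8, "author": "alice", "premium": True}
print(build_filter(meta))
```

In the indexing loop, the hand-written "filter" dict could then be replaced by `build_filter(m)`, which keeps the list of filterable fields in one place. Note that "title" stays in meta only, since it is not listed in `FILTER_FIELDS`.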

Query Setup

All queries in this notebook use the same text. We encode it once and reuse the vector. The show_results helper prints each result with its rank, score, and metadata so we can clearly see how different filters change the output.


QUERY     = "AI applications in healthcare and medicine"
query_vec = dense_model.encode(QUERY).tolist()
TOP_K     = 5
 
def show_results(results, label=""):
    if label:
        print(f"Filter: {label}")
    for rank, r in enumerate(results, 1):
        m = r["meta"]
        print(f"  {rank}. score={r['similarity']:.4f}  [{m['category']}]  {m['title']}  ({m['author']}, {m['year']}, rating={m['rating']}, premium={m['premium']})")
    print()

Baseline - No Filter

Running the query without any filter searches the entire corpus. This is the baseline that shows us what the dense model considers most relevant before any filtering is applied.


results = index.query(vector=query_vec, top_k=TOP_K)
show_results(results, label=f"none: all {len(DOCUMENTS)} documents are candidates")

$eq - Exact Match

$eq restricts the search to documents where a field exactly equals a given value. Only documents that pass this check enter the ranking stage - everything else is excluded before any vector comparison happens. Here we filter to health articles only.


results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "health"}}],
)
show_results(results, label='category == "health"  (4 candidates)')

$range - Numeric Range

$range takes a two-value list [start, end] and both ends are inclusive. It works on any numeric field - year, rating, price, age. Here we filter to articles published between 2022 and 2024.


results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"year": {"$range": [2022, 2024]}}],
)
show_results(results, label="year in [2022, 2024]  (13 candidates)")

$in - Match Any Value From a List

$in is OR within a single field. A document passes if its field value matches any item in the list. This is useful when you want results from multiple categories, multiple authors, or specific year cohorts without running separate queries.


results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$in": ["health", "science"]}}],
)
show_results(results, label='category in ["health", "science"]  (8 candidates)')

AND - Combining Multiple Filters

Passing multiple filters in the list ANDs them - a document must satisfy every condition to enter the candidate pool. You can mix operators freely. Here we combine $in, $range, and $eq to find premium health or science articles from 2021 to 2023.


results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[
        {"category": {"$in":    ["health", "science"]}},
        {"year":     {"$range": [2021, 2023]}},
        {"premium":  {"$eq":    True}},
    ],
)
show_results(results, label='category in ["health","science"] AND year in [2021,2023] AND premium == True')
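When a combined filter returns fewer results than expected, it can help to replay the conditions one at a time locally and watch the pool shrink. The sketch below does this on a hypothetical five-document set (not the tutorial's corpus). It says nothing about the order in which the server evaluates conditions; since AND is commutative, the final pool is the same either way.

```python
# Hypothetical filter fields for five documents (not the tutorial's corpus).
TOY = [
    {"id": "h1", "category": "health",  "year": 2023, "premium": True},
    {"id": "h2", "category": "health",  "year": 2020, "premium": True},
    {"id": "s1", "category": "science", "year": 2022, "premium": False},
    {"id": "s2", "category": "science", "year": 2021, "premium": True},
    {"id": "t1", "category": "tech",    "year": 2023, "premium": True},
]

# The same three conditions as the query above, written as local predicates.
steps = [
    ("category in [health, science]", lambda d: d["category"] in ("health", "science")),
    ("year in [2021, 2023]",          lambda d: 2021 <= d["year"] <= 2023),
    ("premium == True",               lambda d: d["premium"] is True),
]

pool = TOY
for label, keep in steps:
    pool = [d for d in pool if keep(d)]
    print(f"{label:32} -> {[d['id'] for d in pool]}")
```

Here the pool goes from five documents to four, then three, then two (`h1` and `s2`). Only those two would be ranked by similarity.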

Other Filter Combinations

The four queries above cover the core patterns. All other combinations work the same way - just swap in the fields and values you need. The table below shows the remaining useful combinations as ready-to-use examples:

What you want	Filter
Only tech articles	`[{"category": {"$eq": "tech"}}]`
Only bob’s articles	`[{"author": {"$eq": "bob"}}]`
Only premium content	`[{"premium": {"$eq": True}}]`
Health articles that are premium	`[{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}]`
Alice’s tech articles only	`[{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}]`
Rating 4.0 and above	`[{"rating": {"$range": [4.0, 5.0]}}]`
Top-rated articles only (4.5+)	`[{"rating": {"$range": [4.5, 5.0]}}]`
Articles by alice or bob	`[{"author": {"$in": ["alice", "bob"]}}]`
Only 2022 and 2024 (skip 2023)	`[{"year": {"$in": [2022, 2024]}}]`
Recent high-quality tech articles	`[{"category": {"$eq": "tech"}}, {"year": {"$range": [2022, 2024]}}, {"rating": {"$range": [4.3, 5.0]}}]`
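If typing nested dicts by hand becomes error-prone, small builder functions can produce the same structures. These helpers (`eq`, `any_of`, `between`) are hypothetical and not part of the Endee client; they only build the plain dicts shown in the table.

```python
def eq(field, value):
    """Exact match on a single value."""
    return {field: {"$eq": value}}

def any_of(field, values):
    """Match any value from a list (OR within one field)."""
    return {field: {"$in": list(values)}}

def between(field, start, end):
    """Inclusive numeric range [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    return {field: {"$range": [start, end]}}

# Rebuilds the last row of the table: recent high-quality tech articles.
recent_quality_tech = [
    eq("category", "tech"),
    between("year", 2022, 2024),
    between("rating", 4.3, 5.0),
]
print(recent_quality_tech)
```

Because `$range` takes two endpoints, the "and above" rows in the table supply 5.0 as an explicit upper bound; `between` makes that requirement visible at the call site.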

Cleanup

Deletes the index.


client.delete_index(INDEX_NAME)
print(f"Deleted: {INDEX_NAME}")

Key Takeaways

Filters are eligibility gates, not ranking signals - they restrict the candidate pool before ranking
$eq matches exact values - use it for categories, booleans, exact strings, or exact integers
$range includes both endpoints - [2022, 2024] includes all years from 2022 through 2024
$in is OR within a field - perfect for multi-value selections like “alice or bob” or “health or science”
Multiple filters are ANDed - all conditions must be true for a document to qualify
Declare fields in filter at upsert time - any field you want to filter later must be in the filter dict
meta and filter are separate - meta is returned with results; filter controls what enters the ranking stage
Combine operators freely - mix $eq, $in, and $range in a single query
For OR across fields, use multiple queries - filter logic only ANDs, so “category=tech OR author=bob” requires two queries merged client-side
