
Filtered Dense Search: $eq, $in, and $range in Practice

Time: 15–20 min · Level: Beginner

In this tutorial, you will:

  • Combine semantic vector search with server-side metadata filters to build precise, production-ready retrieval pipelines
  • See exactly how each filter changes the ranked result set

Pure semantic search answers one question: Which documents are most similar to this query?

Filtered semantic search answers a different, more useful question: Which documents, among those I care about, are most similar to this query?

The distinction matters more than it looks. Without filters, a search for “AI in healthcare” across a multi-tenant product returns results from every user’s data. With a filter on tenant_id, only the calling user’s documents enter the ranking stage. The embedding model never had to learn tenant isolation — it just ranks within a pre-restricted pool.
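Here is a minimal sketch of the multi-tenant idea. It assumes each document was upserted with a hypothetical tenant_id filter field, and it uses the filter-list format described in the operators section below, where every entry is ANDed. If every query's filter list goes through one wrapper, application code cannot forget the tenant condition. The helper name tenant_scoped is illustrative and not part of the Endee client.

```python
def tenant_scoped(tenant_id, filters=None):
    """Return a filter list that always restricts results to one tenant.

    `tenant_id` is a hypothetical filter field; it must be declared in the
    `filter` dict at upsert time like any other filterable field.
    """
    # Entries in a filter list are ANDed, so the tenant condition applies
    # on top of whatever other conditions the caller passes in.
    return [{"tenant_id": {"$eq": tenant_id}}] + list(filters or [])


scoped = tenant_scoped("acme", [{"year": {"$range": [2022, 2024]}}])
print(scoped)
```

The returned list would be passed as the filter argument of index.query, exactly like the hand-written filter lists later in this tutorial.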

```
┌──────────────────────────────────────────────────────────────┐
│                    Filtered Dense Search                     │
│                                                              │
│  Query ──► Embed ──► [ Filter Gate ] ──► HNSW Rank ──► Top-K │
│                             │                                │
│                    $eq / $in / $range                        │
│                 (server-side, pre-vector)                    │
└──────────────────────────────────────────────────────────────┘
```

Key insight: filters are eligibility gates, not ranking signals. A document that fails the filter never enters the ranking stage. A document that passes the filter but is semantically irrelevant will still rank at the bottom.
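The gate-then-rank behaviour can be illustrated with a small pure-Python simulation. This is not how Endee works internally. Exact brute-force cosine ranking stands in for the HNSW stage, and the 3-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings. The sketch assumes only that the filter is applied before scoring, as described above.

```python
import math

# Toy corpus: 3-dimensional stand-ins for real 384-dimensional embeddings,
# each with a single filterable field.
DOCS = [
    {"id": "a", "vec": [1.0, 0.0, 0.0], "category": "tech"},
    {"id": "b", "vec": [0.9, 0.1, 0.0], "category": "health"},
    {"id": "c", "vec": [0.0, 1.0, 0.0], "category": "health"},
    {"id": "d", "vec": [0.0, 0.0, 1.0], "category": "science"},
]


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def filtered_search(query_vec, docs, predicate, top_k):
    """Gate first, then rank: only docs passing `predicate` are ever scored."""
    candidates = [d for d in docs if predicate(d)]        # eligibility gate
    scored = [(d["id"], cosine(query_vec, d["vec"])) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)   # exact ranking
    return scored[:top_k]


query = [1.0, 0.0, 0.0]
print(filtered_search(query, DOCS, lambda d: True, top_k=2))
print(filtered_search(query, DOCS, lambda d: d["category"] == "health", top_k=5))
```

Without a filter, "a" ranks first. With the health-only gate, "a" is never scored at all. Only two results come back despite top_k=5 because only two documents pass the gate. "c" still appears, at the bottom, with a similarity of 0.0: it passed the filter but is semantically irrelevant.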


The Three Filter Operators

Endee supports three server-side filter operators that cover almost every real-world use case:

| Operator | What it does | Example |
|---|---|---|
| `$eq` | Exact match: string, bool, or int | `{"category": {"$eq": "tech"}}` |
| `$in` | List membership (OR within a field) | `{"category": {"$in": ["health", "science"]}}` |
| `$range` | Inclusive numeric range `[start, end]` | `{"year": {"$range": [2022, 2024]}}` |

All three are evaluated server-side, before any vector ranking occurs, so there is no need to fetch every document with top_k=len(DOCUMENTS) and post-filter the results in Python.
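To reason about which documents should qualify, or to unit-test filter lists offline, the table's semantics can be mirrored on a single metadata dict. The following is a hypothetical pure-Python helper, not Endee's implementation, since the server evaluates filters itself. Treating a field missing from the document as "no match" is an assumption made for this sketch.

```python
def matches(fields, filters):
    """Return True if `fields` satisfies every condition in `filters` (AND)."""
    for condition in filters:
        # Each condition has the shape {field_name: {operator: operand}}
        for field, spec in condition.items():
            if field not in fields:
                return False  # assumption: an absent field never matches
            value = fields[field]
            for op, operand in spec.items():
                if op == "$eq":
                    # Note: in Python True == 1, so this sketch does not
                    # distinguish bool from int values.
                    if value != operand:
                        return False
                elif op == "$in":
                    if value not in operand:  # OR within one field
                        return False
                elif op == "$range":
                    start, end = operand
                    if not (start <= value <= end):  # both ends inclusive
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
    return True


doc = {"category": "tech", "year": 2023, "rating": 4.5,
       "author": "alice", "premium": True}
print(matches(doc, [{"category": {"$in": ["tech", "science"]}},
                    {"year": {"$range": [2022, 2023]}}]))
```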

AND is the only way to combine multiple filters. Every entry in the filter list must match:

```python
filter=[
    {"category": {"$eq": "tech"}},       # must match
    {"year": {"$range": [2022, 2024]}},  # AND must match
    {"premium": {"$eq": True}},          # AND must match
]
```

OR across different fields requires separate queries and client-side merging. OR within a single field is exactly what $in is for.
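Client-side merging for an OR across fields could look like the sketch below. The helper merge_or_results is hypothetical. It assumes each result is a dict with at least "id" and "similarity" keys, which is the shape the show_results helper later in this tutorial reads. Because every sub-query uses the same query vector, a document matching both conditions shows up with the same score in both lists, so de-duplicating by id is enough. If each sub-query asks for at least top_k results, the merged top_k is exact: any document in the true top_k of the union is also in the top_k of its own sub-query.

```python
def merge_or_results(result_lists, top_k):
    """Merge ranked result lists from several filtered queries (client-side OR)."""
    best = {}
    for results in result_lists:
        for r in results:
            prev = best.get(r["id"])
            # A document matching more than one query appears more than once;
            # keep a single copy with the highest similarity.
            if prev is None or r["similarity"] > prev["similarity"]:
                best[r["id"]] = r
    merged = sorted(best.values(), key=lambda r: r["similarity"], reverse=True)
    return merged[:top_k]


# Hypothetical usage with the index, query_vec, TOP_K, and show_results
# defined later in this tutorial (not executed here):
# tech = index.query(vector=query_vec, top_k=TOP_K, filter=[{"category": {"$eq": "tech"}}])
# bob = index.query(vector=query_vec, top_k=TOP_K, filter=[{"author": {"$eq": "bob"}}])
# show_results(merge_or_results([tech, bob], TOP_K), label='category == "tech" OR author == "bob"')
```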


Install

Required packages:

  • endee - client library to connect to the Endee vector database
  • sentence-transformers - provides the dense embedding model
  • numpy==2.0.0 - pinned to avoid compatibility issues
```shell
pip install --upgrade endee sentence-transformers
pip install numpy==2.0.0
```

Imports

```python
from getpass import getpass

from endee import Endee
from sentence_transformers import SentenceTransformer
```

Connect to Endee and Create the Index

Choose your connection method: local server or serverless cloud.

Local Server: If your server has NDD_AUTH_TOKEN set, pass the same token when initializing:

```python
client = Endee("ndd-auth-token")
client.set_base_url("http://0.0.0.0:8080/api/v1")
```

Endee Serverless: Go to https://app.endee.io, create a token, then pass it here:

```python
client = Endee("your-serverless-token")
```
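With either client, create a fresh index. Any existing index with the same name is deleted first so the notebook can be re-run, and the dimension must match the embedding model's output size.

```python
INDEX_NAME = "dense_filter_demo"

# Drop a leftover index from a previous run, if there is one
try:
    client.delete_index(INDEX_NAME)
except Exception:
    pass

client.create_index(
    name=INDEX_NAME,
    dimension=384,        # all-MiniLM-L6-v2 produces 384-dimensional vectors
    space_type="cosine",
)
index = client.get_index(INDEX_NAME)
print(f"Index '{INDEX_NAME}' ready")
```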

Load the Embedding Model

This step loads all-MiniLM-L6-v2 once and reuses it for both indexing and querying. The model maps any piece of text to a 384-dimensional vector, which is why the index was created with dimension=384.

```python
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
```

Prepare Example Corpus

The example corpus contains 16 research articles across four categories. Each document has five metadata fields that we will use as filter dimensions: category, year, rating, author, and premium. Every field you want to filter on must be declared in the filter dict at upsert time; fields missing from filter cannot be queried later.

```python
DOCUMENTS = [
    # tech
    {"id": "doc_01",
     "text": "Neural networks are revolutionising image recognition and computer vision tasks",
     "meta": {"title": "Neural Nets & Vision", "category": "tech", "year": 2023,
              "rating": 4.5, "author": "alice", "premium": True}},
    {"id": "doc_02",
     "text": "Quantum computing promises exponential speedup for optimisation and cryptography",
     "meta": {"title": "Quantum Computing", "category": "tech", "year": 2024,
              "rating": 4.8, "author": "alice", "premium": True}},
    # ...documents
]
print(f"{len(DOCUMENTS)} documents ready")
```

Embed and Index Documents

For each document we encode the text into a dense vector and build a payload with two separate dicts:

  • meta holds any data we want returned with results; it is not searchable
  • filter declares every field we want to filter on at query time. A field not listed here cannot be used in a filter later, so think of it as a column declaration
```python
payload = []
for doc in DOCUMENTS:
    vec = dense_model.encode(doc["text"]).tolist()
    m = doc["meta"]
    payload.append({
        "id": doc["id"],
        "vector": vec,
        "meta": m,        # returned with results, not filterable
        "filter": {       # declares the fields that can be filtered on
            "category": m["category"],
            "year": m["year"],
            "rating": m["rating"],
            "author": m["author"],
            "premium": m["premium"],
        },
    })

index.upsert(payload)
print(f"{len(payload)} documents indexed")
```
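Because a field left out of filter cannot be queried later, it may help to keep the declared field names in one place and to check query-time filter lists against them before sending a query. FILTER_FIELDS, filter_payload, and check_filter_fields below are hypothetical names, not part of the Endee client. filter_payload(m) could replace the hand-written "filter" dict in the loop above.

```python
# Fields declared in the "filter" dict at upsert time (same as the loop above).
FILTER_FIELDS = ("category", "year", "rating", "author", "premium")


def filter_payload(meta, fields=FILTER_FIELDS):
    """Build the upsert-time filter dict; raises KeyError if meta lacks a field."""
    return {name: meta[name] for name in fields}


def check_filter_fields(filters, fields=FILTER_FIELDS):
    """Raise before querying if a filter uses a field that was never declared."""
    for condition in filters:
        for name in condition:
            if name not in fields:
                raise ValueError(f"'{name}' was not declared in filter at upsert time")


try:
    # title lives only in meta, so this filter is rejected locally
    check_filter_fields([{"title": {"$eq": "Quantum Computing"}}])
except ValueError as err:
    print(err)
```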

Query Setup

All queries in this notebook use the same text. We encode it once and reuse the vector. The show_results helper prints each result with its rank, score, and metadata so we can clearly see how different filters change the output.

```python
QUERY = "AI applications in healthcare and medicine"
query_vec = dense_model.encode(QUERY).tolist()
TOP_K = 5


def show_results(results, label=""):
    if label:
        print(f"Filter: {label}")
    for rank, r in enumerate(results, 1):
        m = r["meta"]
        print(f" {rank}. score={r['similarity']:.4f} [{m['category']}] {m['title']} "
              f"({m['author']}, {m['year']}, rating={m['rating']}, premium={m['premium']})")
    print()
```

Baseline - No Filter

Running the query without any filter searches all 16 documents. This baseline shows what the dense model considers most relevant before any filtering is applied.

```python
results = index.query(vector=query_vec, top_k=TOP_K)
show_results(results, label="none (all 16 documents are candidates)")
```

$eq - Exact Match

$eq restricts the search to documents where a field exactly equals a given value. Only documents that pass this check enter the ranking stage; everything else is excluded before any vector comparison happens. Here we filter to health articles only. With just four health articles in the corpus, expect at most four results even though TOP_K is 5.

```python
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "health"}}],
)
show_results(results, label='category == "health" (4 candidates)')
```

$range - Numeric Range

$range takes a two-value list [start, end], and both ends are inclusive. It works on any numeric field, such as year, rating, price, or age. Here we filter to articles published from 2022 through 2024.

```python
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"year": {"$range": [2022, 2024]}}],
)
show_results(results, label="year in [2022, 2024] (13 candidates)")
```

$in - Match Any Value From a List

$in is OR within a single field. A document passes if its field value matches any item in the list. This is useful when you want results from multiple categories, multiple authors, or specific year cohorts without running separate queries.

```python
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$in": ["health", "science"]}}],
)
show_results(results, label='category in ["health", "science"] (8 candidates)')
```

AND - Combining Multiple Filters

Passing multiple filters in the list ANDs them: a document must satisfy every condition to enter the candidate pool. You can mix operators freely. Here we combine $in, $range, and $eq to find premium health or science articles published from 2021 through 2023.

```python
results = index.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[
        {"category": {"$in": ["health", "science"]}},
        {"year": {"$range": [2021, 2023]}},
        {"premium": {"$eq": True}},
    ],
)
show_results(results, label='category in ["health","science"] AND year in [2021,2023] AND premium == True')
```

Other Filter Combinations

The four filtered queries above cover the core patterns. Every other combination works the same way: swap in the fields and values you need. The table below lists more ready-to-use examples:

| What you want | Filter |
|---|---|
| Only tech articles | `[{"category": {"$eq": "tech"}}]` |
| Only bob’s articles | `[{"author": {"$eq": "bob"}}]` |
| Only premium content | `[{"premium": {"$eq": True}}]` |
| Health articles that are premium | `[{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}]` |
| Alice’s tech articles only | `[{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}]` |
| Rating 4.0 and above | `[{"rating": {"$range": [4.0, 5.0]}}]` |
| Top-rated articles only (4.5+) | `[{"rating": {"$range": [4.5, 5.0]}}]` |
| Articles by alice or bob | `[{"author": {"$in": ["alice", "bob"]}}]` |
| Only 2022 and 2024 (skip 2023) | `[{"year": {"$in": [2022, 2024]}}]` |
| Recent high-quality tech articles | `[{"category": {"$eq": "tech"}}, {"year": {"$range": [2022, 2024]}}, {"rating": {"$range": [4.3, 5.0]}}]` |
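When the criteria come from user input instead of being hard-coded, the same filter lists can be assembled programmatically. build_filters is a hypothetical helper, not part of the Endee client. The 5.0 upper bound on rating is an assumption taken from the table above. Optional criteria are simply skipped, and since the list entries are ANDed, each one narrows the candidate pool further.

```python
def build_filters(category=None, authors=None, years=None,
                  min_rating=None, premium=None):
    """Assemble an AND-ed Endee-style filter list from optional criteria."""
    filters = []
    if category is not None:
        filters.append({"category": {"$eq": category}})
    if authors:
        filters.append({"author": {"$in": list(authors)}})  # OR within author
    if years is not None:
        start, end = years
        filters.append({"year": {"$range": [start, end]}})  # inclusive
    if min_rating is not None:
        # Assumes ratings top out at 5.0, as in the table above
        filters.append({"rating": {"$range": [min_rating, 5.0]}})
    if premium is not None:
        filters.append({"premium": {"$eq": premium}})
    return filters


print(build_filters(category="tech", years=(2022, 2024), min_rating=4.3))
```

The result would go into index.query(vector=query_vec, top_k=TOP_K, filter=...). This tutorial does not show how Endee treats an empty filter list, so it is safer to omit the filter argument when no criteria are set.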

Cleanup

Deletes the index.

```python
client.delete_index(INDEX_NAME)
print(f"Deleted: {INDEX_NAME}")
```

Key Takeaways

  • Filters are eligibility gates - they restrict the candidate pool before ranking, not ranking signals
  • $eq matches exact values - use it for categories, booleans, exact strings, or exact integers
  • $range includes both endpoints - [2022, 2024] includes all years from 2022 through 2024
  • $in is OR within a field - perfect for multi-value selections like “alice or bob” or “health or science”
  • Multiple filters are ANDed - all conditions must be true for a document to qualify
  • Declare fields in filter at upsert time - any field you want to filter later must be in the filter dict
  • meta and filter are separate - meta is returned with results; filter controls what enters the ranking stage
  • Combine operators freely - mix $eq, $in, and $range in a single query
  • For OR across fields, use multiple queries - filter logic only ANDs, so “category=tech OR author=bob” requires two queries merged client-side