Filtered search

Combine semantic vector search with server-side metadata filters ($eq, $in, $range, $gt, $gte, $lt, $lte) to build precise retrieval pipelines.

Filtered search answers the question: "Which documents, among those I care about, are most similar to this query?" Filters are eligibility gates evaluated server-side before any vector ranking - a document that fails the filter never enters ranking.
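To make the ordering concrete, here is a toy sketch in plain Python (not Endee code) that contrasts filtering before ranking with filtering after ranking. The 2-D vectors, document ids, and helper names are invented for illustration; real embeddings would come from the model loaded later in this guide.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus: 2-D vectors stand in for real embeddings (illustrative data only).
DOCS = [
    {"id": "a", "category": "tech",   "vec": [1.0, 0.0]},
    {"id": "b", "category": "tech",   "vec": [0.9, 0.1]},
    {"id": "c", "category": "health", "vec": [0.7, 0.3]},
    {"id": "d", "category": "health", "vec": [0.0, 1.0]},
]

def rank(docs, query, limit):
    """Return the `limit` documents most similar to the query."""
    return sorted(docs, key=lambda d: cosine(d["vec"], query), reverse=True)[:limit]

def prefilter_search(docs, query, category, limit):
    """Filter first, then rank only the eligible documents."""
    eligible = [d for d in docs if d["category"] == category]
    return [d["id"] for d in rank(eligible, query, limit)]

def postfilter_search(docs, query, category, limit):
    """Rank everything first, then drop hits that fail the filter."""
    return [d["id"] for d in rank(docs, query, limit) if d["category"] == category]

query = [1.0, 0.0]
pre = prefilter_search(DOCS, query, "health", limit=2)
post = postfilter_search(DOCS, query, "health", limit=2)
print(pre)   # ['c', 'd']
print(post)  # []
```

With the filter applied first, both health documents are returned. With the filter applied last, the two tech documents take the top two slots and nothing survives, which is the failure mode that pre-ranking filters avoid.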

Filter operators

Operator	Description	Example
`$eq`	Exact match - string, bool, or number	`{"category": {"$eq": "tech"}}`
`$in`	List membership - OR within a field	`{"category": {"$in": ["health", "science"]}}`
`$range`	Inclusive numeric range `[start, end]`	`{"year": {"$range": [2022, 2024]}}`
`$gt`	Greater than	`{"price": {"$gt": 50}}`
`$gte`	Greater than or equal	`{"price": {"$gte": 80}}`
`$lt`	Less than	`{"price": {"$lt": 40}}`
`$lte`	Less than or equal	`{"price": {"$lte": 45}}`
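The semantics in the table can be mirrored in a few lines of plain Python, which is handy for unit-testing a filter against sample documents before sending it to the server. This is a local sketch based only on the descriptions above, not Endee's implementation; `matches` and `passes` are hypothetical helper names, and treating a missing field as a failed condition is an assumption.

```python
def matches(value, condition):
    """Check one stored value against one {operator: operand} condition."""
    [(op, operand)] = condition.items()
    if op == "$eq":
        return value == operand
    if op == "$in":
        return value in operand
    if op == "$range":
        start, end = operand
        return start <= value <= end  # both ends inclusive
    if op == "$gt":
        return value > operand
    if op == "$gte":
        return value >= operand
    if op == "$lt":
        return value < operand
    if op == "$lte":
        return value <= operand
    raise ValueError(f"unknown operator: {op}")

def passes(doc_filter, filter_list):
    """AND over all entries; each entry maps one field to one condition."""
    for entry in filter_list:
        [(field, condition)] = entry.items()
        # Assumption: a field that was never declared fails the condition.
        if field not in doc_filter or not matches(doc_filter[field], condition):
            return False
    return True

# Filter values of doc_01 from the dataset used later in this guide.
doc_01 = {"category": "tech", "year": 2023, "rating": 4.5, "author": "alice", "premium": True}

print(passes(doc_01, [{"category": {"$eq": "tech"}}, {"year": {"$range": [2022, 2024]}}]))  # True
print(passes(doc_01, [{"rating": {"$gt": 4.5}}]))   # False: $gt is strict
print(passes(doc_01, [{"rating": {"$gte": 4.5}}]))  # True
```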

Multiple entries in the filter list are always ANDed:


filter=[
    {"category": {"$eq":    "tech"}},       # must match
    {"year":     {"$range": [2022, 2024]}},  # AND must match
    {"premium":  {"$eq":    True}},          # AND must match
]

OR across different fields requires separate queries merged client-side. OR within a single field is what $in is for.
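A client-side merge for OR across fields might look like the sketch below. It assumes each hit is a dict carrying an `id` key alongside `similarity` (the `id` key in search results is an assumption, as is the helper name `merge_or`), and that all queries used the same query vector so their scores are comparable. The hit lists are hand-written placeholders, not real search output.

```python
def merge_or(result_lists, limit):
    """Union several hit lists, keep the best score per id, return the top `limit`."""
    best = {}
    for results in result_lists:
        for hit in results:
            key = hit["id"]
            if key not in best or hit["similarity"] > best[key]["similarity"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["similarity"], reverse=True)[:limit]

# Placeholder hits standing in for two separate searches:
# one filtered on category == "tech", one filtered on premium == True.
tech_hits = [
    {"id": "doc_01", "similarity": 0.41},
    {"id": "doc_02", "similarity": 0.22},
]
premium_hits = [
    {"id": "doc_02", "similarity": 0.22},
    {"id": "doc_09", "similarity": 0.63},
]

merged = merge_or([tech_hits, premium_hits], limit=5)
print([h["id"] for h in merged])  # ['doc_09', 'doc_01', 'doc_02']
```

Documents matched by both queries appear once, and the final list is re-sorted so the merged ranking reflects similarity rather than the order in which the queries ran.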

Installation


pip install --upgrade endee sentence-transformers
pip install numpy==2.0.0

Imports


from getpass import getpass
from endee import Endee
from sentence_transformers import SentenceTransformer

Authentication

Create a token at app.endee.io and pass it to the client:


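# Prompt for the token so it never lands in source control;
# passing a literal string, Endee("your-serverless-token"), works the same way.
client = Endee(getpass("Endee token: "))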

Creating a collection


COLLECTION_NAME = "dense_filter_demo"
try:
    client.delete_collection(COLLECTION_NAME)
except Exception:
    pass
 
client.create_collection(
    name=COLLECTION_NAME,
    fields=[
        {
            "name": "embedding",
            "type": "vector",
            "params": {"dimension": 384, "space_type": "cosine", "precision": "float32"},
        },
    ],
)
collection = client.get_collection(COLLECTION_NAME)
print(f"Collection '{COLLECTION_NAME}' ready")

Loading the embedding model

all-MiniLM-L6-v2 converts text into 384-dimensional vectors. Load it once and reuse for both indexing and querying.


dense_model = SentenceTransformer("all-MiniLM-L6-v2")

Preparing the dataset

The dataset contains 16 research articles across four categories. Each document has five metadata fields used as filter dimensions: category, year, rating, author, and premium.

Every field you want to filter on must be declared in the filter dict at upsert time. Fields missing from the filter dict cannot be used in query filters later.


DOCUMENTS = [
    # tech
    {"id": "doc_01", "text": "Neural networks are revolutionising image recognition and computer vision tasks",
     "meta": {"title": "Neural Nets & Vision",  "category": "tech", "year": 2023, "rating": 4.5, "author": "alice", "premium": True}},
    {"id": "doc_02", "text": "Quantum computing promises exponential speedup for optimisation and cryptography",
     "meta": {"title": "Quantum Computing",     "category": "tech", "year": 2024, "rating": 4.8, "author": "alice", "premium": True}},
    # ...more documents
]
 
print(f"{len(DOCUMENTS)} documents ready")

Upserting objects

For each document, encode the text and build an object with three separate dicts:

meta - returned with results, not searchable
filter - declares every field available for filtering at query time
fields - contains the vector data for each field


objects = []
 
for doc in DOCUMENTS:
    vec = dense_model.encode(doc["text"]).tolist()
    m   = doc["meta"]
    objects.append({
        "id":     doc["id"],
        "meta":   m,
        "filter": {
            "category": m["category"],
            "year":     m["year"],
            "rating":   m["rating"],
            "author":   m["author"],
            "premium":  m["premium"],
        },
        "fields": {
            "embedding": vec,
        },
    })
 
collection.upsert(objects)
print(f"{len(objects)} documents indexed")
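Because an undeclared field cannot be filtered on later, it can be worth validating objects before upserting. The helper below is a sketch, not part of the Endee client: `build_object` and `FILTER_FIELDS` are hypothetical names, and the object layout simply copies the one used in the loop above.

```python
FILTER_FIELDS = ("category", "year", "rating", "author", "premium")

def build_object(doc, vector, filter_fields=FILTER_FIELDS, dimension=384):
    """Build one upsert object, failing fast on a bad vector or a missing filter field."""
    if len(vector) != dimension:
        raise ValueError(f"{doc['id']}: expected {dimension} dimensions, got {len(vector)}")
    meta = doc["meta"]
    missing = [name for name in filter_fields if name not in meta]
    if missing:
        raise KeyError(f"{doc['id']}: missing filter fields {missing}")
    return {
        "id": doc["id"],
        "meta": meta,                                          # returned with results
        "filter": {name: meta[name] for name in filter_fields},  # filterable fields
        "fields": {"embedding": vector},                       # vector data
    }

# doc_01 from the dataset above; a zero vector stands in for a real embedding.
doc = {
    "id": "doc_01",
    "text": "Neural networks are revolutionising image recognition and computer vision tasks",
    "meta": {"title": "Neural Nets & Vision", "category": "tech", "year": 2023,
             "rating": 4.5, "author": "alice", "premium": True},
}
obj = build_object(doc, [0.0] * 384)
print(sorted(obj["filter"]))  # ['author', 'category', 'premium', 'rating', 'year']
```

Note that `title` stays in meta only, so it is returned with results but is not available as a filter.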

Setting up queries

Every query below uses the same query text, so any difference between result sets comes from the filter alone. The show_results helper prints rank, score, and metadata.


QUERY     = "AI applications in healthcare and medicine"
query_vec = dense_model.encode(QUERY).tolist()
LIMIT     = 5
 
def show_results(results, label=""):
    if label:
        print(f"Filter: {label}")
    for rank, r in enumerate(results, 1):
        m = r["meta"]
        print(f"  {rank}. score={r['similarity']:.4f}  [{m['category']}]  {m['title']}  ({m['author']}, {m['year']}, rating={m['rating']}, premium={m['premium']})")
    print()

Baseline search (no filter)

Search all documents to see what the dense model considers relevant before any filtering.


results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
)
show_results(results["results"]["embedding"], label="none - all documents are candidates")

$eq - Exact match

Restrict to documents where a field exactly equals a given value:


results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"category": {"$eq": "health"}}],
)
show_results(results["results"]["embedding"], label='category == "health"  (4 candidates)')

$range - Numeric range

Takes a two-value list [start, end] - both ends inclusive. Works on any numeric field.


results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"year": {"$range": [2022, 2024]}}],
)
show_results(results["results"]["embedding"], label="year in [2022, 2024]  (13 candidates)")

$in - Match any value from a list

OR within a single field. A document passes if its value matches any item in the list.


results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"category": {"$in": ["health", "science"]}}],
)
show_results(results["results"]["embedding"], label='category in ["health", "science"]  (8 candidates)')

$gt / $gte / $lt / $lte - Comparison operators

One-sided numeric comparisons for cases $range cannot express, such as an open-ended or exclusive bound:


# NOTE: `price` is not one of the five fields declared in this demo's filter dict.
# The price examples show the syntax only; substitute a declared numeric field
# (year or rating) to run them against this collection.

# $gt - price greater than 50
results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"price": {"$gt": 50}}],
)

# $gte - rating >= 4.5 (works with floats)
results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"rating": {"$gte": 4.5}}],
)

# $lt - price less than 40
results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"price": {"$lt": 40}}],
)

# $lte - price <= 45
results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[{"price": {"$lte": 45}}],
)

Combining multiple filters

Multiple filters are ANDed - a document must satisfy every condition:


results = collection.search(
    fields={"embedding": {"query": query_vec, "limit": LIMIT}},
    filter=[
        {"category": {"$in":    ["health", "science"]}},
        {"year":     {"$range": [2021, 2023]}},
        {"premium":  {"$eq":    True}},
    ],
)
show_results(results["results"]["embedding"], label='category in ["health","science"] AND year in [2021,2023] AND premium == True')
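When filters are assembled from user input, a small builder can keep the nested dict syntax out of application code. The sketch below is one possible convention, not an Endee feature: `build_filter` is a hypothetical helper that maps a list to `$in`, a two-item tuple to `$range`, and any other value to `$eq`.

```python
def build_filter(**conditions):
    """Turn keyword arguments into an ANDed filter list.

    list           -> $in
    (start, end)   -> $range (inclusive)
    anything else  -> $eq
    """
    filters = []
    for field, value in conditions.items():
        if isinstance(value, tuple):
            if len(value) != 2:
                raise ValueError(f"{field}: a range needs exactly two values")
            filters.append({field: {"$range": list(value)}})
        elif isinstance(value, list):
            filters.append({field: {"$in": value}})
        else:
            filters.append({field: {"$eq": value}})
    return filters

combined = build_filter(category=["health", "science"], year=(2021, 2023), premium=True)
print(combined)
# [{'category': {'$in': ['health', 'science']}}, {'year': {'$range': [2021, 2023]}}, {'premium': {'$eq': True}}]
```

The resulting list has the same shape as the hand-written filter above and could be passed as the `filter` argument of `collection.search`.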

Filter reference

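Common filter patterns ready to copy. The price and electronics rows assume a dataset that declares those fields; the demo dataset in this guide does not.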

What you want	Filter
Only tech articles	`[{"category": {"$eq": "tech"}}]`
Only bob’s articles	`[{"author": {"$eq": "bob"}}]`
Only premium content	`[{"premium": {"$eq": True}}]`
Health articles that are premium	`[{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}]`
Alice’s tech articles only	`[{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}]`
Rating 4.0 and above	`[{"rating": {"$gte": 4.0}}]`
Top-rated articles only (4.5+)	`[{"rating": {"$gte": 4.5}}]`
Articles by alice or bob	`[{"author": {"$in": ["alice", "bob"]}}]`
Only 2022 and 2024 (skip 2023)	`[{"year": {"$in": [2022, 2024]}}]`
Recent high-quality tech articles	`[{"category": {"$eq": "tech"}}, {"year": {"$range": [2022, 2024]}}, {"rating": {"$gte": 4.3}}]`
Price under 100	`[{"price": {"$lt": 100}}]`
Price over 50	`[{"price": {"$gt": 50}}]`
Electronics AND price < 100	`[{"category": {"$eq": "electronics"}}, {"price": {"$lt": 100}}]`

Cleanup


client.delete_collection(COLLECTION_NAME)
print(f"Deleted: {COLLECTION_NAME}")

Key takeaways

Filters run before vector ranking - only documents that pass the filter are scored against the query vector
Declare filter fields at upsert time - any field you want to filter later must be in the filter dict when upserting
meta and filter are separate - meta is returned with results; filter controls what enters ranking
All filters are ANDed - for OR across different fields, run separate queries and merge client-side
Seven operators - $eq, $in, $range for exact, set, and range matching; $gt, $gte, $lt, $lte for one-sided comparisons