Filtered search
Combine semantic vector search with server-side metadata filters ($eq, $in, $range) to build precise retrieval pipelines.
Filtered search answers the question: which documents, among those I care about, are most similar to this query? Filters are eligibility gates evaluated server-side before ranking - a document that fails the filter never enters ranking.
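The filter-then-rank order can be illustrated with a small in-memory sketch. This is not how the Endee server is implemented; `cosine`, `filter_then_rank`, the `eligible` predicate, and the toy two-dimensional vectors are hypothetical and exist only to show the order of operations.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_then_rank(docs, query_vec, eligible, top_k):
    """Drop ineligible documents first, then rank the survivors by similarity."""
    candidates = [d for d in docs if eligible(d["filter"])]
    ranked = sorted(candidates, key=lambda d: cosine(d["vector"], query_vec), reverse=True)
    return ranked[:top_k]

# Toy data (hypothetical): "a" is the closest vector but is not a health document.
docs = [
    {"id": "a", "vector": [1.0, 0.0], "filter": {"category": "tech"}},
    {"id": "b", "vector": [0.9, 0.1], "filter": {"category": "health"}},
    {"id": "c", "vector": [0.0, 1.0], "filter": {"category": "health"}},
]
query = [1.0, 0.0]

hits = filter_then_rank(docs, query, lambda f: f["category"] == "health", top_k=2)
print([d["id"] for d in hits])  # ['b', 'c']
```

Document "a" would win on similarity alone, but it fails the gate and is never scored.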
Filter operators
| Operator | Description | Example |
|---|---|---|
| $eq | Exact match - string, bool, or int | {"category": {"$eq": "tech"}} |
| $in | List membership - OR within a field | {"category": {"$in": ["health", "science"]}} |
| $range | Inclusive numeric range [start, end] | {"year": {"$range": [2022, 2024]}} |
Multiple entries in the filter list are always ANDed:
filter=[
    {"category": {"$eq": "tech"}},       # must match
    {"year": {"$range": [2022, 2024]}},  # AND must match
    {"premium": {"$eq": True}},          # AND must match
]
OR across different fields requires separate queries merged client-side. OR within a single field is what $in is for.
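To make the semantics concrete, here is a minimal local sketch that mirrors the rules described above: each operator is checked against a document's filter fields, and all clauses are ANDed. The `passes` function is hypothetical and is not part of the Endee client or server. It is useful mainly for reasoning about which documents a filter should admit. Note that plain Python equality treats `True` and `1` as equal, so this sketch does not model any type strictness the server may apply.

```python
def passes(filters, fields):
    """Return True if `fields` satisfies every clause in `filters` (AND)."""
    for clause in filters:
        for name, condition in clause.items():
            if name not in fields:
                return False  # undeclared fields can never match
            value = fields[name]
            for op, arg in condition.items():
                if op == "$eq":
                    ok = value == arg
                elif op == "$in":
                    ok = value in arg
                elif op == "$range":
                    start, end = arg
                    ok = start <= value <= end  # both ends inclusive
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
    return True

fields = {"category": "tech", "year": 2023, "premium": True}

all_three = [
    {"category": {"$eq": "tech"}},
    {"year": {"$range": [2022, 2024]}},
    {"premium": {"$eq": True}},
]
print(passes(all_three, fields))                                          # True
print(passes([{"category": {"$in": ["health", "science"]}}], fields))     # False
print(passes([{"year": {"$range": [2023, 2023]}}], fields))               # True
```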
Installation
pip install --upgrade endee sentence-transformers
pip install numpy==2.0.0
Imports
from getpass import getpass
from endee import Endee
from sentence_transformers import SentenceTransformer
Authentication
Local server
If NDD_AUTH_TOKEN is set on the server, pass the same token:
client = Endee("ndd-auth-token")
client.set_base_url("http://0.0.0.0:8080/api/v1")
Endee Cloud
Create a token at app.endee.io:
client = Endee("your-serverless-token")
Creating a collection
COLLECTION_NAME = "dense_filter_demo"
try:
    client.delete_collection(COLLECTION_NAME)
except Exception:
    pass
client.create_collection(
    name=COLLECTION_NAME,
    dimension=384,
    space_type="cosine",
)
collection = client.get_collection(COLLECTION_NAME)
print(f"Collection '{COLLECTION_NAME}' ready")
Loading the embedding model
all-MiniLM-L6-v2 converts text into 384-dimensional vectors. Load it once and reuse for both indexing and querying.
dense_model = SentenceTransformer("all-MiniLM-L6-v2")
Preparing the dataset
16 research articles across four categories. Each document has five metadata fields used as filter dimensions: category, year, rating, author, and premium.
Every field you want to filter on must be declared in the filter dict at upsert time. Fields missing from filter cannot be queried later.
DOCUMENTS = [
    # tech
    {"id": "doc_01", "text": "Neural networks are revolutionising image recognition and computer vision tasks",
     "meta": {"title": "Neural Nets & Vision", "category": "tech", "year": 2023, "rating": 4.5, "author": "alice", "premium": True}},
    {"id": "doc_02", "text": "Quantum computing promises exponential speedup for optimisation and cryptography",
     "meta": {"title": "Quantum Computing", "category": "tech", "year": 2024, "rating": 4.8, "author": "alice", "premium": True}},
    # ...more documents
]
print(f"{len(DOCUMENTS)} documents ready")
Indexing documents
For each document, encode the text and build a payload with two separate dicts:
- meta - returned with results, not searchable
- filter - declares every field available for filtering at query time
payload = []
for doc in DOCUMENTS:
    vec = dense_model.encode(doc["text"]).tolist()
    m = doc["meta"]
    payload.append({
        "id": doc["id"],
        "vector": vec,
        "meta": m,
        "filter": {
            "category": m["category"],
            "year": m["year"],
            "rating": m["rating"],
            "author": m["author"],
            "premium": m["premium"],
        },
    })
collection.upsert(payload)
print(f"{len(payload)} documents indexed")
Setting up queries
All queries use the same text. The show_results helper prints rank, score, and metadata.
QUERY = "AI applications in healthcare and medicine"
query_vec = dense_model.encode(QUERY).tolist()
TOP_K = 5
def show_results(results, label=""):
    if label:
        print(f"Filter: {label}")
    for rank, r in enumerate(results, 1):
        m = r["meta"]
        print(f" {rank}. score={r['similarity']:.4f} [{m['category']}] {m['title']} ({m['author']}, {m['year']}, rating={m['rating']}, premium={m['premium']})")
    print()
Baseline search (no filter)
Search all documents to see what the dense model considers relevant before any filtering.
results = collection.query(vector=query_vec, top_k=TOP_K)
show_results(results, label="none - all documents are candidates")
$eq - Exact match
Restrict to documents where a field exactly equals a given value:
results = collection.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$eq": "health"}}],
)
show_results(results, label='category == "health" (4 candidates)')
$range - Numeric range
Takes a two-value list [start, end] - both ends inclusive. Works on any numeric field.
results = collection.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"year": {"$range": [2022, 2024]}}],
)
show_results(results, label="year in [2022, 2024] (13 candidates)")
$in - Match any value from a list
OR within a single field. A document passes if its value matches any item in the list.
results = collection.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[{"category": {"$in": ["health", "science"]}}],
)
show_results(results, label='category in ["health", "science"] (8 candidates)')
Combining multiple filters
Multiple filters are ANDed - a document must satisfy every condition:
results = collection.query(
    vector=query_vec,
    top_k=TOP_K,
    filter=[
        {"category": {"$in": ["health", "science"]}},
        {"year": {"$range": [2021, 2023]}},
        {"premium": {"$eq": True}},
    ],
)
show_results(results, label='category in ["health","science"] AND year in [2021,2023] AND premium == True')
Filter reference
Common filter patterns ready to copy:
| What you want | Filter |
|---|---|
| Only tech articles | [{"category": {"$eq": "tech"}}] |
| Only bob’s articles | [{"author": {"$eq": "bob"}}] |
| Only premium content | [{"premium": {"$eq": True}}] |
| Health articles that are premium | [{"category": {"$eq": "health"}}, {"premium": {"$eq": True}}] |
| Alice’s tech articles only | [{"category": {"$eq": "tech"}}, {"author": {"$eq": "alice"}}] |
| Rating 4.0 and above | [{"rating": {"$range": [4.0, 5.0]}}] |
| Top-rated articles only (4.5+) | [{"rating": {"$range": [4.5, 5.0]}}] |
| Articles by alice or bob | [{"author": {"$in": ["alice", "bob"]}}] |
| Only 2022 and 2024 (skip 2023) | [{"year": {"$in": [2022, 2024]}}] |
| Recent high-quality tech articles | [{"category": {"$eq": "tech"}}, {"year": {"$range": [2022, 2024]}}, {"rating": {"$range": [4.3, 5.0]}}] |
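If you build these patterns in code rather than copying them, small constructor functions can reduce typos in operator names. The helpers below (`eq`, `one_of`, `between`) are hypothetical and not part of the Endee client; they only produce the plain dict shapes shown in the table.

```python
def eq(field, value):
    """Exact-match clause."""
    return {field: {"$eq": value}}

def one_of(field, values):
    """List-membership clause (OR within one field)."""
    return {field: {"$in": list(values)}}

def between(field, start, end):
    """Inclusive numeric range clause."""
    if start > end:
        raise ValueError("range start must not exceed range end")
    return {field: {"$range": [start, end]}}

# Same filter as the "Recent high-quality tech articles" row.
recent_quality_tech = [
    eq("category", "tech"),
    between("year", 2022, 2024),
    between("rating", 4.3, 5.0),
]
print(recent_quality_tech)
```

The resulting list can be passed as the filter argument exactly like a hand-written one.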
Cleanup
client.delete_collection(COLLECTION_NAME)
print(f"Deleted: {COLLECTION_NAME}")
Key takeaways
- Filters run before ranking - only documents that pass the filter are scored against the query vector
- Declare filter fields at upsert time - any field you want to filter later must be in the filter dict when indexing
- meta and filter are separate - meta is returned with results; filter controls what enters ranking
- All filters are ANDed - for OR across different fields, run separate queries and merge client-side
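The client-side merge for OR across different fields can be sketched as follows. `merge_or` is a hypothetical helper, and it assumes each result carries an "id" key alongside the "similarity" key used by show_results. The two hit lists are made-up stand-ins for the results of two separate collection.query(...) calls, one per filter.

```python
def merge_or(result_lists, top_k):
    """Union several result lists, keep the best score per id, re-rank, truncate."""
    best = {}
    for results in result_lists:
        for r in results:
            seen = best.get(r["id"])
            if seen is None or r["similarity"] > seen["similarity"]:
                best[r["id"]] = r
    merged = sorted(best.values(), key=lambda r: r["similarity"], reverse=True)
    return merged[:top_k]

# Stand-ins with made-up scores, e.g. one query filtered on category
# and a second query filtered on premium.
health_hits = [
    {"id": "doc_05", "similarity": 0.81},
    {"id": "doc_06", "similarity": 0.64},
]
premium_hits = [
    {"id": "doc_05", "similarity": 0.81},
    {"id": "doc_01", "similarity": 0.42},
]

merged = merge_or([health_hits, premium_hits], top_k=3)
print([r["id"] for r in merged])  # ['doc_05', 'doc_06', 'doc_01']
```

Because both queries use the same query vector, scores for the same document are comparable across lists, which is what makes a simple re-sort reasonable here.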