Morphik lets you filter documents and chunks directly in the database using a concise JSON filter syntax. The same structure powers the REST API, Python SDK (sync + async), folder helpers, UserScope, caches, and knowledge-graph builders, so you can define a filter once and reuse it everywhere.
Prefer server-side filters over client-side post-processing. You’ll reduce bandwidth, improve performance, and keep behavior consistent between endpoints.

Where Filters Apply

You can pass filters (or document_filters) to:

Quick Start

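from datetime import datetime, timezone
from morphik import Morphik

db = Morphik()

# Use a timezone-aware UTC timestamp so the operand lines up with datetime
# metadata stored with an explicit zone (for example a trailing "Z").
now = datetime.now(timezone.utc).isoformat()

filters = {
    "$and": [
        {"department": {"$eq": "research"}},
        {"priority": {"$gte": 40}},
        {"start_date": {"$lte": now}},
        {"tags": {"$contains": {"value": "contract"}}},
    ]
}

# The filter is evaluated server-side, so only matching chunks come back.
chunks = db.retrieve_chunks("project delta highlights", filters=filters, k=6)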

Typed Metadata

Typed comparisons (numbers, decimals, dates, datetimes) rely on metadata_types. Supply the per-field hints during ingest or metadata updates:
doc = db.ingest_text(
    content="SOW for Delta",
    metadata={
        "priority": 42,
        "start_date": "2024-01-15T12:30:00Z",
        "end_date": "2024-12-31",
        "cost": "1234.56"
    },
    metadata_types={
        "priority": "number",
        "start_date": "datetime",
        "end_date": "date",
        "cost": "decimal"
    }
)
If you omit a hint, Morphik infers one automatically for simple scalars, but explicitly declaring types is recommended for reliable range queries.
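
If you would rather declare every type than rely on inference, one option is to derive the hints from native Python values before ingest. The sketch below is illustrative only: `infer_metadata_types` is a hypothetical helper, not part of the Morphik SDK, and the mapping simply reuses the type names listed in this guide.

```python
from datetime import date, datetime
from decimal import Decimal


def infer_metadata_types(metadata):
    """Hypothetical helper: map native Python values to metadata type names."""
    types = {}
    for key, value in metadata.items():
        if value is None:
            types[key] = "null"
        # bool is checked before int because bool is a subclass of int.
        elif isinstance(value, bool):
            types[key] = "boolean"
        elif isinstance(value, (int, float)):
            types[key] = "number"
        elif isinstance(value, Decimal):
            types[key] = "decimal"
        # datetime is checked before date because datetime subclasses date.
        elif isinstance(value, datetime):
            types[key] = "datetime"
        elif isinstance(value, date):
            types[key] = "date"
        elif isinstance(value, (list, tuple)):
            types[key] = "array"
        elif isinstance(value, dict):
            types[key] = "object"
        elif isinstance(value, str):
            types[key] = "string"
        else:
            raise TypeError(f"no metadata type known for {key!r}")
    return types


metadata = {
    "priority": 42,
    "start_date": datetime(2024, 1, 15, 12, 30),
    "end_date": date(2024, 12, 31),
    "cost": Decimal("1234.56"),
}
metadata_types = infer_metadata_types(metadata)
```

The resulting dictionary has the same shape as the `metadata_types` argument shown above, so it can be reviewed or overridden field by field before it is sent.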

Implicit vs Explicit Syntax

  • Implicit equality – Bare key/value pairs ({"status": "active"}) use JSON containment and are ideal for simple matching. They also check whether an array contains the value.
  • Explicit operators – Wrap a field in an operator object to unlock typed comparisons, set logic, regex, substring checks, etc. ({"status": {"$ne": "archived"}}).
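
To make the two forms concrete, here is a small sketch that builds both as plain Python dictionaries. The `is_explicit` function is a hypothetical convenience for telling the forms apart on the client; nothing here calls the Morphik API.

```python
import json

# Implicit equality: a bare key/value pair.
implicit = {"status": "active"}

# Explicit form of the same check, using the $eq operator.
explicit = {"status": {"$eq": "active"}}

# Anything beyond equality needs the explicit form.
not_archived = {"status": {"$ne": "archived"}}


def is_explicit(clause):
    """Return True when every field is wrapped in a non-empty operator object."""
    return all(
        isinstance(value, dict)
        and len(value) > 0
        and all(key.startswith("$") for key in value)
        for value in clause.values()
    )


# Filters are ordinary JSON, so they serialize without extra work.
payload = json.dumps(not_archived)
```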

Operator Reference

Equality & Comparison

Operator | Description | Example
--- | --- | ---
$eq / implicit value | Equality (also matches scalars in arrays). | {"status": {"$eq": "completed"}}
$ne | Not equal. | {"status": {"$ne": "archived"}}
$gt, $gte, $lt, $lte | Greater/less-than comparisons for numbers, decimals, dates, and datetimes; strings support only $eq/$ne. Requires correct metadata_types. | {"priority": {"$gte": 40}}, {"end_date": {"$lt": "2025-01-01"}}

Set Membership

Operator | Description | Example
--- | --- | ---
$in | Matches any operand in the provided list. | {"status": {"$in": ["completed", "processing"]}}
$nin | Matches when the value is not in the list. | {"region": {"$nin": ["EU", "LATAM"]}}

Type & Existence

Operator | Description | Example
--- | --- | ---
$exists | Field must (or must not) exist. Accepts booleans or truthy strings. | {"external_id": {"$exists": true}}
$type | Field must have one of the supported metadata types (string, number, decimal, datetime, date, boolean, array, object, null). | {"start_date": {"$type": "datetime"}}

String & Pattern Matching

Operator | Description | Example
--- | --- | ---
$contains | Case-insensitive substring match by default; accepts { "value": "...", "case_sensitive": bool }. Works on scalars and array entries. | {"title": {"$contains": "Q4 Summary"}}
$regex | PostgreSQL regex match. Accepts a raw string pattern or { "pattern": "...", "flags": "i" } (only the i flag is supported). Works on scalars and arrays. | {"folder": {"$regex": {"pattern": "^fin", "flags": "i"}}}

Logical Composition

Operator | Description
--- | ---
$and | All nested clauses must match (non-empty list).
$or | At least one nested clause must match.
$nor | None of the nested clauses may match (NOT (A OR B)).
$not | Inverts a single clause.
Mix logical operators freely with field-level operators for complex expressions.
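
As a rough illustration of mixing the two levels, the sketch below wraps the logical operators in small builder functions. The function names (`all_of`, `any_of`, `none_of`, `negate`) are hypothetical, and the `{"$not": {...}}` shape is an assumption based on the "inverts a single clause" description above.

```python
def _require_clauses(operator, clauses):
    # The guide states that $and needs a non-empty list; applying the same
    # rule to $or and $nor is an assumption of this sketch.
    if not clauses:
        raise ValueError(f"{operator} requires at least one clause")
    return {operator: list(clauses)}


def all_of(*clauses):
    """Every clause must match."""
    return _require_clauses("$and", clauses)


def any_of(*clauses):
    """At least one clause must match."""
    return _require_clauses("$or", clauses)


def none_of(*clauses):
    """No clause may match."""
    return _require_clauses("$nor", clauses)


def negate(clause):
    """Invert a single clause (assumed shape: {"$not": clause})."""
    return {"$not": clause}


filters = all_of(
    {"department": {"$eq": "research"}},
    any_of(
        {"priority": {"$gte": 40}},
        {"tags": {"$contains": {"value": "contract"}}},
    ),
    negate({"status": {"$eq": "archived"}}),
)
```

Because the builders return plain dictionaries, the result can be passed anywhere a hand-written filter would go.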

Common Patterns

Current Window Between Start/End

{
  "$and": [
    {"start_date": {"$lte": "2024-06-01T00:00:00Z"}},
    {"end_date": {"$gte": "2024-06-01T00:00:00Z"}}
  ]
}
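
The same window filter can be generated for any instant rather than hard-coded. This is a minimal sketch: `active_at` is a hypothetical helper, and treating naive datetimes as UTC is an assumption made here for simplicity.

```python
from datetime import datetime, timezone


def active_at(moment):
    """Hypothetical helper: build the start/end window filter for one instant."""
    if moment.tzinfo is None:
        # Assumption of this sketch: naive datetimes are already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    # Normalize to UTC and format with a trailing "Z", matching the example above.
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "$and": [
            {"start_date": {"$lte": stamp}},
            {"end_date": {"$gte": stamp}},
        ]
    }


window = active_at(datetime(2024, 6, 1, tzinfo=timezone.utc))
```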

Folder/User Scope plus Metadata

folder = db.get_folder("legal")
scoped = folder.signin("user-42")

filters = {"priority": {"$gte": 50}}
response = scoped.list_documents(filters=filters, include_total_count=True)

Array Membership & Substring

{
  "$and": [
    {"tags": {"$contains": {"value": "contract"}}},
    {"tags": {"$regex": {"pattern": "quarter", "flags": "i"}}}
  ]
}

Troubleshooting

  • “Unsupported metadata filter operator …” – Double-check spelling and operand type (lists for $in, non-empty arrays for $and, etc.).
  • “Metadata field … expects type …” – The server couldn’t coerce the operand to the declared type. Ensure numbers/dates are valid JSON scalars or native Python types before serialization.
  • Range query returns nothing – Confirm the target documents were ingested/updated with the corresponding metadata_types. Re-ingest or call update_document_metadata with the proper type hints if necessary.
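
Some of these errors can be caught before a request is sent. The following is a hedged sketch of a client-side pre-flight check: `find_filter_problems` is hypothetical, it only knows the operators listed in this guide, and it does not reproduce the server's own validation or error messages.

```python
FIELD_OPERATORS = {
    "$eq", "$ne", "$gt", "$gte", "$lt", "$lte",
    "$in", "$nin", "$exists", "$type", "$contains", "$regex",
}
LIST_LOGICAL = {"$and", "$or", "$nor"}


def find_filter_problems(node, path="filter"):
    """Hypothetical pre-flight check; returns a list of readable problems."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object, got {type(node).__name__}"]
    problems = []
    for key, value in node.items():
        here = f"{path}.{key}"
        if key in LIST_LOGICAL:
            # Requiring a non-empty list for $or/$nor as well as $and is an
            # assumption; the guide states it explicitly only for $and.
            if not isinstance(value, list) or not value:
                problems.append(f"{here}: expected a non-empty list of clauses")
            else:
                for index, clause in enumerate(value):
                    problems += find_filter_problems(clause, f"{here}[{index}]")
        elif key == "$not":
            problems += find_filter_problems(value, here)
        elif key.startswith("$"):
            problems.append(f"{here}: unknown logical operator")
        elif isinstance(value, dict) and any(k.startswith("$") for k in value):
            # The value is an operator object attached to a metadata field.
            for op, operand in value.items():
                if op not in FIELD_OPERATORS:
                    problems.append(f"{here}.{op}: unknown operator")
                elif op in ("$in", "$nin") and not isinstance(operand, list):
                    problems.append(f"{here}.{op}: operand must be a list")
        # Anything else is treated as an implicit equality value.
    return problems
```

An empty list means the sketch found nothing to flag; it says nothing about whether the declared metadata_types match the operands.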
Still stuck? Share your filter payload and endpoint at founders@morphik.ai or on Discord.