def retrieve_docs(
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    k: int = 4,
    min_score: float = 0.0,
    use_colpali: bool = True,
) -> List[DocumentResult]

Parameters

  • query (str): Search query text
  • filters (Dict[str, Any], optional): Metadata filters that restrict which documents are searched. Defaults to None (no filtering).
  • k (int, optional): Number of results to return. Defaults to 4.
  • min_score (float, optional): Minimum similarity threshold. Defaults to 0.0.
  • use_colpali (bool, optional): Whether to retrieve with the ColPali-style embedding model. Only works for documents ingested with use_colpali=True. Defaults to True.
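
As a rough mental model of how k and min_score interact, the sketch below ranks a handful of made-up (document_id, score) pairs locally. The helper name top_k_above, the sample scores, and the inclusive threshold are assumptions for illustration only. This is not the Morphik implementation, and the server may apply the threshold differently.

```python
from typing import List, Tuple


def top_k_above(
    scored: List[Tuple[str, float]],
    k: int = 4,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep items scoring at least min_score,
    order them best first, and return at most k of them."""
    kept = [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]


# Made-up scores, standing in for what a retrieval backend might compute.
sample = [("doc_a", 0.91), ("doc_b", 0.42), ("doc_c", 0.67), ("doc_d", 0.08)]

# k=5 allows up to five results, but only two clear the 0.5 threshold.
result = top_k_above(sample, k=5, min_score=0.5)
print(result)
```

Under these assumptions, a high min_score can leave you with fewer than k results, so callers should not rely on receiving exactly k documents.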

Metadata Filters

Filters share a common JSON DSL. Review the Metadata Filtering guide for supported operators and typed comparisons. Example:
filters = {
    "$and": [
        {"department": {"$eq": "research"}},
        {"priority": {"$gte": 40}},
        {"start_date": {"$lte": "2024-06-01"}}
    ]
}

docs = db.retrieve_docs("budget summary", filters=filters, k=5)
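
To make the meaning of the example filter concrete, here is a small local evaluator covering only the operators used above ($and, $eq, $gte, $lte). The matches function is a hypothetical illustration, not part of the Morphik client. Real filtering happens server-side and supports more operators and typed comparisons, as described in the Metadata Filtering guide. This sketch compares dates as ISO-8601 strings, which is an assumption that only holds when both sides use the same YYYY-MM-DD format.

```python
from typing import Any, Dict


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Hypothetical evaluator: True if metadata satisfies every clause in flt."""
    for key, cond in flt.items():
        if key == "$and":
            # Every sub-filter in the list must match.
            if not all(matches(metadata, sub) for sub in cond):
                return False
            continue
        if key not in metadata:
            return False
        value = metadata[key]
        for op, target in cond.items():
            if op == "$eq":
                ok = value == target
            elif op == "$gte":
                ok = value >= target
            elif op == "$lte":
                ok = value <= target
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")
            if not ok:
                return False
    return True


filters = {
    "$and": [
        {"department": {"$eq": "research"}},
        {"priority": {"$gte": 40}},
        {"start_date": {"$lte": "2024-06-01"}},
    ]
}

# Made-up metadata records for illustration.
doc_ok = {"department": "research", "priority": 55, "start_date": "2024-03-15"}
doc_low = {"department": "research", "priority": 10, "start_date": "2024-03-15"}

print(matches(doc_ok, filters))   # all three clauses hold
print(matches(doc_low, filters))  # priority is below 40
```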

Returns

  • List[DocumentResult]: The matching documents, each with its relevance score, document ID, metadata, and content

Examples

from morphik import Morphik

db = Morphik()

# Ask for up to 5 documents with a similarity score of at least 0.5
docs = db.retrieve_docs(
    "machine learning",
    k=5,
    min_score=0.5
)

# Print the fields of each DocumentResult
for doc in docs:
    print(f"Score: {doc.score}")
    print(f"Document ID: {doc.document_id}")
    print(f"Metadata: {doc.metadata}")
    print(f"Content: {doc.content}")
    print("---")

DocumentResult Properties

The DocumentResult objects returned by this method have the following properties:
  • score (float): Relevance score
  • document_id (str): Document ID
  • metadata (Dict[str, Any]): Document metadata
  • content (DocumentContent): Document content or URL
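
When post-processing results, it can help to work against a stand-in object with the same four properties. The sketch below is a minimal example of that idea. StubResult and summarize are hypothetical names introduced here, not Morphik classes, and content is typed as Any because the shape of DocumentContent is not described in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubResult:
    """Hypothetical stand-in mirroring the documented DocumentResult properties."""
    score: float
    document_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    content: Any = None  # real results carry a DocumentContent here


def summarize(results: List[StubResult]) -> List[str]:
    """Return one 'id (score)' line per result, highest score first."""
    ordered = sorted(results, key=lambda r: r.score, reverse=True)
    return [f"{r.document_id} ({r.score:.2f})" for r in ordered]


# Made-up results for illustration.
stubs = [
    StubResult(score=0.5, document_id="doc_a", metadata={"department": "research"}),
    StubResult(score=0.82, document_id="doc_b"),
]

lines = summarize(stubs)
print("\n".join(lines))
```

Because summarize only reads score and document_id, the same function should also accept the objects returned by retrieve_docs, assuming they expose those attributes as listed above.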