def ingest_file(
    file: Union[str, bytes, BinaryIO, Path],
    filename: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
    use_colpali: bool = True,
) -> Document

Parameters

  • file (Union[str, bytes, BinaryIO, Path]): File to ingest (path string, bytes, file object, or Path)
  • filename (str, optional): Name of the file
  • metadata (Dict[str, Any], optional): Metadata dictionary to attach to the document
  • use_colpali (bool, optional): Whether to use a ColPali-style embedding model to ingest the file (slower, but significantly better retrieval accuracy for images). Defaults to True.
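
The parameter list above says `file` may also be raw bytes rather than a path. A minimal sketch of that case follows. The helper name `ingest_bytes` is hypothetical, and `db` is assumed to be an already constructed `Morphik` client. Passing `filename` explicitly is an assumption about good practice when no path is available, not documented behavior.

```python
from typing import Any, Dict, Optional


def ingest_bytes(
    db: Any,
    payload: bytes,
    filename: str,
    metadata: Optional[Dict[str, Any]] = None,
    use_colpali: bool = False,
):
    """Hypothetical helper: ingest in-memory content through ingest_file."""
    # With raw bytes there is no path to take a name from, so the
    # filename is passed explicitly (assumption, see the note above).
    return db.ingest_file(
        payload,
        filename=filename,
        metadata=metadata,
        use_colpali=use_colpali,  # off here; the SDK default is True
    )
```

A call might look like `ingest_bytes(db, open("notes.txt", "rb").read(), "notes.txt")`. Leaving `use_colpali` off trades the image retrieval accuracy described above for faster ingestion, which may suit text-only content.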

Typed Metadata

Use Python-native values (e.g., datetime, date, Decimal) in the metadata dict. The SDK serializes them and adds the corresponding metadata_types, so you can run the advanced filters documented in Metadata Filtering.
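
As an illustration of the typed values described above, the sketch below builds a metadata dict containing `date`, `datetime`, and `Decimal` values and passes it to `ingest_file` unchanged. The helper names and all metadata keys are hypothetical, and `db` is assumed to be an existing `Morphik` client. How the SDK encodes each value is not shown here.

```python
from datetime import date, datetime, timezone
from decimal import Decimal
from typing import Any, Dict


def build_typed_metadata() -> Dict[str, Any]:
    """Hypothetical metadata; every key name is illustrative only."""
    return {
        "category": "research",
        "published_on": date(2024, 1, 15),
        "reviewed_at": datetime(2024, 2, 1, 9, 30, tzinfo=timezone.utc),
        "budget": Decimal("1250.50"),
    }


def ingest_with_typed_metadata(db: Any, path: str):
    # Native values go in as-is; per the text above, the SDK serializes
    # them and records the corresponding metadata_types.
    return db.ingest_file(path, metadata=build_typed_metadata())
```

Keeping the values native, rather than pre-formatting them as strings, is what should let the SDK record their types for later filtering.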

Returns

  • Document: Metadata of the ingested document

Examples

from morphik import Morphik

db = Morphik()

doc = db.ingest_file(
    "document.pdf",
    filename="document.pdf",
    metadata={"category": "research", "owner": "alice"},
    use_colpali=True,
)