How can I improve retrieval accuracy?

Simple health check endpoint that returns 200 OK.

Ping Health

Get list of available models from configuration.

Returns models grouped by type (chat, embedding, etc.) with their metadata.

Get Available Models

Retrieve relevant chunks.

Args:
    request: RetrieveRequest containing:
        - query: Search query text
        - filters: Optional metadata filters
        - k: Number of results (default: 4)
        - min_score: Minimum similarity threshold (default: 0.0)
        - use_reranking: Whether to use reranking
        - use_colpali: Whether to use ColPali-style embedding model
        - folder_name: Optional folder to scope the search to
        - end_user_id: Optional end-user ID to scope the search to
    auth: Authentication context

Returns:
    List[ChunkResult]: List of relevant chunks

Retrieve Chunks
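The Retrieve Chunks arguments map onto a JSON request body. The sketch below builds one in Python. It applies the documented defaults (k=4, min_score=0.0) and leaves out any optional field the caller does not set. It assumes the body is a flat object keyed by the argument names above. The helper `build_retrieve_request` is hypothetical and is not part of the Morphik SDK. Of the documented fields, k, min_score, use_reranking and use_colpali are the ones you would normally adjust when tuning retrieval quality.

```python
import json
from typing import Any, Optional


# Hypothetical helper: assumes a flat JSON body keyed by the documented argument names.
def build_retrieve_request(
    query: str,
    filters: Optional[dict[str, Any]] = None,
    k: int = 4,  # documented default
    min_score: float = 0.0,  # documented default
    use_reranking: Optional[bool] = None,
    use_colpali: Optional[bool] = None,
    folder_name: Optional[str] = None,
    end_user_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a RetrieveRequest-style payload."""
    if not query:
        raise ValueError("query must be a non-empty string")
    if k < 1:
        raise ValueError("k must be at least 1")
    payload: dict[str, Any] = {"query": query, "k": k, "min_score": min_score}
    optional = {
        "filters": filters,
        "use_reranking": use_reranking,
        "use_colpali": use_colpali,
        "folder_name": folder_name,
        "end_user_id": end_user_id,
    }
    # Send only the optional fields the caller set, so server-side defaults apply otherwise.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


body = build_retrieve_request(
    "What were Q3 revenues?",
    filters={"department": "finance"},
    k=8,
    min_score=0.2,
    use_reranking=True,
    folder_name="reports",
)
print(json.dumps(body, indent=2))
```

The same payload shape also fits Retrieve Documents, because both endpoints document the same RetrieveRequest fields.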

Retrieve relevant documents.

Args:
    request: RetrieveRequest containing:
        - query: Search query text
        - filters: Optional metadata filters
        - k: Number of results (default: 4)
        - min_score: Minimum similarity threshold (default: 0.0)
        - use_reranking: Whether to use reranking
        - use_colpali: Whether to use ColPali-style embedding model
        - folder_name: Optional folder to scope the search to
        - end_user_id: Optional end-user ID to scope the search to
    auth: Authentication context

Returns:
    List[DocumentResult]: List of relevant documents

Retrieve Documents

Retrieve multiple documents by their IDs in a single batch operation.

Args:
    request: Dictionary containing:
        - document_ids: List of document IDs to retrieve
        - folder_name: Optional folder to scope the operation to
        - end_user_id: Optional end-user ID to scope the operation to
    auth: Authentication context

Returns:
    List[Document]: List of documents matching the IDs

Batch Get Documents

Retrieve specific chunks by their document ID and chunk number in a single batch operation.

Args:
    request: Dictionary containing:
        - sources: List of ChunkSource objects (with document_id and chunk_number)
        - folder_name: Optional folder to scope the operation to
        - end_user_id: Optional end-user ID to scope the operation to
        - use_colpali: Whether to use ColPali-style embedding
    auth: Authentication context

Returns:
    List[ChunkResult]: List of chunk results

Batch Get Chunks
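Batch Get Chunks takes a list of sources. Each source identifies one chunk by document_id and chunk_number. The sketch below converts (document_id, chunk_number) pairs into that list and drops duplicates while keeping the original order, so each chunk is requested only once. The helper name and the dictionary shape of each source are assumptions based on the field names documented above.

```python
from typing import Any, Iterable, Optional


# Hypothetical helper: each source is assumed to serialize as {"document_id", "chunk_number"}.
def build_batch_chunks_request(
    pairs: Iterable[tuple[str, int]],
    folder_name: Optional[str] = None,
    end_user_id: Optional[str] = None,
    use_colpali: Optional[bool] = None,
) -> dict[str, Any]:
    """Turn (document_id, chunk_number) pairs into a batch-get-chunks payload."""
    seen: set[tuple[str, int]] = set()
    sources: list[dict[str, Any]] = []
    for document_id, chunk_number in pairs:
        key = (document_id, chunk_number)
        if key in seen:  # skip duplicates so each chunk is requested once
            continue
        seen.add(key)
        sources.append({"document_id": document_id, "chunk_number": chunk_number})
    payload: dict[str, Any] = {"sources": sources}
    for name, value in (
        ("folder_name", folder_name),
        ("end_user_id", end_user_id),
        ("use_colpali", use_colpali),
    ):
        if value is not None:
            payload[name] = value
    return payload


payload = build_batch_chunks_request([("doc_a", 0), ("doc_a", 1), ("doc_a", 0), ("doc_b", 3)])
print(payload["sources"])
```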

Generate completion using relevant chunks as context.

When graph_name is provided, the query will leverage the knowledge graph
to enhance retrieval by finding relevant entities and their connected documents.

Args:
    request: CompletionQueryRequest containing:
        - query: Query text
        - filters: Optional metadata filters
        - k: Number of chunks to use as context (default: 4)
        - min_score: Minimum similarity threshold (default: 0.0)
        - max_tokens: Maximum tokens in completion
        - temperature: Model temperature
        - use_reranking: Whether to use reranking
        - use_colpali: Whether to use ColPali-style embedding model
        - graph_name: Optional name of the graph to use for knowledge graph-enhanced retrieval
        - hop_depth: Number of relationship hops to traverse in the graph (1-3)
        - include_paths: Whether to include relationship paths in the response
        - prompt_overrides: Optional customizations for entity extraction, resolution, and query prompts
        - folder_name: Optional folder to scope the operation to
        - end_user_id: Optional end-user ID to scope the operation to
        - schema: Optional schema for structured output
        - chat_id: Optional chat conversation identifier for maintaining history
    auth: Authentication context

Returns:
    CompletionResponse: Generated text completion or structured output

Query Completion
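The Query Completion docstring says graph-enhanced retrieval only applies when graph_name is set, and that hop_depth ranges from 1 to 3. The sketch below encodes those two rules when building a CompletionQueryRequest-style body. Any other documented field, such as max_tokens, temperature or chat_id, is passed through unchanged. The helper name, the default hop_depth of 1, and the choice to send graph options only alongside graph_name are assumptions made for illustration.

```python
from typing import Any, Optional


# Hypothetical helper; hop_depth defaults to 1 here by assumption.
def build_completion_request(
    query: str,
    k: int = 4,
    min_score: float = 0.0,
    graph_name: Optional[str] = None,
    hop_depth: int = 1,
    include_paths: bool = False,
    **extra: Any,
) -> dict[str, Any]:
    """Sketch of a CompletionQueryRequest-style body."""
    payload: dict[str, Any] = {"query": query, "k": k, "min_score": min_score}
    if graph_name is not None:
        # The documented range for hop_depth is 1-3.
        if not 1 <= hop_depth <= 3:
            raise ValueError("hop_depth must be between 1 and 3")
        payload.update(graph_name=graph_name, hop_depth=hop_depth, include_paths=include_paths)
    # Pass through other documented fields (max_tokens, temperature, chat_id, ...) when set.
    payload.update({key: value for key, value in extra.items() if value is not None})
    return payload


plain = build_completion_request("Summarize the contract terms", max_tokens=300)
graph = build_completion_request(
    "Which suppliers are linked to the recall?",
    graph_name="supply_chain",
    hop_depth=2,
    include_paths=True,
)
print(plain)
print(graph)
```

A larger hop_depth pulls in entities that are connected to the query only indirectly. It is a reasonable knob to try when answers need context spread across several related documents.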

Retrieve the message history for a chat conversation.

Args:
    chat_id: Identifier of the conversation whose history should be loaded.
    auth: Authentication context used to verify access to the conversation.
    redis: Redis connection where chat messages are stored.

Returns:
    A list of :class:`ChatMessage` objects or an empty list if no history
    exists.

Get Chat History

Get list of available models for UI selection.

Returns a list of models that can be used for queries. Each model includes:
- id: Model identifier to use in llm_config
- name: Display name for the model
- provider: The LLM provider (e.g., openai, anthropic, ollama)
- description: Optional description of the model

Get Available Models For Selection

Execute an agent-style query using the :class:`MorphikAgent`.

Args:
    request: The query payload containing the natural language question and optional chat_id.
    auth: Authentication context used to enforce limits and access control.
    redis: Redis connection for chat history storage.

Returns:
    A dictionary with the agent's full response.

Agent Query

List accessible documents.

Args:
    auth: Authentication context
    skip: Number of documents to skip
    limit: Maximum number of documents to return
    filters: Optional metadata filters
    folder_name: Optional folder to scope the operation to
    end_user_id: Optional end-user ID to scope the operation to

Returns:
    List[Document]: List of accessible documents

List Documents

Retrieve a single document by its external identifier.

Args:
    document_id: External ID of the document to fetch.
    auth: Authentication context used to verify access rights.

Returns:
    The :class:`Document` metadata if found.

Get Document

Delete a document and all associated data.

This endpoint deletes a document and all its associated data, including:
- Document metadata
- Document content in storage
- Document chunks and embeddings in vector store

Args:
    document_id: ID of the document to delete
    auth: Authentication context (must have write access to the document)

Returns:
    Deletion status

Delete Document

Get the processing status of a document.

Args:
    document_id: ID of the document to check
    auth: Authentication context

Returns:
    Dict containing status information for the document

Get Document Status

Get document by filename.

Args:
    filename: Filename of the document to retrieve
    auth: Authentication context
    folder_name: Optional folder to scope the operation to
    end_user_id: Optional end-user ID to scope the operation to

Returns:
    Document: Document metadata if found and accessible

Get Document By Filename

Get a download URL for a specific document.

Args:
    document_id: External ID of the document
    auth: Authentication context
    expires_in: URL expiration time in seconds (default: 3600, i.e. 1 hour)

Returns:
    Dictionary containing the download URL and metadata

Get Document Download Url

Download the actual file content for a document.
It is intended for local storage, where browsers cannot access file:// URLs.

Args:
    document_id: External ID of the document
    auth: Authentication context

Returns:
    StreamingResponse with the file content

Download Document File

Update a document with new text content using the specified strategy.

Args:
    document_id: ID of the document to update
    request: Text content and metadata for the update
    update_strategy: Strategy for updating the document (default: 'add')
    auth: Authentication context

Returns:
    Document: Updated document metadata

Update Document Text

Update a document with content from a file using the specified strategy.

Args:
    document_id: ID of the document to update
    file: File to add to the document
    metadata: JSON string of metadata to merge with existing metadata
    rules: JSON string of rules to apply to the content
    update_strategy: Strategy for updating the document (default: 'add')
    use_colpali: Whether to use multi-vector embedding
    auth: Authentication context

Returns:
    Document: Updated document metadata

Update Document File

Update only a document's metadata.

Args:
    document_id: ID of the document to update
    metadata: New metadata to merge with existing metadata
    auth: Authentication context

Returns:
    Document: Updated document metadata

Update Document Metadata
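The Update Document Metadata docstring says the new metadata is merged with the existing metadata. The sketch below shows one common interpretation, a shallow merge: incoming keys overwrite existing ones and all other keys are kept. This is an assumption. The docstring does not say how nested values or deleted keys are handled, so check the server's behaviour before relying on it.

```python
from typing import Any


# Assumed semantics: shallow merge where keys in `update` win.
def merge_metadata(existing: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Shallow merge: keys in `update` overwrite keys in `existing`; others are kept."""
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(update)
    return merged


current = {"author": "Ana", "status": "draft", "tags": ["q3"]}
merged = merge_metadata(current, {"status": "final", "reviewed": True})
print(merged)
# {'author': 'Ana', 'status': 'final', 'tags': ['q3'], 'reviewed': True}
```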

Get usage statistics for the authenticated user.

Args:
    auth: Authentication context identifying the caller.

Returns:
    A mapping of operation types to token usage counts.

Get Usage Stats

Retrieve recent telemetry records for the user or application.

Args:
    auth: Authentication context; admin users receive global records.
    operation_type: Optional operation type to filter by.
    since: Only return records newer than this timestamp.
    status: Optional status filter (e.g. ``success`` or ``error``).

Returns:
    A list of usage entries sorted by timestamp, each represented as a
    dictionary.

Get Recent Usage
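Get Recent Usage combines three optional filters (operation_type, since, status) and returns records sorted by timestamp. The sketch below reproduces that behaviour on an in-memory list so the semantics are easy to see. Three details are assumptions: the record field names, a strict "newer than" comparison for since, and newest-first ordering. The docstring does not specify the sort direction.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def filter_usage(
    records: list[dict[str, Any]],
    operation_type: Optional[str] = None,
    since: Optional[datetime] = None,
    status: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Apply the optional filters, then sort by timestamp (newest first, by assumption)."""
    selected = [
        r
        for r in records
        if (operation_type is None or r["operation_type"] == operation_type)
        and (since is None or r["timestamp"] > since)  # "newer than" is treated as strict
        and (status is None or r["status"] == status)
    ]
    return sorted(selected, key=lambda r: r["timestamp"], reverse=True)


def ts(hour: int) -> datetime:
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# Illustrative records; field names are assumed, not taken from the API.
records = [
    {"operation_type": "query", "status": "success", "timestamp": ts(9)},
    {"operation_type": "ingest", "status": "error", "timestamp": ts(10)},
    {"operation_type": "query", "status": "success", "timestamp": ts(11)},
    {"operation_type": "query", "status": "error", "timestamp": ts(12)},
]
recent_ok = filter_usage(records, operation_type="query", since=ts(9), status="success")
print([r["timestamp"].hour for r in recent_ok])  # [11]
```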

Create a persistent cache for low-latency completions.

Args:
    name: Unique identifier for the cache.
    model: The model name to use when generating completions.
    gguf_file: Path to the ``gguf`` weights file to load.
    filters: Optional metadata filters used to select documents.
    docs: Explicit list of document IDs to include in the cache.
    auth: Authentication context used for permission checks.

Returns:
    A dictionary describing the created cache.

Create Cache

Retrieve information about a specific cache.

Args:
    name: Name of the cache to inspect.
    auth: Authentication context used to authorize the request.

Returns:
    A dictionary with a boolean ``exists`` field indicating whether the
    cache is loaded.

Get Cache

Refresh an existing cache with newly available documents.

Args:
    name: Identifier of the cache to update.
    auth: Authentication context used for permission checks.

Returns:
    A dictionary indicating whether any documents were added.

Update Cache

Manually add documents to an existing cache.

Args:
    name: Name of the target cache.
    docs: List of document IDs to insert.
    auth: Authentication context used for authorization.

Returns:
    A dictionary indicating whether the documents were queued for addition.

Add Docs To Cache

Generate a completion using a pre-populated cache.

Args:
    name: Name of the cache to query.
    query: Prompt text to send to the model.
    max_tokens: Optional maximum number of tokens to generate.
    temperature: Optional sampling temperature for the model.
    auth: Authentication context for permission checks.

Returns:
    A :class:`CompletionResponse` object containing the model output.

Query Cache

Create a new graph based on document contents.

The graph is created asynchronously. A stub graph record is returned with
``status = "processing"`` while a background task extracts entities and
relationships.

Args:
    request: Graph creation parameters including name and optional filters.
    auth: Authentication context authorizing the operation.

Returns:
    The placeholder :class:`Graph` object which clients can poll for status.

Create Graph

List all folders the user has access to.

Args:
    auth: Authentication context

Returns:
    List[Folder]: List of folders

List Folders

Create a new folder.

Args:
    folder_create: Folder creation request containing name and optional description
    auth: Authentication context

Returns:
    Folder: Created folder

Create Folder

Return a compact folder list (id, name, doc_count, updated_at).

List Folder Summaries

Get a folder by ID.

Args:
    folder_id: ID of the folder
    auth: Authentication context

Returns:
    Folder: Folder if found and accessible

Get Folder

Delete a folder and all associated documents.

Args:
    folder_name: Name of the folder to delete
    auth: Authentication context (must have write access to the folder)

Returns:
    Deletion status

Delete Folder

Add a document to a folder.

Args:
    folder_id: ID of the folder
    document_id: ID of the document
    auth: Authentication context

Returns:
    Success status

Add Document To Folder

Remove a document from a folder.

Args:
    folder_id: ID of the folder
    document_id: ID of the document
    auth: Authentication context

Returns:
    Success status

Remove Document From Folder

Get a graph by name.

This endpoint retrieves a graph by its name if the user has access to it.

Args:
    name: Name of the graph to retrieve
    auth: Authentication context
    folder_name: Optional folder to scope the operation to
    end_user_id: Optional end-user ID to scope the operation to

Returns:
    Graph: The requested graph object

Get Graph

List all graphs the user has access to.

This endpoint retrieves all graphs the user has access to.

Args:
    auth: Authentication context
    folder_name: Optional folder to scope the operation to
    end_user_id: Optional end-user ID to scope the operation to

Returns:
    List[Graph]: List of graph objects

List Graphs

Get graph visualization data.

This endpoint retrieves the nodes and links data needed for graph visualization.
It works with both local and API-based graph services.

Args:
    name: Name of the graph to visualize
    auth: Authentication context
    folder_name: Optional folder to scope the operation to
    end_user_id: Optional end-user ID to scope the operation to

Returns:
    Dict: Visualization data containing nodes and links arrays

Get Graph Visualization

Update an existing graph with new documents.

This endpoint processes additional documents based on the original graph filters
and/or new filters/document IDs, extracts entities and relationships, and
updates the graph with new information.

Args:
    name: Name of the graph to update
    request: UpdateGraphRequest containing:
        - additional_filters: Optional additional metadata filters to determine which new documents to include
        - additional_documents: Optional list of additional document IDs to include
        - prompt_overrides: Optional customizations for entity extraction and resolution prompts
        - folder_name: Optional folder to scope the operation to
        - end_user_id: Optional end-user ID to scope the operation to
    auth: Authentication context

Returns:
    Graph: The updated graph object

Update Graph

Check the status of a graph build/update workflow.

This endpoint polls the external graph API to check the status of an async operation.

Args:
    workflow_id: The workflow ID returned from build/update operations
    run_id: Optional run ID for the specific workflow run
    auth: Authentication context

Returns:
    Dict containing status ('running', 'completed', or 'failed') and optional result

Check Workflow Status
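Graph builds and updates run asynchronously, so a client normally polls Check Workflow Status until it returns 'completed' or 'failed'. The sketch below implements that loop with a fixed interval and an overall timeout. The `get_status` callable is a hypothetical stand-in for the real call and is passed in so the loop can be shown offline. The interval and timeout values are arbitrary.

```python
import time
from typing import Any, Callable, Optional

# get_status(workflow_id, run_id) is a hypothetical stand-in for Check Workflow Status;
# it is expected to return a dict with a "status" key.
GetStatus = Callable[[str, Optional[str]], dict[str, Any]]


def wait_for_workflow(
    get_status: GetStatus,
    workflow_id: str,
    run_id: Optional[str] = None,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict[str, Any]:
    """Poll until the workflow reports 'completed' or 'failed', or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        response = get_status(workflow_id, run_id)
        if response["status"] in ("completed", "failed"):
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow {workflow_id} still running after {timeout}s")
        time.sleep(interval)


# Offline demo: a fake status source that finishes on the third poll.
_responses = iter([{"status": "running"}, {"status": "running"}, {"status": "completed", "result": {}}])
final = wait_for_workflow(lambda wf, run: next(_responses), "wf_123", interval=0.01)
print(final["status"])  # completed
```

The loop returns on 'failed' as well as 'completed'. Callers should therefore inspect the status of the returned dictionary instead of assuming the workflow succeeded.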

Generate a development URI for running Morphik locally.

Args:
    name: Developer name to embed in the token payload.
    expiry_days: Number of days the generated token should remain valid.

Returns:
    A dictionary containing the ``uri`` that can be used to connect to the
    local instance.

Generate Local Uri

Generate an authenticated URI for a cloud-hosted Morphik application.

Args:
    request: Parameters for URI generation including ``app_id`` and ``name``.
    authorization: Bearer token of the user requesting the URI.

Returns:
    A dictionary with the generated ``uri`` and associated ``app_id``.

Generate Cloud Uri

Set extraction rules for a folder.

Args:
    folder_id: ID of the folder to set rules for
    request: SetFolderRuleRequest containing metadata extraction rules
    auth: Authentication context
    apply_to_existing: Whether to apply rules to existing documents in the folder

Returns:
    Success status with processing results

Set Folder Rule

Delete all resources associated with a given cloud application.

Args:
    app_name: Name of the application whose data should be removed.
    auth: Authentication context of the requesting user.

Returns:
    A summary describing how many documents and folders were removed.

Delete Cloud App

List chat conversations available to the current user.

Args:
    auth: Authentication context containing user and app identifiers.
    limit: Maximum number of conversations to return (1-500)

Returns:
    A list of dictionaries describing each conversation, ordered by most
    recent activity.

List Chat Conversations

Save Model

List all custom models for the authenticated user.

List Custom Models

Delete Model

List all configured API keys (sanitized).

List Api Keys

Save Api Key

Ingest a **text** document.

Args:
    request: IngestTextRequest payload containing:
        • content – raw text to ingest.
        • filename – optional filename to help detect MIME-type.
        • metadata – optional JSON metadata dict.
        • rules – optional list of extraction / NL rules.
        • folder_name – optional folder scope.
        • end_user_id – optional end-user scope.
    auth: Decoded JWT context (injected).

Returns:
    Document metadata row representing the newly ingested text.

Ingest Text

Ingest a **file** asynchronously.

The file is uploaded to object storage, a *Document* stub is persisted
with ``status='processing'`` and a background worker picks up the heavy
parsing / chunking work.

Args:
    file: Uploaded file from multipart/form-data.
    metadata: JSON-string representing user metadata.
    rules: JSON-string with extraction / NL rules list.
    auth: Caller context – must include *write* permission.
    use_colpali: Switch to multi-vector embeddings.
    folder_name: Optionally scope doc to a folder.
    end_user_id: Optionally scope doc to an end-user.
    redis: arq redis connection – used to enqueue the job.

Returns:
    Document stub with ``status='processing'``.

Ingest File

Batch ingest **multiple files** (async).

Each file is handled the same way as :func:`ingest_file`, but sending all
files in a single request avoids many round-trips. All heavy work is still
delegated to the background worker pool.

Args:
    files: List of files to upload.
    metadata: Either a single JSON-string dict or list of dicts matching
        the number of files.
    rules: Either a single rules list or list-of-lists per file.
    use_colpali: Enable multi-vector embeddings.
    folder_name: Optional folder scoping for **all** files.
    end_user_id: Optional end-user scoping for **all** files.
    auth: Caller context with *write* permission.
    redis: arq redis connection to enqueue jobs.

Returns:
    BatchIngestResponse summarising created documents & errors.

Batch Ingest Files
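Batch Ingest Files accepts metadata as either one JSON dict shared by every file or a list with one dict per file. Rules work the same way: a single rules list, or one list per file. The sketch below expands both forms into one entry per file and rejects a list whose length does not match the file count. It is an illustration of the documented contract, not Morphik's own parsing code. The rules helper works on an already-decoded value, and the rule object in the demo is a hypothetical placeholder.

```python
import json
from typing import Any


def per_file_metadata(metadata_json: str, n_files: int) -> list[dict[str, Any]]:
    """Expand the batch `metadata` field into one dict per file."""
    parsed = json.loads(metadata_json)
    if isinstance(parsed, dict):  # a single dict applies to every file
        return [dict(parsed) for _ in range(n_files)]
    if isinstance(parsed, list) and len(parsed) == n_files:
        return parsed
    raise ValueError(f"metadata must be a dict or a list of {n_files} dicts")


def per_file_rules(rules: list[Any], n_files: int) -> list[list[Any]]:
    """Expand a decoded `rules` value into one rules list per file."""
    # A non-empty list whose items are all lists is treated as per-file rules.
    if rules and all(isinstance(item, list) for item in rules):
        if len(rules) != n_files:
            raise ValueError(f"expected {n_files} rule lists, got {len(rules)}")
        return rules
    return [list(rules) for _ in range(n_files)]  # one shared list, copied per file


shared = per_file_metadata('{"project": "apollo"}', 3)
individual = per_file_metadata('[{"page": 1}, {"page": 2}]', 2)
rule = {"type": "metadata_extraction"}  # hypothetical placeholder rule object
print(shared)
print(per_file_rules([rule], 2))
```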

Retrieve the message history for a document chat conversation.

Args:
    chat_id: Identifier of the document chat conversation.
    auth: Authentication context used to verify access to the conversation.
    redis: Redis connection where chat messages are stored.

Returns:
    A list of message dictionaries or an empty list if no history exists.

Get Document Chat History

Stream a chat completion response for a document chat conversation.

Args:
    chat_id: Identifier of the document chat conversation.
    request: The chat request containing the user message.
    auth: Authentication context.
    redis: Redis connection for chat history storage.

Returns:
    StreamingResponse with the assistant's response.

Complete Document Chat

List all model configurations for the authenticated user and app.

List Model Configs

Create Model Config

Get Model Config

Update Model Config

Delete Model Config

Create Custom Model

Generate a cloud URI for *request.app_id* owned by the calling user.

The *user_id* is derived from the bearer token. The caller therefore cannot
create applications for *other* users unless their token carries the
``admin`` permission (mirroring the community behaviour).

Create App

Provision a **brand-new** Neon database for *request.app_name*.

The caller must be authenticated to the dedicated Morphik instance.  The
authenticated user (represented by the JWT's *user_id*) becomes the owner
of the provisioned app.

Create App Route

Destroy the Neon project and metadata associated with *app_name*.

Only the owner of the application (identified via *auth.user_id*) may
perform this destructive action.

Delete App Route

Checks the current authentication status for the given connector type.

Get Auth Status For Connector

Return the provider's *authorization_url* for the given connector.

The method mirrors the logic of the `/auth/initiate` endpoint but sends a
JSON payload instead of a redirect so that browsers can stay on the same
origin until they intentionally navigate away.

For OAuth-based connectors, this returns authorization_url.
For manual credential connectors, this returns the credential form specification.

Get Initiate Auth Url

Handles the OAuth 2.0 callback from the authentication provider.
Validates state, finalizes authentication, and stores credentials.

Connector Oauth Callback
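The OAuth callback "validates state" before it finalizes authentication. The sketch below shows the generic OAuth 2.0 pattern behind that step. An unguessable state value is issued when the authorization URL is built and bound to the user. On callback it is accepted only once and only within a time limit. This is a standalone, in-memory illustration of the technique; the class name, TTL and storage are assumptions and do not describe Morphik's implementation, which per the docstrings above uses Redis for other per-user state.

```python
import secrets
import time
from typing import Optional


class OAuthStateStore:
    """In-memory, single-use store for OAuth `state` values (illustrative only)."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self._ttl = ttl_seconds
        self._pending: dict[str, tuple[str, float]] = {}  # state -> (user_id, issued_at)

    def issue(self, user_id: str) -> str:
        state = secrets.token_urlsafe(32)  # unguessable value sent with the authorization URL
        self._pending[state] = (user_id, time.monotonic())
        return state

    def consume(self, state: str) -> Optional[str]:
        """Return the user_id bound to `state` if valid, else None. Each state works once."""
        entry = self._pending.pop(state, None)  # pop makes the state single-use
        if entry is None:
            return None
        user_id, issued_at = entry
        if time.monotonic() - issued_at > self._ttl:
            return None  # expired
        return user_id


store = OAuthStateStore()
state = store.issue("user_42")
first = store.consume(state)
second = store.consume(state)
print(first, second)  # user_42 None
```

A production deployment would keep pending states in shared storage rather than process memory, so that the callback still works when it reaches a different worker.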

Finalize authentication using manual credentials.

This endpoint is used for connectors that require manual credential input
(like Zotero) instead of OAuth flows.

Finalize Manual Auth

Lists files and folders from the specified connector.

List Files For Connector

Downloads a file from the connector and ingests it into Morphik via DocumentService.

Ingest File From Connector

Disconnects the user from the specified connector by removing stored credentials.
