Morphik comes with built-in support for running both embeddings and completions locally, ensuring your data never leaves your machine. Choose between two powerful local inference engines:
  • Lemonade SDK - Windows-only, optimized for AMD GPUs and NPUs
  • Ollama - Cross-platform (Windows, macOS, Linux), supports various hardware
Both are pre-configured in Morphik and can be selected through the UI or configuration file.

Why Local Inference?

Running models locally provides several key advantages:
  • Complete Privacy: Your data never leaves your machine
  • No API Costs: Eliminate ongoing API expenses
  • Low Latency: No network round-trips for inference
  • Offline Capability: Work without internet connectivity
  • Hardware Acceleration: Leverage your local GPU, NPU, or specialized AI processors

Lemonade SDK - Windows Only

Run embeddings & completions locally with AMD GPU/NPU acceleration

Lemonade SDK provides high-performance local inference on Windows, with optimizations for AMD hardware. It exposes an OpenAI-compatible API and is already configured in Morphik.
Built-in Support: Lemonade models are pre-configured in morphik.toml for both embeddings and completions. Simply install Lemonade Server and select the models in the UI.

System Requirements

  • Windows 10/11 only (x86/x64)
  • 8GB+ RAM (16GB recommended)
  • Python 3.10+
  • Optional but recommended:
    • AMD Ryzen AI 300 series (NPU acceleration)
    • AMD Radeon 7000/9000 series (GPU acceleration)

Quick Start

Step 1: Install Lemonade SDK

Command Line Installation (Recommended):
# Install with pip (Python 3.10+ required)
pip install lemonade-sdk[llm-oga]

# For AMD GPU/NPU optimization (Windows)
pip install lemonade-sdk[oga-ryzenai] --extra-index-url=https://pypi.amd.com/simple
Alternative: if you prefer a GUI installer on Windows, download Lemonade_Server_Installer.exe from the Lemonade releases page.
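Before moving on, you can confirm the package installed correctly with a quick check (pip show is standard pip and prints the installed version):
# Confirm the SDK is installed and see which version you have
pip show lemonade-sdk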
Step 2: Start Lemonade Server

# Start the server with large context window (important for RAG)
lemonade-server-dev server --port 8020 --ctx-size 100000
The --ctx-size 100000 parameter is crucial for RAG: it gives the model a context window large enough to hold long document excerpts without truncation. Once running, the server exposes an OpenAI-compatible API at http://localhost:8020/api/v1.
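Before pointing Morphik at the server, it's worth verifying it is reachable. This sketch assumes the API follows the usual OpenAI convention of a GET /models route (the exact endpoint path is an assumption, not confirmed Lemonade behavior):
# List the models the local server currently exposes (OpenAI-compatible convention)
curl http://localhost:8020/api/v1/models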
Step 3: Configure Morphik (Two Options)

Option 1: Add a Custom Model in the UI

  1. Open Morphik UI and navigate to Settings
  2. Click “Add Custom Model”
  3. Configure as shown:
(Screenshot: Add Lemonade Model in UI)
Example configuration:
{
  "model": "openai/Qwen2.5-VL-7B-Instruct-GGUF",
  "api_base": "http://host.docker.internal:8020/api/v1",
  "vision": true
}

Option 2: Edit morphik.toml

Morphik comes with pre-configured Lemonade models. Check your morphik.toml:
# Lemonade models (already configured)
# Note: TOML inline tables must stay on a single line
lemonade_qwen = { model_name = "openai/Qwen2.5-VL-7B-Instruct-GGUF", api_base = "http://localhost:8020/api/v1", vision = true }
lemonade_embedding = { model_name = "openai/nomic-embed-text-v1-GGUF", api_base = "http://localhost:8020/api/v1" }

# To use Lemonade as default, update these sections:
[completion]
model = "lemonade_qwen"

[embedding]
model = "lemonade_embedding"
When running Morphik in Docker, change localhost to host.docker.internal in the api_base URLs.
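To sanity-check the wiring end to end, you can call the completion endpoint directly. This is a sketch assuming the standard OpenAI chat-completions route, that the model has already been downloaded (see the next step), and that the server identifies it without the openai/ prefix (that prefix is LiteLLM routing syntax, not part of the server-side model name):
# Minimal chat completion request against the local server
curl http://localhost:8020/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen2.5-VL-7B-Instruct-GGUF", "messages": [{"role": "user", "content": "Hello from Morphik!"}]}'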
Step 4: Download and Use Models

Once configured, you can:
  1. Select Lemonade models in the UI chat interface
  2. Download models as needed:
lemonade pull Qwen2.5-VL-7B-Instruct-GGUF
lemonade pull nomic-embed-text-v1-GGUF
  3. Start using Morphik with local inference!
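To check the embedding model the same way, the OpenAI-compatible embeddings route can be exercised directly (again assuming the standard /embeddings path and the unprefixed model name):
# Request an embedding for a short test string
curl http://localhost:8020/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1-GGUF", "input": "Morphik local inference test"}'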

Supported Models

Lemonade supports a wide range of models including:
  • Vision Models: Qwen2.5-VL series (7B, 14B)
  • Text Models: Llama, Mistral, Phi, Qwen families
  • Embeddings: nomic-embed-text, BGE models

Performance Tips

  • Model Quantization: Use GGUF-quantized models for a smaller memory footprint and faster inference
  • Hardware Acceleration: Lemonade automatically detects and uses AMD GPUs/NPUs when available
  • Memory Management: Models are cached locally after the first download, so subsequent loads are fast

Troubleshooting