Why We Use MongoDB in Production at Morphik

MongoDB keeps Morphik’s multi-tenant knowledge bases simple to model and fast to query.

mongodb
architecture
multi-tenant
vector-search
metadata

By Morphik Team

Morphik builds accurate retrieval over complex documents (diagrams, CAD, schematics, dense PDFs) by leaning hard into vision and layout understanding. MongoDB is the storage and persistence layer that lets us present all of this to customers as a simple, organized “file system” rather than a pile of vectors and blobs behind the scenes.

In practice, many teams use Morphik to power a “knowledge base” or “documents” feature inside their own product: their users log in, upload files, organize them, and expect fast, permission-aware search. Morphik provides the retrieval and AI layer; MongoDB gives us a clean way to represent that whole structure.

Metadata wants to be documents

In Morphik, customers don’t just upload files into a bucket. They create structured knowledge spaces that look a lot like a filesystem:

  • Vaults / workspaces to separate environments, clients, or projects
  • Folders to group related documents
  • Documents with versions, approvals, and change history
  • Pages and chunks that our models actually retrieve over

Each of those layers carries metadata: tags, owners, versions, languages, permissions, and any customer-specific fields they care about.

All of that is naturally JSON. In MongoDB, that metadata fits directly into documents—vault, folder, document, page, chunk—with no awkward mapping, no complex joins, and no custom query DSL. A “document” in Morphik looks and behaves like a document in MongoDB, which makes it easy to evolve our schema as customers add new tags or change their workflows.
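To make “metadata fits directly into documents” concrete, here is a minimal sketch of one chunk document and a filter over it. It assumes Python with pymongo, and every field name (vault_id, folder_id, metadata, and so on) is hypothetical, not Morphik’s actual schema. The dicts would be passed unchanged to `collection.insert_one(chunk)` and `collection.find(flt)`.

```python
from datetime import datetime, timezone
from typing import Optional


def make_chunk(vault_id: str, folder_id: str, document_id: str, version: int,
               page: int, text: str, tags: list, custom: dict) -> dict:
    """Build one chunk document. Field names are hypothetical."""
    return {
        # Ancestry is copied onto the chunk so filters need no joins.
        "vault_id": vault_id,
        "folder_id": folder_id,
        "document_id": document_id,
        "version": version,
        "page": page,
        "text": text,
        "tags": tags,
        # Customer-specific fields live in a nested sub-document, so a new
        # key needs no schema migration.
        "metadata": custom,
        "created_at": datetime.now(timezone.utc),
    }


def chunk_filter(vault_id: str, tag: Optional[str] = None, **custom) -> dict:
    """Build a find() filter scoped to one vault."""
    flt = {"vault_id": vault_id}
    if tag is not None:
        # Equality against an array field matches if any element equals the value.
        flt["tags"] = tag
    for key, value in custom.items():
        # Dot notation reaches into the nested "metadata" sub-document.
        flt[f"metadata.{key}"] = value
    return flt
```

For example, `chunk_filter("v-eng", tag="approved", language="de")` finds approved German-language chunks in one vault, even though `language` was never declared anywhere up front.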

We also run Postgres for other workflows, but to get comparable filtering there, we built a small operator “compiler” that translates product queries into JSONB expressions. MongoDB ships those operators out of the box, so we spend more time on retrieval quality and UX, and less time on query plumbing.
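The sketch below shows roughly what such an operator compiler involves. It is not Morphik’s code: it assumes a hypothetical JSONB column named `metadata` and psycopg-style `%s` placeholders, and it handles only equality, `$in`, `$and` and `$or` on scalar string fields. MongoDB evaluates the same filter document natively, including the array and numeric semantics this sketch ignores.

```python
def compile_filter(flt: dict) -> tuple:
    """Translate a small subset of MongoDB-style filters into a SQL WHERE
    fragment over a JSONB column named `metadata` (hypothetical).

    Returns (sql_with_placeholders, params).
    """
    clauses = []
    params = []
    for key, cond in flt.items():
        if key in ("$and", "$or"):
            if not cond:
                raise ValueError(f"{key} needs a non-empty list")
            parts = []
            for sub in cond:
                sub_sql, sub_params = compile_filter(sub)
                parts.append(f"({sub_sql})")
                params.extend(sub_params)
            joiner = " AND " if key == "$and" else " OR "
            # Parenthesize so the group binds correctly next to sibling clauses.
            clauses.append("(" + joiner.join(parts) + ")")
        elif key.startswith("$"):
            raise ValueError(f"unsupported top-level operator {key}")
        elif isinstance(cond, dict):
            if set(cond) != {"$in"}:
                raise ValueError(f"unsupported operator(s) in {cond!r}")
            # ->> extracts the field as text, so compare against text values.
            clauses.append("metadata->>%s = ANY(%s)")
            params.extend([key, [str(v) for v in cond["$in"]]])
        else:
            clauses.append("metadata->>%s = %s")
            params.extend([key, str(cond)])
    return " AND ".join(clauses) or "TRUE", params
```

The result would be spliced into something like `cur.execute(f"SELECT id FROM documents WHERE {sql}", params)`. Every operator added on the product side means another branch here, which is the plumbing MongoDB makes unnecessary.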

Narrowing the search space

Retrieval quality comes from a mix of good modeling (chunking, embeddings, rerankers), good metadata, and sensible constraints on how much data you search over for each query.

For workloads with a lot of data and many concurrent users, narrowing the search space using metadata and permissions before you touch embeddings is one of the main levers for latency, cost, and system simplicity.

When someone queries Morphik—“Show me the latest signed revision,” or “Where did we approve this change?”—we usually know which vaults, folders, or document versions they’re allowed to see and which ones are relevant.
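For the “latest signed revision” case, the permission and relevance constraints become an ordinary filter plus a sort. A minimal sketch, assuming pymongo and hypothetical `status`, `version` and `vault_id` fields; the returned dict can be passed as `collection.find(**args)`, since `find` accepts `filter`, `sort` and `limit` keyword arguments.

```python
from typing import Any


def latest_signed_revision(document_id: str, allowed_vault_ids: list) -> dict:
    """Arguments for find() returning the newest signed version of one document.

    Field names ("status", "version", "vault_id") are hypothetical.
    """
    args: dict[str, Any] = {
        "filter": {
            "document_id": document_id,
            "status": "signed",
            # Only vaults the caller may read, computed before the query runs.
            "vault_id": {"$in": allowed_vault_ids},
        },
        # Highest version first; limit 1 keeps only the latest.
        "sort": [("version", -1)],
        "limit": 1,
    }
    return args
```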

MongoDB’s query operators ($and, $or, $not, $in, etc.) let us:

  • Filter on metadata and permissions before vector search
  • Enforce per-user and per-group access control at query time
  • Route queries to the right subset of a tenant’s knowledge base

We lean on MongoDB’s query planner and indexes instead of re-implementing filtering logic in application code. That keeps the retrieval pipeline easier to reason about and reduces the amount of custom infrastructure we have to own. The engineering time goes into the parts that are specific to Morphik: visual-grounded retrieval, page-level understanding, and agent workflows over procedures, drawings, and spec sheets.
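Leaning on indexes means declaring the right ones. A minimal sketch, with hypothetical field names and dimensions: an Atlas Vector Search index definition (fields used in a `$vectorSearch` filter must be declared with type `"filter"`), a regular compound index for revision lookups, and a small check that flags filter fields the vector index does not cover. In recent PyMongo versions these would be created with `collection.create_index(REVISION_INDEX_KEYS)` and `collection.create_search_index(SearchIndexModel(definition=VECTOR_INDEX, name=..., type="vectorSearch"))`.

```python
# Atlas Vector Search index definition; values are hypothetical.
VECTOR_INDEX = {
    "fields": [
        {"type": "vector", "path": "embedding",
         "numDimensions": 1024, "similarity": "cosine"},
        {"type": "filter", "path": "vault_id"},
        {"type": "filter", "path": "acl_users"},
        {"type": "filter", "path": "acl_groups"},
    ]
}

# Regular compound index: equality fields first, then the sort field.
REVISION_INDEX_KEYS = [("document_id", 1), ("status", 1), ("version", -1)]


def default_index_name(keys: list) -> str:
    """MongoDB's default index name: field_direction pairs joined by "_"."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


def filter_paths(flt: dict) -> set:
    """Collect field paths referenced by a filter, descending into $and/$or/$nor."""
    paths = set()
    for key, value in flt.items():
        if key in ("$and", "$or", "$nor"):
            for sub in value:
                paths |= filter_paths(sub)
        elif not key.startswith("$"):
            paths.add(key)
    return paths


def undeclared_filter_fields(index_def: dict, flt: dict) -> set:
    """Fields a filter uses that the vector index does not declare as "filter"."""
    declared = {f["path"] for f in index_def["fields"] if f["type"] == "filter"}
    return filter_paths(flt) - declared
```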

Tenant isolation without heavy lifting

Morphik is multi-tenant, but most of our customers want strong isolation between tenants, and often between environments inside a tenant.

We run a separate database per enterprise tenant. In MongoDB Atlas, provisioning a new customer is essentially one new database name away, because MongoDB creates a database on its first write. On top of that, we can subdivide into vaults and workspaces inside that database without giving up the isolation boundary.
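A database-per-tenant layout mostly comes down to a safe naming rule plus a vault scope on every query. A minimal sketch, assuming stable tenant ids and a hypothetical `tenant_` prefix; with pymongo, `client[tenant_db_name("acme")]` returns a handle, and MongoDB creates the database itself on the first write.

```python
import re

# Conservative allow-list. MongoDB forbids characters such as / \ . " $
# in database names; allowing only a-z, 0-9 and _ sidesteps them entirely.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_]*")
_MAX_DB_NAME_LEN = 63  # database names must be fewer than 64 characters


def tenant_db_name(tenant_id: str) -> str:
    """Map a (hypothetical) stable tenant id to its own database name."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    name = f"tenant_{tenant_id}"
    if len(name) > _MAX_DB_NAME_LEN:
        raise ValueError(f"database name too long: {name!r}")
    return name


def vault_scoped(vault_id: str, flt: dict) -> dict:
    """Inside a tenant database, pin every query to one vault."""
    return {"$and": [{"vault_id": vault_id}, flt]}
```

Rejecting bad ids instead of rewriting them avoids two tenants (say “Acme Inc” and “acme_inc”) silently mapping to the same database.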

That gives us:

  • Strong data isolation for each tenant
  • Cleaner compliance and audit stories
  • The ability to tune performance per customer as they grow (indexes, hardware, backup policies, and more)

MongoDB handles query operators, indexing, replication, and isolation, so we don’t have to build and maintain that stack ourselves. That’s the real reason we use it in production: it keeps the storage layer boring while we iterate quickly on retrieval and agent behavior.

Read more from MongoDB

MongoDB wrote up a case study on how we’re using Morphik and MongoDB together. You can read it here: “Enterprise-Level, Scalable AI with Morphik and MongoDB.”
