Why We Use MongoDB in Production at Morphik

Morphik builds accurate retrieval over complex documents (diagrams, CAD, schematics, dense PDFs) by leaning heavily on vision and layout understanding. MongoDB is the storage and persistence layer that lets us present all of this to customers as a simple, organized “file system” rather than a pile of vectors and blobs behind the scenes.

In practice, many teams use Morphik to power a “knowledge base” or “documents” feature inside their own product: their users log in, upload files, organize them, and expect fast, permission-aware search. Morphik provides the retrieval and AI layer; MongoDB gives us a clean way to represent that whole structure.

Metadata wants to be documents

In Morphik, customers don’t just upload files into a bucket. They create structured knowledge spaces that look a lot like a filesystem:

Vaults / workspaces to separate environments, clients, or projects
Folders to group related documents
Documents with versions, approvals, and change history
Pages and chunks that our models actually retrieve over

Each of those layers carries metadata: tags, owners, versions, languages, permissions, and any customer-specific fields they care about.

All of that is naturally JSON. In MongoDB, that metadata fits directly into documents at every level (vault, folder, document, page, chunk), with no awkward translation layer, no complex joins, and no custom query DSL. A “document” in Morphik looks and behaves like a document in MongoDB, which makes it easy to evolve our schema as customers add new tags or change their workflows.

We also run Postgres for other workflows, but to get comparable filtering there, we had to build a small operator “compiler” that translates product queries into JSONB expressions. MongoDB ships those operators out of the box, so we spend more time on retrieval quality and UX and less time on query plumbing.
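For contrast, here is a minimal sketch of the kind of translation layer Postgres needs. It turns a small, assumed subset of MongoDB-style filters (implicit equality, `$eq`, `$in`, `$and`, `$or` on string values) into a parameterized SQL predicate over a JSONB column. The column name `metadata` and the supported operator set are assumptions for illustration; this is not Morphik's actual compiler.

```python
from typing import Any


def compile_filter(flt: dict[str, Any]) -> tuple[str, list[Any]]:
    """Translate a Mongo-style filter into (sql, params) over a JSONB column.

    Hypothetical subset: implicit equality, $eq, $in, $and, $or, string values.
    Uses psycopg-style %s placeholders so values are never spliced into SQL.
    """
    clauses: list[str] = []
    params: list[Any] = []
    for key, value in flt.items():
        if key in ("$and", "$or"):
            if not value:
                raise ValueError(f"{key} needs a non-empty list")
            joiner = " AND " if key == "$and" else " OR "
            parts = [compile_filter(sub) for sub in value]
            clauses.append("(" + joiner.join(sql for sql, _ in parts) + ")")
            for _, sub_params in parts:
                params.extend(sub_params)
        elif key.startswith("$"):
            raise ValueError(f"unsupported top-level operator {key}")
        elif isinstance(value, dict):
            if set(value) == {"$in"}:
                # ->> extracts the field as text; ANY() matches a text array
                clauses.append("metadata->>%s::text = ANY(%s)")
                params.extend([key, list(value["$in"])])
            elif set(value) == {"$eq"}:
                clauses.append("metadata->>%s::text = %s")
                params.extend([key, value["$eq"]])
            else:
                raise ValueError(f"unsupported operator in {value!r}")
        else:
            # A bare value means equality, as in MongoDB
            clauses.append("metadata->>%s::text = %s")
            params.extend([key, value])
    if not clauses:
        return "TRUE", []
    return " AND ".join(clauses), params
```

Every operator the product exposes is another branch to write and test here, whereas MongoDB accepts the same filter document directly.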

Narrowing the search space

Retrieval quality comes from a mix of good modeling (chunking, embeddings, rerankers), good metadata, and sensible constraints on how much data you search over for each query.

For workloads with a lot of data and many concurrent users, narrowing the search space using metadata and permissions before you touch embeddings is one of the main levers for latency, cost, and system simplicity.

When someone queries Morphik—“Show me the latest signed revision,” or “Where did we approve this change?”—we usually know which vaults, folders, or document versions they’re allowed to see and which ones are relevant.
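As an illustration, “Show me the latest signed revision” can often be answered from metadata alone, before any embeddings are involved. This sketch builds the filter and sort for a PyMongo `find_one` call; the `document_id`, `status`, `acl_groups`, and `doc_version` fields are hypothetical.

```python
from typing import Any


def latest_signed_revision_query(
    document_id: str, user_groups: list[str]
) -> tuple[dict[str, Any], list[tuple[str, int]]]:
    """Return (filter, sort) for the newest signed version the user may read."""
    query = {
        "document_id": document_id,
        "status": "signed",
        # On an array field, $in matches if any element is one of the user's groups
        "acl_groups": {"$in": user_groups},
    }
    sort = [("doc_version", -1)]  # -1 = descending, so the newest version comes first
    return query, sort
```

With a collection handle, `query, sort = latest_signed_revision_query("doc-42", ["eng"])` followed by `collection.find_one(query, sort=sort)` returns the single newest signed revision the user is allowed to see, or `None`.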

MongoDB’s query operators ($and, $or, $not, $in, etc.) let us:

Filter on metadata and permissions before vector search
Enforce per-user and per-group access control at query time
Route queries to the right subset of a tenant’s knowledge base
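A minimal sketch of filtering before vector search with an Atlas `$vectorSearch` aggregation stage, whose `filter` option restricts the candidate set to matching documents. The index name `chunk_vector_index`, the `embedding`, `vault_id`, and `acl_groups` fields, and the `numCandidates` multiplier are assumptions; the filtered fields would also need to be declared as filter fields in the vector index.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    vault_ids: list[str],
    user_groups: list[str],
    limit: int = 10,
) -> list[dict[str, Any]]:
    """Atlas $vectorSearch pipeline that pre-filters on vault and ACL metadata.

    Index and field names are hypothetical.
    """
    return [
        {
            "$vectorSearch": {
                "index": "chunk_vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                # Candidate pool for approximate search; a multiple of limit is a common start
                "numCandidates": limit * 10,
                "limit": limit,
                # Pre-filter: only chunks in allowed vaults and visible to the user are candidates
                "filter": {
                    "$and": [
                        {"vault_id": {"$in": vault_ids}},
                        {"acl_groups": {"$in": user_groups}},
                    ]
                },
            }
        },
        # Keep only what downstream reranking needs, plus the similarity score
        {
            "$project": {
                "_id": 0,
                "document_id": 1,
                "page": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The pipeline is passed to `collection.aggregate(...)`. Because permissions are part of the filter, a chunk the user cannot see never enters the result set, and access control does not rely on post-filtering in application code.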

We lean on MongoDB’s query planner and indexes instead of re-implementing filtering logic in application code. That keeps the retrieval pipeline easier to reason about and reduces the amount of custom infrastructure we have to own. The engineering time goes into the parts that are specific to Morphik: visually grounded retrieval, page-level understanding, and agent workflows over procedures, drawings, and spec sheets.
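Leaning on the planner mostly means declaring the right indexes. A minimal sketch, with hypothetical field names and dimension count: a compound index that puts equality fields first and the sort field last (MongoDB's Equality, Sort, Range guideline), and an Atlas Vector Search index definition that declares metadata fields as `filter` fields so they can be used to pre-filter `$vectorSearch`.

```python
from typing import Any

# Equality fields first, then the sort key, so a query on {document_id, status}
# sorted by doc_version descending can be served by a single index scan.
REVISION_INDEX_KEYS: list[tuple[str, int]] = [
    ("document_id", 1),
    ("status", 1),
    ("doc_version", -1),
]


def vector_index_definition(num_dimensions: int, filter_paths: list[str]) -> dict[str, Any]:
    """Atlas Vector Search index definition with metadata pre-filter fields."""
    fields: list[dict[str, Any]] = [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": num_dimensions,
            "similarity": "cosine",
        }
    ]
    # Only fields declared as "filter" can be referenced in $vectorSearch's filter
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"fields": fields}
```

In recent PyMongo versions these would be applied with `collection.create_index(REVISION_INDEX_KEYS)` and `collection.create_search_index(SearchIndexModel(definition=..., name=..., type="vectorSearch"))`.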

Tenant isolation without heavy lifting

Morphik is multi-tenant, but most of our customers want strong isolation between tenants, and often between environments inside a tenant.

We run a separate database per enterprise tenant. In MongoDB Atlas, provisioning a new customer takes almost no work: MongoDB creates a database implicitly the first time data is written to it. On top of that, we can subdivide into vaults and workspaces inside that database without giving up the isolation boundary.
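A minimal sketch of the database-per-tenant pattern, assuming a hypothetical `tenant_` naming prefix: map each tenant ID to a deterministic database name and let MongoDB create the database on first write. The sanitization is deliberately conservative (lowercase letters, digits, underscores), which keeps names clear of MongoDB's forbidden characters such as `/`, `\`, `.`, `"`, and `$`, and under the 64-byte limit.

```python
import re

DB_PREFIX = "tenant_"  # hypothetical naming convention
MAX_DB_NAME_BYTES = 63  # MongoDB database names must be shorter than 64 bytes


def tenant_db_name(tenant_id: str) -> str:
    """Map a tenant ID to a safe, deterministic MongoDB database name."""
    # Lowercase first: MongoDB rejects names that differ from an existing one only by case
    slug = re.sub(r"[^a-z0-9_]+", "_", tenant_id.lower()).strip("_")
    if not slug:
        raise ValueError(f"tenant id {tenant_id!r} has no usable characters")
    # The slug is pure ASCII, so truncating characters also truncates bytes
    return (DB_PREFIX + slug)[:MAX_DB_NAME_BYTES]
```

`client[tenant_db_name("Acme Corp.")]` returns a handle to `tenant_acme_corp` without contacting the server; the database appears once the first document is inserted. Truncation could make very long IDs collide, so a production version might append a short hash of the original ID.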

That gives us:

Strong data isolation for each tenant
Cleaner compliance and audit stories
The ability to tune performance per customer as they grow (indexes, hardware, backup policies, and more)

MongoDB handles query operators, indexing, replication, and isolation, so we don’t have to build and maintain that stack ourselves. That’s the real reason we use it in production: it keeps the storage layer boring while we iterate quickly on retrieval and agent behavior.
