Search isn't the hard part

March 12, 2026

The week-one version

Your team builds semantic search in a week. Embed your documents, store the vectors, rank by cosine similarity against the query embedding. It works. Users type a question, they get back a passage. You ship it.
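
For reference, the week-one version fits in a few lines. This is a minimal sketch, not anyone's production code: the `embed` function here is a bag-of-words stand-in I made up so the example runs without a model, and the `search` helper and sample documents are illustrative too. A real build would swap in an actual embedding model and a vector store.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, index: list[tuple[str, Counter]], k: int = 3):
    """Score every stored vector against the query and return the top k."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    return sorted(scored, reverse=True)[:k]


docs = [
    "notes on decision making under uncertainty",
    "journal entry about a possible career change",
    "meeting transcript on the quarterly roadmap",
]
index = [(d, embed(d)) for d in docs]  # "embed your documents, store the vectors"
results = search("career change", index, k=1)
```

Notice what the function signature demands: a query. Everything downstream of that argument is the part that works, and nothing here can run without it.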

Then someone asks: “What have I been thinking about this month?” Or: “What am I not seeing across everything I’ve saved?” And the search bar just sits there. It can find a document if you know what you’re looking for. It can’t tell you what you don’t know to look for.

A few users have described this to me as wanting to see their blind spots. They’ve accumulated hundreds of captures — meeting transcripts, saved articles, highlights, journal entries — and they can feel that there’s structure in there. Recurring themes. Threads they keep pulling on without realizing it. Connections between something they wrote in January and something they saved last week. The search you built can’t surface any of it, because search answers queries. Nobody knows the query for “show me the pattern I haven’t noticed.”

What sits between search and meaning

The gap between “we have search” and “our users see what their content means” is a preprocessing layer. It runs before anyone asks a question. It looks at the full corpus and builds a thematic profile — the conceptual structure of what someone’s been accumulating.

This is the part that takes months. The components:

Entity extraction. Tags, links, folders, recurring references — whatever structure exists in the content, explicit or implicit. These become semantic clusters. A user who keeps saving articles about decision-making under uncertainty and also journaling about a career change has two entities that the system needs to recognize as related, even if the user never connected them.
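
A rough sketch of the explicit half of this, assuming content that uses `#tags`, `[[wiki-style links]]`, and folder paths. Those conventions, the `tag:`/`link:`/`folder:` prefixes, and the sample corpus are all assumptions for illustration. Counting which entities show up together is only a crude first signal of relatedness; recognizing implicit connections takes more than this.

```python
import re
from collections import Counter
from itertools import combinations

TAG = re.compile(r"#(\w+)")
LINK = re.compile(r"\[\[([^\]]+)\]\]")  # assumes wiki-style [[links]]


def extract_entities(path: str, text: str) -> set[str]:
    """Collect explicit structure: tags, links, and the containing folder."""
    entities = {f"tag:{t.lower()}" for t in TAG.findall(text)}
    entities |= {f"link:{name.strip().lower()}" for name in LINK.findall(text)}
    folder = path.rsplit("/", 1)[0] if "/" in path else ""
    if folder:
        entities.add(f"folder:{folder}")
    return entities


def cooccurrence(corpus: dict[str, str]) -> Counter:
    """Count how often each pair of entities appears in the same document."""
    pairs = Counter()
    for path, text in corpus.items():
        for a, b in combinations(sorted(extract_entities(path, text)), 2):
            pairs[(a, b)] += 1
    return pairs


corpus = {
    "journal/2026-01-04.md": "Thinking about a #career change. See [[Decision Making]].",
    "reading/uncertainty.md": "Highlights on #uncertainty and [[Decision Making]].",
    "journal/2026-02-11.md": "More #career doubts, mostly about #uncertainty.",
}
pairs = cooccurrence(corpus)
```

In this toy corpus the user never linked the career entries to the uncertainty reading, but both point at the same link target and the tags eventually meet in one journal entry. That overlap is the kind of evidence a clustering step can build on.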

Thematic clustering. Entities on their own are just labels. The preprocessing layer generates thematic handles from them — questions, tensions, claims that probe what the content is actually about. A tag like #productivity in the context of someone’s journal entries about parenthood produces something more specific than either word alone. The handle has to come from the content’s own vocabulary, not from a generic topic model.

Precomputed similarity. At query time, you don’t want to be computing relationships from scratch. The preprocessing layer builds a similarity map: every thematic handle scored against every document, offline, during indexing. Search at query time becomes a lookup into precomputed structure. That’s how you get 8ms responses on a user’s full corpus without a GPU.
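
The shape of that idea, sketched under assumptions: the three-dimensional vectors below are toy values standing in for real embeddings, and `build_similarity_map` and `lookup` are hypothetical names, not an actual API. The expensive all-pairs scoring happens once at indexing time; the query path is a dictionary read.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_similarity_map(handle_vecs, doc_vecs, top_k=2):
    """Offline step: score every handle against every document, keep the best."""
    sim_map = {}
    for handle, hv in handle_vecs.items():
        scored = sorted(
            ((cosine(hv, dv), doc_id) for doc_id, dv in doc_vecs.items()),
            reverse=True,
        )
        sim_map[handle] = scored[:top_k]
    return sim_map


def lookup(sim_map, handle):
    """Query-time step: no vector math, just a read of precomputed results."""
    return [doc_id for _, doc_id in sim_map.get(handle, [])]


# Toy vectors; a real index would hold model-generated embeddings.
doc_vecs = {
    "jan-journal": [0.9, 0.1, 0.0],
    "saved-article": [0.8, 0.3, 0.1],
    "standup-notes": [0.0, 0.2, 0.9],
}
handle_vecs = {
    "How do I decide under uncertainty?": [1.0, 0.2, 0.0],
    "What is blocking the team?": [0.0, 0.1, 1.0],
}
sim_map = build_similarity_map(handle_vecs, doc_vecs)
```

The tradeoff is that the map goes stale when content changes, so the offline step has to rerun on some refresh cadence. This sketch also leaves out how a free-text query gets matched to a handle in the first place.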

Domain configuration. A meeting transcript app and a reading highlights app have different content shapes. What counts as an entity, how thematic handles should be generated, what temporal weighting to apply — these vary by product. The preprocessing layer needs a configuration surface that tunes profiling to the content type.
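
One way such a configuration surface might look, as a sketch only. Every field name here (`entity_sources`, `handle_style`, `half_life_days`) and both presets are hypothetical, and exponential half-life decay is just one plausible choice of temporal weighting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical per-product profiling settings."""

    entity_sources: tuple[str, ...]  # what counts as an entity in this product
    handle_style: str                # hint for how thematic handles are phrased
    half_life_days: float            # how quickly older content loses weight

    def temporal_weight(self, age_days: float) -> float:
        """Exponential decay: weight halves every half_life_days."""
        return 0.5 ** (age_days / self.half_life_days)


# Meeting content goes stale quickly; reading highlights stay relevant longer.
MEETINGS = DomainConfig(
    entity_sources=("speakers", "action_items"),
    handle_style="open questions",
    half_life_days=30.0,
)
HIGHLIGHTS = DomainConfig(
    entity_sources=("tags", "source_titles"),
    handle_style="claims and tensions",
    half_life_days=180.0,
)
```

With these presets, a month-old meeting transcript counts half as much as a fresh one, while a month-old highlight has barely decayed. The point is that the same pipeline produces different profiles depending on a handful of product-level choices.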

Each of these is a real engineering project. Together they’re 2–6 months of work, and then you maintain them. The team that built search in a week discovers that search was the easy part.

Where this runs matters

Once you have a preprocessing layer that builds thematic profiles over user content, the deployment question becomes pointed. This layer holds the most longitudinal, personal data your product touches — what someone’s been reading, writing, thinking about over months. The profile it builds is a map of someone’s intellectual life.

For products where this matters (companion apps, journal tools, meeting intelligence, anything health-adjacent), compliance teams and users both care where that data lives. “Your data never leaves your infrastructure” is a requirement, not a feature. Legal teams are wary of sending user content to a third-party memory service for profiling. Users increasingly ask where their data goes.

The architecture that satisfies both is embedded: the preprocessing layer ships inside your product rather than sitting behind a third-party service. The preprocessing can run on your infrastructure. Each user’s index is a local file. Queries can stay inside your boundary without introducing a third-party memory service into the hot path.

What Enzyme ships

Enzyme is the preprocessing layer described above, packaged as a single ~30MB Rust binary with the embedding model compiled in. Runs on CPU, no GPU.

What it does: entity extraction, thematic catalyst generation (catalysts are Enzyme’s name for the thematic handles described above), precomputed similarity scoring, and a domain configuration layer that auto-tunes to your content shape. Each user’s index is a SQLite file and an embeddings binary, a few MB per user.

In the local engine, queries run against precomputed artifacts after indexing. Semantic search is fast because the LLM work happens during catalyst generation, not at query time. For product teams, the cost and deployment profile depends on provider choice, refresh cadence, corpus size, and where the index runs.

It integrates as an SDK. Init, refresh on content update, query when you need context.

The question this answers

A user opens your app and asks “what am I not seeing?” Your product can either return the ten passages closest to that sentence or show them the thematic structure of six months of accumulated material: the threads they keep returning to, the connections between captures they made in different contexts, the blind spots in what they’ve been paying attention to.

The second version requires a preprocessing layer that most teams underestimate until they’re building it. If that’s where you are — search works, but meaning doesn’t — let’s talk. If you want to try the engine on your own notes first, start with the setup guide.

Read next: When worse embeddings give better results — the engineering behind catalyst-mediated retrieval, and why Enzyme can use a small local embedding layer instead of a large query-time model.