There's a moment most IT knowledge managers know well. Someone files a ticket asking how to reset a service account password. The answer exists — it's been in the wiki for years, in a page titled "Active Directory Service Account Management." But the person typed "reset AD password" into the internal search bar and got back three unrelated pages about password policies and a link to the HR offboarding checklist. So they filed a ticket instead.
That gap — between what someone typed and what they actually needed — is the gap that keyword search cannot close. It's not a failure of documentation. It's a fundamental limitation of how traditional full-text search works. And understanding why matters before you evaluate any internal search or knowledge-base product.
How keyword search actually works
Classical full-text search engines like those underlying most enterprise wikis — whether Confluence's built-in search, basic Elasticsearch configurations, or legacy SharePoint — operate on an inverted index. Every document in the corpus is tokenized: split into individual terms, stripped of stop words ("the", "a", "is"), often stemmed ("provisioning" → "provision"). The index maps each term to the documents containing it.
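As a rough illustration of the indexing step described above, here is a minimal sketch of an inverted index. The stop-word list, the crude `stem` function, and the two sample page titles are all assumptions made for the example; real engines use much larger stop-word lists and proper stemmers.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "for", "of", "to", "and", "in"}

def stem(term: str) -> str:
    """Crude suffix stripping; production engines use Porter/Snowball-style stemmers."""
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def tokenize(text: str) -> list[str]:
    """Lowercase, split into alphanumeric terms, drop stop words, stem."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in terms if t not in STOP_WORDS]

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

# Made-up page titles standing in for wiki content.
docs = {
    "ad-service-accounts": "Active Directory Service Account Management",
    "password-policy": "Password policies for employee accounts",
}
index = build_index(docs)

# Look up each term of the query "reset AD password".
hits = {term: index.get(term, set()) for term in tokenize("reset AD password")}
```

In this toy corpus, only "password" matches anything, and it matches the password policy page. The service account page spells out "Active Directory" rather than "AD" and never says "reset," so the index has no way to connect the query to it.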
When you search, the engine scores documents with TF-IDF (term frequency, inverse document frequency) or its refinement BM25: how often your search terms appear in a given document, weighted by how rare those terms are across the whole corpus. Documents that contain your exact words, especially the rare ones, rank highest. A document that contains none of them scores zero, no matter how relevant it is.
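To make the scoring concrete, here is a small sketch of one common TF-IDF variant (term frequency normalized by document length, IDF as `log(N / df)`). The three tokenized documents are invented for the example, and real engines apply additional smoothing and length normalization.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms: list[str], docs: dict[str, list[str]]) -> dict[str, float]:
    """Score each document against the query. docs maps doc_id -> tokenized terms."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # a term absent from the document contributes nothing
            tf = counts[term] / len(terms)
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            score += tf * idf
        scores[doc_id] = score
    return scores

# Hypothetical, already-tokenized wiki pages.
docs = {
    "ad-service-accounts": ["active", "directory", "service", "account", "management"],
    "password-policy": ["password", "policy", "employee", "account"],
    "offboarding": ["hr", "offboarding", "checklist", "password", "account"],
}
scores = tf_idf_scores(["reset", "ad", "password"], docs)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With this toy data the password policy page ranks first, the offboarding checklist second, and the service account page, which is the one the searcher needed, scores exactly zero because it shares no terms with the query.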
This works well for known-item searches: if you know a document is called "AD Service Account Management" and you type "service account management," you'll find it. But it breaks down for the far more common case where the searcher doesn't know what the answer is called. They just know what they're trying to do.
The conceptual gap semantic search closes
Semantic search approaches the problem differently. Instead of looking for lexical overlap, it encodes both the query and each document (or document chunk) into a high-dimensional vector space using a language model — typically a transformer-based embedding model. The position of a vector in that space reflects its meaning, not just its vocabulary.
Queries and documents that are conceptually similar land near each other in vector space, even when they share no words. "Reset service account credentials" and "rotate the password for an AD service account" end up geometrically close because they describe the same action. The search engine retrieves by vector similarity, typically cosine similarity (how closely two vectors point in the same direction), not by term matching.
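The similarity computation itself is simple. The sketch below uses hand-made three-dimensional vectors as stand-ins for real embeddings, which usually have hundreds of dimensions and come from a trained model; the numbers are assumptions chosen only to illustrate the geometry.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings (invented values, not produced by any model).
query = [0.9, 0.1, 0.2]         # "reset service account credentials"
doc_rotate = [0.8, 0.2, 0.1]    # "rotate the password for an AD service account"
doc_offboard = [0.1, 0.9, 0.3]  # "HR offboarding checklist"

sim_rotate = cosine_similarity(query, doc_rotate)
sim_offboard = cosine_similarity(query, doc_offboard)
```

Here `sim_rotate` comes out close to 1 and `sim_offboard` much lower, so the rotation runbook would be retrieved first even though it shares almost no vocabulary with the query.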
In practice this means the system can answer questions like "how do I give a contractor temporary database access?" and surface the correct document even if that document uses entirely different phrasing: "Provisioning short-term read permissions for external parties in RDS." No shared vocabulary required.
Where keyword search still wins
It's worth being precise here. We're not saying keyword search is bad, or that semantic search should simply replace it. Keyword search has genuine advantages in certain query patterns:
- Exact-match lookups. If someone types a specific ticket number (IT-4872), a specific error code (0x80070005), or a specific configuration value, exact matching outperforms semantic similarity. Embedding models may not reliably distinguish between error codes that are numerically close.
- High-precision terminology. Domain-specific acronyms and product names are often poorly handled by general-purpose embedding models unless the model has been fine-tuned on the domain. "CMDB," "ITSM," or an internal codename may sit in a vague region of embedding space.
- Performance at scale. Inverted index lookups are fast. For very large corpora where sub-100ms query latency matters, exhaustive vector search can be costly unless it is backed by an approximate nearest-neighbor index (HNSW graphs, IVF clustering, etc.).
The practical answer for most enterprise internal search use cases is a hybrid approach: BM25 or TF-IDF scoring blended with vector similarity, with the top candidates then re-ranked by a cross-encoder model that jointly encodes the query and each candidate document to produce a more precise relevance score. Many production-grade knowledge retrieval systems use some variant of this pipeline.
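One way to picture the blending step is a weighted sum of normalized lexical and vector scores, followed by an optional re-ranking pass. This is a sketch under stated assumptions, not a reference implementation: the `alpha` weight, the min-max normalization, the sample scores, and the `rerank` callable (a stand-in for a cross-encoder) are all hypothetical. Other fusion methods, such as reciprocal rank fusion, are also widely used.

```python
from typing import Callable, Optional

def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so lexical and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(
    lexical: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.5,                                # weight on the lexical score
    top_k: int = 10,
    rerank: Optional[Callable[[str], float]] = None,   # hypothetical cross-encoder scorer
) -> list[str]:
    lex, vec = min_max(lexical), min_max(vector)
    blended = {
        d: alpha * lex.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
        for d in set(lex) | set(vec)
    }
    # Sort by blended score, breaking ties by doc ID for deterministic output.
    candidates = sorted(blended, key=lambda d: (-blended[d], d))[:top_k]
    if rerank is not None:
        candidates = sorted(candidates, key=lambda d: (-rerank(d), d))
    return candidates

# Invented scores for the query "reset AD password".
lexical = {"password-policy": 2.1, "offboarding": 1.7, "ad-service-accounts": 0.0}
vector = {"password-policy": 0.41, "offboarding": 0.22, "ad-service-accounts": 0.93}

blended_order = hybrid_rank(lexical, vector, alpha=0.3)
lexical_only = hybrid_rank(lexical, vector, alpha=1.0)
```

With `alpha=1.0` the result is pure keyword ranking and the service account page comes last; with more weight on the vector score it moves to the top. In practice the weight is tuned against real queries rather than picked by hand.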
A concrete case: the IT service desk scenario
Consider a growing technology company with about 300 employees and an IT team of six. Their wiki has roughly 800 pages covering everything from network topology to employee offboarding workflows. They've been using keyword search for four years.
The IT lead notices that about 40% of tickets are questions that already have written answers somewhere in the wiki. The pattern is consistent: the person searched, didn't find the right page, and filed a ticket. On closer inspection, the failures cluster around intent-vocabulary mismatch. The runbook pages were written by senior engineers who use precise technical language. The people searching are often junior employees or people from other departments — they use natural language, colloquial phrasing, and describe goals rather than procedures.
After switching to a semantic retrieval layer over the same wiki content, unmodified, the team sees a measurable drop in this ticket category within a few weeks. The underlying documents haven't changed. The same content that was invisible to keyword search is now discoverable because the retrieval model bridges the vocabulary gap.
This is the practical argument for semantic search in internal knowledge bases. It's not about the technology being sophisticated. It's about the gap between how expert documentation is written and how non-experts ask questions — a gap that is inherent to any large knowledge corpus and grows as the organization scales.
What semantic search cannot fix
There are failure modes that no retrieval improvement resolves. If the document doesn't exist, no search strategy finds it. If the document exists but is so poorly structured that the relevant information is buried in noise, retrieval may surface the document but the reader still can't extract the answer quickly. And if the document is actively wrong (containing outdated procedures, superseded configuration values, or deprecated workflows), semantic search will retrieve it just as faithfully, and the user gets a well-ranked result full of bad information.
This is why retrieval quality and content quality are separate problems that both require attention. A well-tuned semantic search layer over a stale, unmaintained wiki will retrieve stale answers faster. That's a different problem from retrieval failure, and it requires different solutions: content freshness signals, last-modified timestamps surfaced prominently in results, and workflows that prompt authors to review and update pages.
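A freshness signal can be as simple as attaching a page's age to each search result and flagging anything past a review threshold. The sketch below assumes a hypothetical one-year threshold and label wording; both would be set by whoever owns the content policy.

```python
from datetime import date

def freshness_label(last_modified: date, today: date, stale_after_days: int = 365) -> str:
    """Build the age annotation shown next to a search result."""
    age = (today - last_modified).days
    label = f"last updated {age} days ago"
    if age > stale_after_days:
        label += ", review recommended"  # prompt the page owner to re-check it
    return label

today = date(2024, 1, 31)
fresh = freshness_label(date(2024, 1, 1), today)
stale = freshness_label(date(2022, 1, 1), today)
```

The same age value can feed an author-facing review queue, so the signal reaches the people who can fix the page as well as the people reading it.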
The intent-vocabulary gap as an organizational signal
One underappreciated value of moving to semantic or intent-aware retrieval is what the query logs tell you. When you can see what people actually searched for — in their own words, not the words of your documentation — you get a direct signal about where terminology is misaligned, where knowledge gaps exist, and where documentation is failing searchers.
A query like "how do I get my laptop on the corporate VPN" that consistently yields no useful result tells you the VPN setup runbook is either missing or written in a way that doesn't match how users think about the problem. That's an actionable signal for a knowledge manager. Keyword search logs give you the same queries, but without the semantic matching, you can't easily tell whether the failure is a retrieval problem or a content problem.
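As a sketch of how that triage might look, the function below takes the queries that led to a ticket and, for each, the best similarity score any document achieved. A low best score suggests nothing relevant exists (a content gap); a high one suggests the page exists but was not surfaced or used (a retrieval miss). The 0.6 threshold and the sample data are assumptions for illustration and would need calibrating against real logs.

```python
from collections import Counter

def triage(
    failed_queries: list[str],
    best_similarity: dict[str, float],
    threshold: float = 0.6,  # hypothetical cut-off, to be tuned on real data
) -> list[tuple[str, int, str]]:
    """Return (query, occurrences, diagnosis), most frequent queries first."""
    report = []
    for query, count in Counter(failed_queries).most_common():
        kind = "content gap" if best_similarity[query] < threshold else "retrieval miss"
        report.append((query, count, kind))
    return report

# Invented log entries and similarity scores.
failed_queries = [
    "how do I get my laptop on the corporate VPN",
    "how do I get my laptop on the corporate VPN",
    "reset AD password",
]
best_similarity = {
    "how do I get my laptop on the corporate VPN": 0.31,
    "reset AD password": 0.88,
}
report = triage(failed_queries, best_similarity)
```

Sorting by frequency puts the most common failures at the top, which gives a knowledge manager a prioritized list of pages to write or fix.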
Understanding the distinction between how search works and what it can and can't solve is the prerequisite to making good tooling decisions. The technology has matured considerably — embedding models, hybrid BM25+vector pipelines, and cross-encoder re-ranking are all production-ready. The question is whether your internal knowledge infrastructure is positioned to take advantage of them.