Walk into most IT operations teams and you'll find the same informal org chart. There's the official reporting structure, and then there's the real one: the map of who actually knows things. The resolution of almost every non-trivial incident can be traced to the same two or three engineers. Everyone knows who they are. Tickets are quietly routed to them. When they're on vacation, things slow down.
The instinctive management response to this pattern is cultural: we need to improve knowledge sharing. Sometimes this turns into a documentation initiative. A Confluence space gets created, engineers are encouraged to write things down, and after a few months the space has forty pages, twelve of which are drafts and nine of which describe systems that have since been replaced. The two senior engineers are still getting pinged.
The framing of "knowledge hoarding" implies individual failure — that the engineers holding all the answers are deliberately withholding them. In most cases, this is wrong. The structural reality is simpler: there is no reliable mechanism by which anyone else can find the answer without asking one of them.
What "knowledge hoarding" actually looks like structurally
When we talk to IT leads at growing organizations, the pattern they describe has consistent features. Incidents get resolved, and the resolution involves a senior engineer doing some combination of SSH access, runbook consultation, and institutional memory — knowledge about why a particular component behaves a particular way, acquired over years and never written down anywhere searchable.
Post-incident documentation exists in theory. In practice, when an incident resolves at 11 PM, the last thing anyone does is write a thorough post-mortem in Confluence. A Slack thread captures the key steps, the ticket gets closed, and the knowledge of what was done and why stays in the heads of the two or three people who were in the thread.
This isn't a character flaw. It's a rational response to a system where writing things down has high friction and low perceived payoff. If the documentation will be hard to find anyway, why spend forty-five minutes writing it?
The retrieval problem, not the documentation problem
Most knowledge-sharing interventions focus on documentation creation: wikis, playbooks, post-mortem templates, "runbook Fridays." These are useful, but they address only the supply side of the problem. The demand side, whether someone can find and use the documentation that already exists, gets far less investment.
Consider what a junior IT engineer actually needs when they hit an unfamiliar error. They need an answer that is specific to their environment, not generic documentation. They need it fast enough to be useful during an active incident. And they need to be able to find it without knowing what it's called or where it lives in the documentation hierarchy.
That last point is the structural barrier. An IT operations wiki built over several years with multiple contributors has no single consistent taxonomy. Pages are named after the people who created them, or after the incident that prompted them, or after a tool version that's since been superseded. A junior engineer searching for "Kafka consumer lag alert remediation" may not know that the relevant runbook is in a space called "Platform Engineering," under a section called "Messaging Infrastructure," in a page titled "MQ Monitoring Playbook v2" that was last updated eighteen months ago.
The senior engineers who "hoard" knowledge are effectively compensating for this retrieval failure. They've internalized the mental index of where everything lives, and they've become the search interface for everyone else. The problem isn't that they know too much — it's that the formal knowledge system is too hard to navigate for people who don't already know the answer.
When the two senior engineers leave
The fragility of this arrangement becomes visible at departure. When a senior engineer with five years of institutional knowledge gives notice, the team typically begins a frantic knowledge transfer exercise. They'll schedule a series of sessions where the departing engineer talks through everything they know, a junior engineer takes notes, and those notes become a set of Confluence pages that represent a partial, rapidly-fading snapshot of what the senior engineer knew.
This is expensive, incomplete, and structurally temporary. The notes from a knowledge transfer session stay useful for perhaps six months; after that, they fall out of date and the team is back in the position it was in before, just with slightly more documentation that nobody maintains.
At a mid-size technology company in 2024, an infrastructure team of eight saw two of their three most senior SREs leave within the same quarter. Tickets that had previously resolved in under an hour were now taking three to four hours, because junior engineers couldn't locate the right procedures and either escalated to the one remaining senior or escalated further to engineers from adjacent teams who had only partial knowledge. The same answers existed (in Confluence, in old Jira comments, in a Google Doc shared in a Slack channel in 2022), but they were unreachable to the people who needed them.
The structural fix: index what already exists
The permanent answer to knowledge hoarding isn't writing more documentation (though that helps at the margin). It's making the documentation that already exists retrievable under natural-language queries, without requiring the searcher to already know the taxonomy.
This is what a semantic retrieval layer over existing sources does. The runbook titled "MQ Monitoring Playbook v2" becomes findable when someone asks "what do I do when Kafka consumer lag is spiking?" even though the query and the document title share no vocabulary. The answer to the AWS access request question surfaces even if the documentation is split between a Confluence page, a Jira template, and a Google Doc in a shared team drive.
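To make the idea concrete, here is a toy sketch of matching a query to a title when the two share no words. Everything in it is an assumption for illustration: the `CONCEPTS` map is a hand-written stand-in for a real embedding model, and the second and third titles are hypothetical. A production retrieval layer would use learned embeddings over page bodies as well as titles, but the ranking step would look broadly similar.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written concept map. It stands in for a real
# embedding model, which would learn these associations from data.
CONCEPTS = {
    "kafka": "messaging", "mq": "messaging", "consumer": "messaging",
    "lag": "monitoring", "monitoring": "monitoring", "spiking": "monitoring",
    "aws": "cloud", "access": "access", "request": "access",
    "vpn": "network",
}

# Only the first title comes from the article; the others are made up.
TITLES = [
    "MQ Monitoring Playbook v2",
    "AWS Access Request Process",
    "VPN Onboarding Checklist",
]


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Map text to a bag of concepts (a tiny stand-in for a vector)."""
    return Counter(CONCEPTS[t] for t in tokens(text) if t in CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, titles=TITLES):
    """Rank titles by similarity to the query in concept space."""
    q = embed(query)
    return sorted(titles, key=lambda t: cosine(q, embed(t)), reverse=True)


query = "what do I do when Kafka consumer lag is spiking?"
best = search(query)[0]
```

With this toy map, `best` is the MQ playbook even though a plain keyword match between the query and that title would find nothing in common.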
The critical enabler is permission-aware retrieval. In IT environments, not all documentation should be accessible to all engineers. Access credentials, security incident logs, and sensitive infrastructure details may be restricted to senior roles. Any retrieval system that federates across multiple source systems must respect the ACLs of those source systems — returning only content the querying user is already authorized to see. A query from a junior engineer should not surface a page they couldn't navigate to directly in Confluence.
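A minimal sketch of that filtering step, under stated assumptions: the `Doc` and `User` types, the group names, and the "Security Incident Log" title are all hypothetical, and the `allowed_groups` field stands in for whatever ACL data a real system would sync from Confluence, Jira, or Google Drive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    source: str                 # e.g. "confluence", "jira", "gdrive"
    allowed_groups: frozenset   # hypothetical mirror of the source system's ACL


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


def can_read(user, doc):
    """A user may read a doc if they share at least one group with its ACL."""
    return bool(user.groups & doc.allowed_groups)


def permitted_results(user, ranked_docs):
    """Drop anything the user could not open in the source system.

    Ranking order is preserved; only unauthorized hits are removed.
    """
    return [doc for doc in ranked_docs if can_read(user, doc)]


# Hypothetical ranked output from a retrieval step.
ranked = [
    Doc("Security Incident Log", "confluence", frozenset({"sre-senior"})),
    Doc("MQ Monitoring Playbook v2", "confluence", frozenset({"it-ops"})),
]

junior = User("junior", frozenset({"it-ops"}))
senior = User("senior", frozenset({"it-ops", "sre-senior"}))

junior_titles = [d.title for d in permitted_results(junior, ranked)]
senior_titles = [d.title for d in permitted_results(senior, ranked)]
```

Here the junior engineer sees only the playbook, while the senior engineer sees both results. One design point worth noting: filtering on the retrieval side means restricted titles and snippets never reach the requester at all, which is safer than retrieving everything and hiding links afterward.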
The documentation incentive changes when retrieval improves
There's a compounding effect worth noting. When engineers know that what they write will be retrievable — that it will actually get used — the incentive to document changes. The deterrent to writing post-mortems and runbooks isn't laziness; it's the rational belief that nobody will find what you write. Remove that deterrent and documentation quality tends to improve incrementally, not from a policy mandate but from changed expectations about usefulness.
We're not saying knowledge hoarding vanishes once retrieval improves. Some knowledge is genuinely tacit — judgment calls, contextual reasoning, pattern matching from years of experience that doesn't reduce to a document. Senior engineers will always be valuable for that category of problem. The goal is to eliminate the retrievable-but-unfound category: questions that have written answers somewhere, that are being answered by senior engineers instead because the formal system can't surface them.
Closing that gap won't restructure your org chart. But it will change the texture of what your senior engineers spend their time on — less answering repetitive reference questions, more working on the problems that actually require their judgment.