The Infrastructure Layer: Getting Out of People's Heads
Tribal knowledge is a liability disguised as expertise. Here's how to build the SOPs, playbooks, and documentation that let your company scale without depending on any single person.
Every company has a Sarah. She's the one who knows how returns actually get processed. She knows which vendor needs a follow-up call on Thursdays. She knows the workaround for that one integration that breaks every time someone updates a shipping address.
Sarah is incredible. Sarah is also a single point of failure.
The tribal knowledge trap
Tribal knowledge is institutional memory that lives exclusively in people's heads. It accumulates naturally — someone figures out a workaround, shares it verbally, and it becomes "the way we do things." Nobody writes it down because everyone who needs to know already knows.
Until they don't. Sarah goes on vacation and returns processing stops. A new hire spends three weeks discovering things that could have been explained in thirty minutes. A team lead quits and takes six months of operational context with them.
This isn't a people problem. It's an infrastructure problem. And Infrastructure is the second pillar of the operating system framework for a reason: it only works after Architecture is in place.
Why infrastructure comes second
In the previous post, I made the case that Architecture (decision frameworks, accountability structures, and performance cadences) needs to be established before anything else. Infrastructure is what comes next.
The reason for the sequence is practical: you can't document processes that haven't been defined. If nobody has clarified who owns a decision, writing an SOP for that decision is a waste of time — the SOP will describe one person's interpretation of a process that three other people do differently.
Architecture tells you what needs to happen and who is responsible. Infrastructure tells you how it happens, step by step, in a way that anyone can follow.
What infrastructure actually looks like
Infrastructure isn't a Notion workspace full of pages nobody reads. It's a living system of three components:
SOPs — the repeatable steps
A Standard Operating Procedure is a step-by-step guide for a recurring task. Not a strategy document. Not a vision statement. A sequence of actions that produces a consistent outcome every time.
Good SOPs share a few characteristics:
- They're specific enough to follow. "Process the return" is not an SOP. "Log in to Shopify, navigate to Orders, locate the order number, click Refund, select the return reason from the dropdown, confirm the refund amount, click Refund" is an SOP.
- They include the edge cases. The value of documentation isn't capturing the happy path; anyone can figure that out. It's capturing what to do when the customer ordered with a gift card, or when the item was part of a bundle, or when the shipping label was already generated.
- They name an owner. Every SOP has one person responsible for keeping it current. Not a team. A person.
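One way to keep those three characteristics honest is to treat an SOP as a structured record instead of free text. The sketch below is a minimal illustration in Python, not a prescribed format: the `SOP` and `EdgeCase` classes and their field names are hypothetical, and the gift-card action is a placeholder, not a real refund policy.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeCase:
    condition: str  # when the normal path does not apply
    action: str     # what to do instead


@dataclass
class SOP:
    title: str
    owner: str        # one named person, not a team
    steps: list[str]  # specific, followable actions
    edge_cases: list[EdgeCase] = field(default_factory=list)

    def render(self) -> str:
        """Return the SOP as plain numbered text."""
        lines = [f"{self.title} (owner: {self.owner})"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        for case in self.edge_cases:
            lines.append(f"If {case.condition}: {case.action}")
        return "\n".join(lines)


returns_sop = SOP(
    title="Process a return",
    owner="Sarah",
    steps=[
        "Log in to Shopify",
        "Navigate to Orders",
        "Locate the order number",
        "Click Refund",
        "Select the return reason from the dropdown",
        "Confirm the refund amount",
        "Click Refund",
    ],
    edge_cases=[
        # Placeholder action for illustration only.
        EdgeCase("the customer paid with a gift card", "ask the SOP owner before refunding"),
    ],
)

print(returns_sop.render())
```

A record like this makes a missing owner or an empty edge-case list visible at a glance, which is harder to spot in a long page of prose.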
Playbooks — the judgment calls
Playbooks are for situations that require context and decision-making. An SOP tells you exactly what to click. A playbook tells you how to think about a situation.
Onboarding a new enterprise client is a playbook, not an SOP. There are steps, but each engagement is different, so the playbook captures the principles, the checkpoints, the common mistakes, and the escalation triggers. It gives someone 80% of the context that would otherwise take years of experience to accumulate.
Escalation paths — the safety net
When something goes wrong and it's not covered by an SOP or playbook, people need to know where it goes. Escalation paths define the chain: who gets notified, at what threshold, with what information, and what the expected response time is.
Without escalation paths, problems either get solved by whoever happens to be around (inconsistent) or they get escalated to the founder by default (bottleneck). Both patterns break at scale.
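The four parts of an escalation path (who, threshold, information, response time) map naturally onto a small lookup table. The following is a sketch under stated assumptions: the roles, the 1-to-5 severity scale, the required fields, and the response times are all invented for illustration, and `route` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    notify: str                      # who gets notified
    min_severity: int                # threshold at which this level applies
    required_info: tuple[str, ...]   # what the report must include
    response_hours: int              # expected response time


# Hypothetical path, ordered from lowest to highest threshold.
ESCALATION_PATH = [
    EscalationLevel("team lead", 1, ("what happened", "what was tried"), 24),
    EscalationLevel("operations manager", 3, ("what happened", "what was tried", "customer impact"), 4),
    EscalationLevel("founder", 5, ("what happened", "what was tried", "customer impact"), 1),
]


def route(severity: int) -> EscalationLevel:
    """Return the highest level whose threshold the severity meets."""
    eligible = [lvl for lvl in ESCALATION_PATH if severity >= lvl.min_severity]
    if not eligible:
        raise ValueError(f"severity {severity} is below every threshold")
    return max(eligible, key=lambda lvl: lvl.min_severity)


level = route(4)
print(level.notify, level.response_hours)  # operations manager 4
```

Because the founder only appears at the top threshold, the default-to-founder bottleneck is ruled out by construction: anything below that severity has a named recipient who is not the founder.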
How to write an SOP in 30 minutes
Most companies never document their processes because it feels like a massive project. It doesn't have to be. Here's the approach I use:
Step 1: Pick the task that would hurt most if the person doing it disappeared tomorrow. Don't start with everything. Start with the one workflow that's most trapped in someone's head.
Step 2: Have that person do the task while narrating. Record the screen or take notes. Don't ask them to write it up later — capture it in real time, because they'll skip steps they consider obvious.
Step 3: Write the steps as numbered actions. Each step starts with a verb. "Open," "Navigate to," "Enter," "Select," "Verify." If a step requires a decision, note the criteria. If a step has an exception, note the exception.
Step 4: Have someone else follow the SOP without help. This is the test. If they can complete the task using only the document, it works. If they get stuck, the SOP has a gap. Fill the gap.
Step 5: Assign an owner and a review date. SOPs decay. Interfaces change, policies update, edge cases emerge. Every SOP needs a single owner who reviews it quarterly — not to rewrite it, but to confirm it's still accurate.
Thirty minutes for the first draft. Another thirty after the test run. That's it. One hour to eliminate a single point of failure.
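Two of these steps are mechanical enough to check automatically: whether each step starts with a verb (Step 3) and whether the quarterly review is overdue (Step 5). The sketch below assumes a small starter list of action verbs and approximates a quarter as 90 days; both are assumptions to adjust, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumed starter list of action verbs; extend it to match your own SOPs.
ACTION_VERBS = {
    "open", "navigate", "enter", "select", "verify",
    "click", "log", "locate", "confirm",
}
REVIEW_INTERVAL = timedelta(days=90)  # "quarterly", approximated as 90 days


def steps_missing_verb(steps: list[str]) -> list[int]:
    """Return 1-based numbers of steps that do not start with a known action verb."""
    flagged = []
    for number, step in enumerate(steps, start=1):
        words = step.split()
        if not words or words[0].lower() not in ACTION_VERBS:
            flagged.append(number)
    return flagged


def review_overdue(last_reviewed: date, today: date) -> bool:
    """True when more than one review interval has passed since the last review."""
    return today - last_reviewed > REVIEW_INTERVAL


draft = [
    "Open the orders page",
    "The refund amount should match the invoice",
    "Select the return reason",
]
print(steps_missing_verb(draft))                          # [2]
print(review_overdue(date(2024, 1, 1), date(2024, 6, 1)))  # True
```

A check like this does not replace Step 4. Only a person following the document without help can show whether it actually works.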
The compounding effect
Infrastructure has a compounding return that most operators underestimate.
Hiring gets faster. When processes are documented, onboarding drops from months to weeks. New hires aren't learning by osmosis — they're following playbooks and SOPs from day one. The question shifts from "when will they figure it out" to "when will they finish the onboarding checklist."
Quality gets consistent. When everyone follows the same process, output variance drops. Customer experience stops depending on which team member happens to handle the request. This is the difference between a company that scales and a company that just gets bigger.
Delegation becomes possible. Founders can't delegate what isn't defined. "Handle customer escalations" is an impossible mandate without an escalation playbook. "Follow this escalation path and use your judgment at Step 4" — that's delegatable.
Acceleration becomes viable. This is the bridge to the third pillar. Once a process is documented, standardized, and consistent, you can evaluate it for automation. You can see where AI might help. You can't automate tribal knowledge — but you can automate an SOP.
The resistance
Every company I've worked with has pushed back on documentation at some point. The objections are always the same:
"We move too fast to document things." You move too fast not to. Undocumented processes break silently. You won't know until a customer complains or a team member burns out.
"Our business is too unique for templates." Your business is unique. Your processes aren't. Every company onboards customers, handles escalations, processes transactions, and manages vendors. The details differ. The structure doesn't.
"Nobody will read it." Then your documentation is in the wrong place or the wrong format. SOPs should live where the work happens — in the tool, not in a wiki three clicks away. And they should be short enough to scan, not long enough to impress.
Start with five
You don't need to document everything. Start with five SOPs — the five processes that are most dependent on tribal knowledge, most critical to operations, and most likely to break if the wrong person calls in sick.
Document those five. Test them. Assign owners. Then do five more.
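If picking the first five feels subjective, a rough score across the three criteria above can break ties. This is a sketch, not a validated method: the 1-to-5 ratings, the equal weighting, and the example processes and their numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    tribal_dependence: int  # 1-5: how much lives only in someone's head
    criticality: int        # 1-5: how critical it is to operations
    fragility: int          # 1-5: how likely it breaks if one person is out

    @property
    def score(self) -> int:
        # Equal weights are an assumption; adjust them to fit your company.
        return self.tribal_dependence + self.criticality + self.fragility


def first_to_document(processes: list[Process], count: int = 5) -> list[Process]:
    """Return the `count` highest-scoring processes, highest first."""
    return sorted(processes, key=lambda p: p.score, reverse=True)[:count]


# Made-up ratings for illustration.
candidates = [
    Process("Payroll run", 2, 5, 2),
    Process("Returns processing", 5, 4, 5),
    Process("Vendor follow-ups", 4, 3, 4),
]

for process in first_to_document(candidates, count=2):
    print(process.name, process.score)
```

With these ratings the sketch lists returns processing first and vendor follow-ups second. The exact numbers matter less than forcing the conversation about which workflows are most trapped in someone's head.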
Infrastructure isn't a project with a finish line. It's a practice. But it starts with getting the first five processes out of people's heads and into a system that anyone can follow.
Your company's most valuable knowledge shouldn't live in anyone's brain. It should live in a system — documented, tested, and owned.