The Operator's Handbook
How to turn an AI into an operator that decides instead of chats
From The Quiet AI. A short, practical guide — built from two operators that ship real decisions: Nightwatch (on-call triage) and Claimwise (claims adjudication). Both are free and ungated.
The one distinction
A chatbot responds. An operator decides, then routes the work.
Paste a problem into a chatbot and it hands the problem back to you, dressed up: "here's what I found — what would you like to do?" Paste the same problem into an operator and you come back to find the work already done, or one clearly-flagged thing waiting for the single judgment that was genuinely yours to make.
Everything in this handbook serves that one difference. The skill you're building — writing decision logic into rules a stranger can trust — is the foundation of any real automation, and it's the same skill whether the operator triages alerts, screens applications, or handles refunds.
You don't need a framework, an agent platform, or a single line of code. You need a folder.
The shape: folders as architecture
An operator is a folder of plain Markdown you drop into a Claude Project. Five things, each doing one job:
operator-name/
├── identity.md # who it is, the one workflow it owns, what it refuses
├── rules.md # the decision logic — the heart
├── examples.md # worked decisions, including the edge cases
├── reference/ # the lookup tables, checklists, templates it consults
└── README.md # how a stranger uses it
The structure is the interpretability. Anyone can open the folder and see exactly where the logic lives, read it, and change it. Nothing is hidden in a prompt or buried in a model. That legibility is the whole point — it's what lets someone trust the output enough to act on it.
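The handbook's method needs no code, so the following is optional. It is a minimal sketch of how someone might confirm that a folder has all five parts before dropping it into a project. The function name `missing_parts` and the constant `REQUIRED_PARTS` are made up for illustration and are not part of Nightwatch or Claimwise.

```python
from pathlib import Path

# The five parts named above; a trailing slash marks the one directory.
REQUIRED_PARTS = ["identity.md", "rules.md", "examples.md", "reference/", "README.md"]


def missing_parts(operator_dir: Path) -> list[str]:
    """Return the required parts that are absent from an operator folder."""
    missing = []
    for part in REQUIRED_PARTS:
        path = operator_dir / part.rstrip("/")
        # reference/ must be a directory; everything else must be a file.
        present = path.is_dir() if part.endswith("/") else path.is_file()
        if not present:
            missing.append(part)
    return missing
```

Calling `missing_parts(Path("operator-name"))` on a complete folder returns an empty list. Anything it does return is a part a stranger would look for and not find.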
1. identity.md: scope before logic
Before any rule, decide what the operator owns and, just as importantly, what it refuses.
The trap is breadth. "A customer support operator" is too big to be trustworthy. "A first-pass refund-triage operator for a store under 200 orders a month" is a job you can actually encode. Narrow scope is what makes the edge cases finite and the decisions consistent.
State three things:
- The one workflow, written as a pipeline: input → decision → routed output.
- The fixed set of outcomes: every input leaves as exactly one of them. (Nightwatch: PAGE / TICKET / SUPPRESS / FLAG. Claimwise: APPROVE / REQUEST / DENY / ESCALATE.)
- What's out of bounds — the things it hands to a human on sight (payments, legal, anything irreversible).
And write the rule the operator holds about itself: it decides; it does not ask the user what to do. That one sentence, stated plainly in identity, does more work than any clever prompt.
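If you ever post-process an operator's output in code, the "exactly one outcome" promise can be checked mechanically. The sketch below uses Nightwatch's four outcomes as listed above. It assumes, purely for illustration, that the operator states its outcome as the first word of its reply; `parse_outcome` is a hypothetical helper, not something the operators ship.

```python
from enum import Enum


class Outcome(Enum):
    """Nightwatch's four outcomes, as listed in the handbook."""
    PAGE = "PAGE"
    TICKET = "TICKET"
    SUPPRESS = "SUPPRESS"
    FLAG = "FLAG"


def parse_outcome(reply: str) -> Outcome:
    """Map the first word of a reply to exactly one outcome, or raise ValueError."""
    words = reply.split()
    if not words:
        raise ValueError("empty reply: no outcome stated")
    # Assumed convention: the outcome is the first word, optionally followed by a colon.
    return Outcome(words[0].strip(":").upper())
```

A reply such as "PAGE: error rate 7% on checkout" parses cleanly. A reply that opens with "Here's what I found" raises, which is the signal that the operator chatted instead of deciding.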
2. rules.md — the heart
This is where most operators are won or lost. Three principles:
a) "Use good judgment" is not a rule. A rule has a threshold you could check by hand. Not "page if it's serious" but "page if the error rate is at or above 5% on a core endpoint, sustained for five minutes or more." Not "deny old claims" but "deny if the incident date falls outside the 90-day window." If you can't write the threshold, you haven't finished thinking — and the operator will wobble in exactly the spot you left vague.
b) Make it a short-circuiting flow. Order the checks so the dangerous and the obvious resolve first, and let the first step that produces an outcome win. Put the asymmetric-risk gates at the very top — the things whose downside is severe if missed: a security signal, a fraud pattern, a data-loss risk. They fire before any of the normal logic runs, so a low number can never talk the operator out of escalating a real danger.
c) Decide on the deciding field, not the obvious one. This is where lived expertise shows, and where most tools quietly fail:
- Nightwatch computes severity from business impact, not the alert's own label — so a global 0.2% blip that happens to be 100% of your one SLA customer outranks a "CRITICAL" cosmetic warning.
- Claimwise judges a claim by its cause and incident date, not its symptom and filing date — so a "cracked screen" is judged by how it cracked, and a claim filed late is judged by when the device actually failed.
The obvious field — the label, the symptom, the filing date — is usually a trap. Name the field that actually decides the right answer, and route on that.
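The three principles can be read together as one short function. This is a sketch under stated assumptions, not Nightwatch's actual rules: the 5% / five-minute threshold comes from the text above, while the `Alert` fields, the 0.5 cutoff for SLA customer impact, and the mapping of a security signal to PAGE are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    label: str                   # the alert's own severity label (never consulted below)
    error_rate: float            # fraction of requests failing, 0.0 to 1.0
    sustained_minutes: int
    core_endpoint: bool
    security_signal: bool
    sla_customer_impact: float   # fraction of an SLA customer's requests failing
    known_noise: bool


def decide(alert: Alert) -> tuple[str, str]:
    """Return (outcome, why). The first check that produces an outcome wins."""
    # 1. Asymmetric-risk gate: fires before any number is looked at.
    if alert.security_signal:
        return "PAGE", "security signal present (top gate)"
    # 2. Deciding field: business impact. The 0.5 cutoff is a made-up example.
    if alert.sla_customer_impact >= 0.5:
        return "PAGE", f"SLA customer impact {alert.sla_customer_impact:.0%} >= 50%"
    # 3. The handbook's threshold: >= 5% on a core endpoint for >= 5 minutes.
    if alert.core_endpoint and alert.error_rate >= 0.05 and alert.sustained_minutes >= 5:
        return "PAGE", f"core endpoint at {alert.error_rate:.1%} for {alert.sustained_minutes} min"
    # 4. Known noise is suppressed only after the dangerous checks have passed.
    if alert.known_noise:
        return "SUPPRESS", "matches known noise"
    # 5. Everything else becomes a ticket.
    return "TICKET", "below paging thresholds"
```

Note that `label` is carried on the alert but never read. A "CRITICAL" cosmetic warning falls through to TICKET, while a 0.2% global blip that is 100% of one SLA customer's traffic pages at step 2.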
3. examples.md: show the judgment, not the format
Two or three worked decisions, at least one a genuine edge case. The straightforward one proves the happy path; the edge cases prove there's judgment underneath. An edge case is any input where the obvious rule points one way and the right call points the other — the late-filed-but-covered claim, the scary-looking alert that's actually known noise. Show the operator reaching the non-obvious but correct answer, and show its reasoning, so a reader can audit how it thinks.
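The late-filed-but-covered claim can be made concrete with a few lines. This sketch covers only the date rule and leaves cause, exclusions, and fraud checks out. The handbook gives a 90-day window but does not say what it is measured from, so counting from the purchase date is an assumption, as are the function name `date_rule` and the dates in the usage note.

```python
from datetime import date, timedelta
from typing import Optional

COVERAGE_DAYS = 90  # the handbook's window; measuring it from purchase is an assumption


def date_rule(purchase: date, incident: date, filed: date) -> Optional[tuple[str, str]]:
    """Deny on the incident date alone. `filed` is accepted but deliberately unused."""
    window_end = purchase + timedelta(days=COVERAGE_DAYS)
    if not (purchase <= incident <= window_end):
        return "DENY", f"incident {incident} is outside the window ending {window_end}"
    # No outcome from this step: later rules (cause, exclusions) decide.
    return None
```

With a purchase on 2024-01-10, a device that failed on 2024-03-01 passes this rule even if the claim was filed on 2024-06-15, because the filing date is never consulted. The same claim with an incident on 2024-05-01 is denied.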
4. reference/: the operator is only as correct as these
Pull the lookup tables out of the rules and into their own files: the severity matrix, the routing table, the coverage policy, the fraud signals, the response templates. Two reasons:
- The rules stay readable: logic in rules.md, data in reference/.
- A stranger can adapt it without touching the logic: they edit their services, their SLA customers, their policy, and the operator now runs their world.
Say this plainly in the README: the templates ship with example data, and the half-hour of customization is where the operator learns the user's environment.
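The same separation of logic and data can be shown in miniature. In this sketch a dictionary stands in for a routing table kept under reference/; every service name and owner in it is example data, and `route` is a hypothetical function rather than Nightwatch's real routing.

```python
# Stand-in for a table kept under reference/ (all entries are example data).
ROUTING_TABLE = {
    "checkout": {"core": True, "owner": "payments-oncall"},
    "avatar-upload": {"core": False, "owner": "web-team"},
}


def route(service: str, table: dict[str, dict]) -> str:
    """Logic only: the table decides who owns what, this function never names a service."""
    entry = table.get(service)
    if entry is None:
        # A service missing from the table goes to a human instead of being guessed at.
        return "FLAG: service not in routing table"
    outcome = "PAGE" if entry["core"] else "TICKET"
    return f"{outcome}: {entry['owner']}"
```

Adapting this to another environment means passing a different table, for example `route("ledger", {"ledger": {"core": True, "owner": "finance-oncall"}})`. The function body stays untouched, which is the property the two bullets above describe.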
5. The escalation discipline — decide, don't dump
Escalating is not the same as kicking the question back. The difference is everything:
- A kickback is "I don't know, you handle it." It's a chatbot wearing an operator's clothes.
- An escalation is "here's my read, here's the safe default I'm holding, and here's the one specific thing that's genuinely yours to decide."
Escalate rarely, and only on real judgment calls: money over your authority, a fraud signal to verify, a true policy ambiguity. Everything else, the operator should resolve. And when it's uncertain, it shouldn't default to asking; it should default to the safe action. For a worsening problem it can't size, that usually means escalating loudly; for a plausibly covered claim, it means leaning toward approval or asking one precise question. If you find yourself escalating often, a rule is missing from rules.md or an entry is missing from reference/. Fix the file, not the habit.
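One way to hold that line is to treat an escalation as a record with three required parts. The sketch below is an illustration only: the `Escalation` fields mirror the three parts named above, and `is_kickback` uses a deliberately crude heuristic (any blank part, or anything other than exactly one question mark) that is an assumption, not a rule from either operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    read: str          # the operator's own assessment
    safe_default: str  # what it is holding in the meantime
    question: str      # the one decision that belongs to the human


def is_kickback(e: Escalation) -> bool:
    """True when a part is blank or the human is asked anything but one question."""
    parts = (e.read, e.safe_default, e.question)
    if any(not p.strip() for p in parts):
        return True
    # Crude heuristic: exactly one question mark means exactly one question.
    return e.question.count("?") != 1
```

An escalation that says "likely duplicate charge, refund held, approve the amount above my limit?" passes. A bare "What would you like to do?" with no read and no safe default is flagged as a kickback.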
6. Trust: cite the why, gate the irreversible
Two habits make an operator's output trustworthy enough to act on:
- Every decision carries a one-line "why" — the rule and the facts that drove it. A human should be able to read it and either nod or override in five seconds. No black box.
- Keep the human gate where it earns its keep. Automate the routine, reversible decisions; hold a checkpoint for the live, the irreversible, the brand-facing. The goal isn't to remove the human — it's to spend their attention only where it matters.
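Both habits fit in a small dispatch step if an operator's decisions are ever wired into tooling. This is a sketch under assumptions: the `Decision` fields, the `reversible` flag, and the returned strings are invented for illustration, and a real setup would decide reversibility from its own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    outcome: str
    why: str          # one line: the rule and the facts that drove it
    reversible: bool  # assumed to be set by whoever owns the policy


def dispatch(decision: Decision) -> str:
    """Auto-apply reversible decisions; hold irreversible ones for a human."""
    why = decision.why.strip()
    if not why or "\n" in why:
        raise ValueError("every decision needs a one-line why")
    if decision.reversible:
        return f"APPLIED {decision.outcome} ({why})"
    # The human gate: nothing irreversible happens without a checkpoint.
    return f"HELD FOR REVIEW {decision.outcome} ({why})"
```

A decision with no why is rejected before anything is applied, so the five-second review described above always has something to read.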
7. Test it like a stranger
Before you trust it, feed it three real cases it hasn't seen — including an edge case — and check that it decides, cites a why you agree with, and only asks for your judgment on the genuinely undecidable.
We do this with a fresh model that has nothing but the folder — no context, none of what we meant, only what we actually wrote. On Claimwise, that test caught two gaps in our own policy file: an exclusion we'd under-specified, and a date rule that was too vague to act on consistently. We fixed both. The gate isn't theater. It finds real things, because the cold reader only has what's on the page, and that's exactly where bugs hide.
If it hands every decision back to you, you've built a chatbot. Go back to rules.md and write the thresholds.
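The mechanical half of that test can be scripted; whether you agree with each why still takes a human. The sketch below uses Claimwise's four outcomes from the text. `audit` and `run_operator` are hypothetical names, and `run_operator` stands in for however you actually send a case to a fresh model that has only the folder.

```python
from typing import Callable

ALLOWED = {"APPROVE", "REQUEST", "DENY", "ESCALATE"}  # Claimwise's outcomes


def audit(cases: list[dict], run_operator: Callable[[dict], tuple[str, str]]) -> list[str]:
    """Run unseen cases through the operator and list what a reviewer should look at."""
    problems = []
    escalations = 0
    for i, case in enumerate(cases):
        outcome, why = run_operator(case)
        if outcome not in ALLOWED:
            problems.append(f"case {i}: '{outcome}' is not one of the fixed outcomes")
        if not why.strip():
            problems.append(f"case {i}: no why given")
        if outcome == "ESCALATE":
            escalations += 1
    # Handing every case back is the chatbot failure mode described above.
    if cases and escalations == len(cases):
        problems.append("every case was escalated")
    return problems
```

An empty list means the operator decided, stayed inside its outcomes, and gave a reason each time. It does not mean the decisions were right; that is what reading the whys is for.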
Now build your own
That's the whole method. Steal the structure from Nightwatch or Claimwise — both are free, ungated, and built to be forked — pick a workflow you'd actually use, and write the decision logic for your world.
New operators ship from The Quiet AI as they're built. This handbook comes with them.