Doubt-Driven Development

Overview

A confident answer is not a correct one. Long sessions accumulate context that quietly turns assumptions into "facts" without anyone noticing. Doubt-driven development is the discipline of materializing a fresh-context reviewer — biased to disprove, not approve — before any non-trivial output stands.

This is not /review. /review is a verdict on a finished artifact. This is an in-flight posture: non-trivial decisions get cross-examined while course-correction is still cheap.

When to Use

A decision is non-trivial when at least one of these is true:

It introduces or modifies branching logic
It crosses a module or service boundary
It asserts a property the type system or compiler cannot verify (thread safety, idempotence, ordering, invariants)
Its correctness depends on context the future reader cannot see
Its blast radius is irreversible (production deploy, data migration, public API change)

Apply the skill when:

About to make an architectural decision under uncertainty
About to commit non-trivial code
About to claim a non-obvious fact ("this is safe", "this scales", "this matches the spec")
Working in code you don't fully understand

When NOT to use:

Mechanical operations (renaming, formatting, file moves)
Following a clear, unambiguous user instruction
Reading or summarizing existing code
One-line changes with obvious correctness
Pure tooling operations (running tests, listing files)
The user has explicitly asked for speed over verification

If you doubt every keystroke, you ship nothing. The skill applies only to non-trivial decisions as defined above.

Loading Constraints

This skill is designed for the main-session orchestrator, where Step 3 (DOUBT, detailed below) can spawn a fresh-context reviewer.

Do NOT add this skill to a persona's skills: frontmatter. A persona that follows Step 3 would spawn another persona — the orchestration anti-pattern explicitly forbidden by references/orchestration-patterns.md ("personas do not invoke other personas").
If you find yourself applying this skill from inside a subagent context (where Claude Code prevents nested subagent spawn): the preferred path is to surface to the user that doubt-driven cannot run nested and let the main session handle it. As a last resort only, a degraded self-questioning fallback exists — rewrite ARTIFACT + CONTRACT as a fresh self-prompt with a hard mental separator from your prior reasoning, and walk Steps 1–5. This is not fresh-context review (you carry your own context with you), so flag the result as degraded and prefer escalation whenever the user is reachable.

The Process

Copy this checklist when applying the skill:

Doubt cycle:
- [ ] Step 1: CLAIM — wrote the claim + why-it-matters
- [ ] Step 2: EXTRACT — isolated artifact + contract, stripped reasoning
- [ ] Step 3: DOUBT — invoked fresh-context reviewer with adversarial prompt
- [ ] Step 4: RECONCILE — classified every finding against the artifact text
- [ ] Step 5: STOP — met stop condition (trivial findings, 3 cycles, or user override)

Step 1: CLAIM — Surface what stands

Name the decision in two or three lines:

CLAIM: "The new caching layer is thread-safe under the
        read-heavy workload described in the spec."
WHY THIS MATTERS: a race here corrupts user data and is
                  hard to detect in QA.

If you can't write the claim that compactly, you have a vibe, not a decision. Surface it before scrutinizing it.

Step 2: EXTRACT — Smallest reviewable unit

A fresh-context reviewer needs the artifact and the contract, not the journey.

Code: the diff or the function — not the whole file
Decision: the proposal in 3–5 sentences plus the constraints it has to satisfy
Assertion: the claim plus the evidence that supposedly supports it (kept distinct from the Step 1 CLAIM block, which is the orchestrator's hypothesis under scrutiny)

Strip your reasoning. If you hand over conclusions, you'll get back validation of your conclusions. The unit must be small enough that a reviewer can hold it in mind in one read — if it's a 500-line PR, decompose first.

Step 3: DOUBT — Invoke the fresh-context reviewer

The reviewer's prompt must be adversarial. Framing decides the answer.

Adversarial review. Find what is wrong with this artifact.
Assume the author is overconfident. Look for:
- Unstated assumptions
- Edge cases not handled
- Hidden coupling or shared state
- Ways the contract could be violated
- Existing conventions this might break
- Failure modes under unexpected input

Do NOT validate. Do NOT summarize. Find issues, or state
explicitly that you cannot find any after thorough examination.

ARTIFACT: <paste artifact>
CONTRACT: <paste contract>

Pass ARTIFACT + CONTRACT only. Do NOT pass the CLAIM. Handing the reviewer your conclusion biases it toward agreement. The reviewer must independently determine whether the artifact satisfies the contract.

In Claude Code, the role-based reviewers in agents/ start with isolated context by design and are usable here — see agents/ for the roster and per-domain match.

The adversarial prompt above takes precedence over the persona's default response shape. Personas like code-reviewer are written to produce balanced verdicts with both strengths and weaknesses; doubt-driven needs issues-only output. Paste the adversarial prompt verbatim into the invocation so it overrides the persona's default. If a persona's response shape can't be overridden cleanly, fall back to a generic subagent with the adversarial prompt.

Cross-model escalation

A single-model reviewer shares blind spots with the original author — a colder, different-architecture model catches them. Doubt-driven is already opt-in for non-trivial decisions, so within that scope offering cross-model is part of the skill's value, not optional friction.

Interactive sessions: always offer. Never silently skip.

Step 1: Ask the user

After the single-model review in Step 3 above, but before RECONCILE, pause and ask:

"Single-model review complete. Want a cross-model second opinion? Options: Gemini CLI, Codex CLI, manual external review (you paste it elsewhere), or skip."

This question is mandatory in every interactive doubt cycle — even on artifacts that feel low-stakes. The user — not the agent — decides whether the cost is worth it. The agent's job is to surface the choice.

Step 2: If the user picks a CLI — verify, then invoke

Check the tool is in PATH (which gemini, which codex).
Test it works (gemini --version or equivalent) before passing the full prompt — a stale or broken binary may pass which but fail on real input.
Confirm the exact invocation with the user, including required flags, auth, and env vars (e.g., API keys). Implementations vary; never assume.
Pass ARTIFACT + CONTRACT + the adversarial prompt only. No session context, no CLAIM.
Mind shell escaping. If the artifact contains quotes, $(...), or backticks, prefer stdin (echo … | gemini) or a heredoc over inline -p "…". When in doubt, ask the user to confirm the invocation before running it.
Take the output into Step 4 (RECONCILE).

Never interpolate the artifact into a shell-quoted argument. Code, markdown, and review prompts routinely contain backticks, $(...), and quote characters that will either truncate the prompt or execute embedded shell. Write the full prompt to a file and pipe it through stdin.

Example shapes (verify flags against your installed tool — syntax differs across implementations and versions):

# Write the adversarial prompt + ARTIFACT + CONTRACT to a temp file first.
# Then pipe via stdin so shell metacharacters in the artifact stay inert.

# Codex (read-only sandbox keeps the CLI from writing to your workspace):
codex exec --sandbox read-only -C <repo-path> - < /tmp/doubt-prompt.md

# Gemini ('--approval-mode plan' is read-only; '-p ""' triggers non-interactive
# mode and the prompt is read from stdin):
gemini --approval-mode plan -p "" < /tmp/doubt-prompt.md

A read-only sandbox is the load-bearing detail: a doubt artifact may itself contain instructions (intentional or accidental prompt injection) that the cross-model CLI would otherwise execute against your workspace.

Step 3: If the CLI is unavailable or fails

Surface the failure explicitly. Offer: run it manually, try a different tool, or skip. Do not silently fall back to single-model — the user should know cross-model didn't happen.

Step 4: If the user skips

Acknowledge the skip in the output ("Proceeding with single-model findings only") and continue to RECONCILE. Skipping is fine; silent skipping is not.

Non-interactive contexts (CI, /loop, autonomous-loop, scheduled runs):

Cross-model is skipped, and the skip must be announced in the output: "Cross-model skipped: non-interactive context."
Never invoke an external CLI without explicit user authorization — this is a load-bearing safety property.

Cross-model adds cost, latency, and tool fragility. The agent surfaces the choice every cycle; the user decides whether this artifact warrants it.

Step 4: RECONCILE — Fold findings back

The reviewer's output is data, not verdict. You are still the orchestrator. Re-read the artifact text against each finding before classifying — rubber-stamping the reviewer is the same failure mode as ignoring it.

For each finding, classify in this precedence order (first matching class wins):

Contract misread — reviewer flagged something specifically because the CONTRACT you provided was unclear or incomplete. Fix the contract first, re-classify on the next cycle.
Valid + actionable — real issue requiring a change to the artifact. Change it, re-loop.
Valid trade-off — issue is real but cost of fixing exceeds cost of accepting. Document the trade-off explicitly so the user sees it.
Noise — reviewer flagged something that's actually correct under context the reviewer didn't have. Note it, move on, and ask: would adding that context to the contract have prevented the false flag?

A fresh reviewer can be wrong because it lacks context. Don't defer just because it's "fresh."

Step 5: STOP — Bounded loop, not recursion

Stop when:

Next iteration returns only trivial or already-considered findings, or
3 cycles completed (escalate to user, don't grind a fourth alone), or
User explicitly says "ship it"

If after 3 cycles the reviewer still surfaces substantive issues, the artifact may not be ready. Surface this to the user — three unresolved cycles is information about the artifact, not a reason to keep looping.

If 3 cycles is "obviously insufficient" because the artifact is large: the artifact is too big — return to Step 2 and decompose. Do not lift the bound.

Common Rationalizations

Rationalization	Reality
"I'm confident, skip the doubt step"	Confidence correlates poorly with correctness on novel problems. Moments of certainty are exactly when blind spots hide.
"Spawning a reviewer is expensive"	Debugging a wrong commit in production is more expensive. The check is bounded; the bug isn't.
"The reviewer will just nitpick"	Only if unscoped. Constrain the prompt to "issues that would make this fail under the contract."
"I'll do doubt at the end with `/review`"	`/review` is a final gate. Doubt-driven catches wrong directions early when course-correction is cheap. By PR time it's too late.
"If I doubt every step I'll never ship"	The skill applies to non-trivial decisions, not every keystroke. Re-read "When NOT to Use."
"Two opinions are always better than one"	Not when the second has less context and produces noise. Reconcile, don't defer.
"The reviewer disagreed so I was wrong"	The reviewer lacks your context — disagreement is information, not verdict. Re-read the artifact, classify, then decide.
"Cross-model is always better"	Cross-model catches blind spots a single model shares with itself, but it adds cost and tool fragility. Offer it every interactive doubt cycle — the user decides whether the artifact warrants it. The agent's job is to surface the choice, not to gate it.
"User said yes once, so I can keep invoking the CLI"	Each invocation is its own authorization. The artifact, the prompt, and the flags change between calls — re-confirm the exact command with the user before every run.

Red Flags

Spawning a fresh-context reviewer for a one-line rename or formatting change
Treating reviewer output as authoritative without re-reading the artifact text
Looping >3 cycles without escalating to the user
Prompting the reviewer with "is this good?" instead of "find issues"
Skipping doubt under time pressure on a high-stakes decision
Re-spawning fresh-context on an unchanged artifact (you'll get the same findings; you're stalling)
Doubt theater (checkable signal): across 2 or more cycles where the reviewer surfaced substantive findings, zero findings were classified as actionable. You are validating, not doubting. Stop and escalate.
Doubting only after committing — that's /review, not doubt-driven development
Hardcoding an external CLI invocation without confirming with the user that the tool exists, is configured, and accepts that exact syntax
Silently skipping cross-model in an interactive doubt cycle. Even when not recommending it, the offer must be visible. Skipping is fine; silent skipping is not.
Falling back silently when an external CLI errors or is missing — surface the failure and let the user redirect
Stripping the contract from the reviewer's input
Passing the CLAIM to the reviewer (biases toward agreement)

Interaction with Other Skills

code-review-and-quality / /review: complementary. /review is post-hoc PR verdict; doubt-driven is in-flight per-decision. Use both.
source-driven-development: SDD verifies facts about frameworks against official docs. Doubt-driven verifies your reasoning about the artifact. SDD checks the API exists; doubt-driven checks you used it correctly under the contract.
test-driven-development: TDD's RED step is doubt made concrete — a failing test is a disproof attempt. When TDD applies, that failing test is the doubt step for behavioral claims.
debugging-and-error-recovery: when the reviewer surfaces a real failure mode, drop into the debugging skill to localize and fix.
Repo orchestration rules (references/orchestration-patterns.md): this skill orchestrates from the main session. A persona calling another persona is anti-pattern B — see Loading Constraints above.

Verification

After applying doubt-driven development:

Every non-trivial decision (per the definition above) was named explicitly as a CLAIM before standing
At least one fresh-context review per non-trivial artifact (a failing test produced by TDD's RED step satisfies this for behavioral claims, per Interaction with Other Skills)
The reviewer received ARTIFACT + CONTRACT — NOT the CLAIM, NOT your reasoning
The reviewer's prompt was adversarial ("find issues"), not validating ("is it good")
Findings were classified against the artifact text (not rubber-stamped) using the precedence: contract misread / actionable / trade-off / noise
A stop condition was met (trivial findings, 3 cycles, or user override)
In interactive mode, cross-model was explicitly offered to the user (regardless of artifact stakes) and the response was acknowledged in the output
In non-interactive mode, cross-model was skipped and the skip was announced
Any external CLI invocation was preceded by a PATH check, a working-binary test, syntax confirmation with the user, and explicit authorization to run

Files1

1 files · 1.0 KB

Select a file to preview

Overall Score

92/100

Grade

A

Excellent

Safety

94

Quality

93

Clarity

91

Completeness

88

Summary

Doubt-driven development is a decision-verification skill that materializes a fresh-context adversarial reviewer to scrutinize non-trivial decisions before they stand. It applies a bounded 5-step process (CLAIM, EXTRACT, DOUBT, RECONCILE, STOP) to catch blind spots early, with explicit guidance on when to invoke cross-model review, how to classify findings, and hard constraints on loop depth and orchestration patterns.

Detected Capabilities

Decision framing and claim articulationArtifact extraction and contract isolationAdversarial review prompt generation and orchestrationFinding classification and precedence-based triageCross-model CLI invocation with safety guardsBounded-loop management and escalation signalingFailure mode identification and trade-off documentation

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

verify decision correctnessadversarial code reviewcheck for hidden assumptionscatch failure modes earlycross-examine architecturevalidate before committingfresh-perspective review

Risk Signals

WARNING

Guidance on invoking external CLIs (Gemini, Codex) with artifact content

Step 3: DOUBT — Cross-model escalation section, subsection 'Step 2: If the user picks a CLI'

INFO

Shell escaping and heredoc/stdin usage to prevent metacharacter injection

Cross-model escalation section, subsection 'Step 2: If the user picks a CLI'

INFO

Requirement to verify external tool existence and functionality before invocation

Cross-model escalation section, subsection 'Step 2: If the user picks a CLI', steps 1-2

INFO

Explicit user authorization required for every external CLI invocation in interactive contexts

Cross-model escalation section, end of Step 2 and Verification checklist

WARNING

Red flag: hardcoding external CLI invocation without user confirmation

Red Flags section, near end

INFO

Guidance against prompt-injection attacks via artifact content (read-only sandbox recommendation)

Step 2: If the user picks a CLI — subsection describing sandbox flag usage

Use Cases

Verify correctness of architectural decisions under uncertainty before committing
Cross-examine branching logic or module-boundary changes before shipping
Challenge assumptions about thread safety, idempotence, or ordering invariants
Catch hidden coupling or failure modes in production-critical code
Validate non-obvious claims (scalability, spec compliance, safety) with fresh perspective

Quality Notes

Exceptional clarity: the 5-step process is tightly scoped, each step has a clear purpose and constraints, and the checklist is copyable and verifiable.
Comprehensive edge-case coverage: red flags section identifies 11 failure modes with actionable signals; rationalizations table directly addresses common misapplications.
Strong error handling: fallback paths for missing/broken CLI tools, degraded mode for nested contexts, explicit skip announcements for non-interactive contexts.
Excellent scope boundaries: 'When NOT to Use' section prevents skill overapplication; Loading Constraints explicitly forbid anti-patterns (persona-to-persona spawning).
Safety-conscious: adversarial prompt template prevents validator-bias; artifact-only handoff (no CLAIM) ensures reviewer independence; precedence order for finding classification prevents rubber-stamping.
Interactive vs. non-interactive handling is explicit and well-differentiated, with distinct requirements for authorization and skip announcement.
Verification checklist is thorough and spot-checkable, enabling users to confirm the skill was applied correctly.
Cross-skill interaction section clarifies relationship to `/review`, TDD, and debugging skill without redundancy.
One minor area for clarification: the degraded self-questioning fallback for nested subagent contexts (mentioned in Loading Constraints) is described conceptually but lacks the step-by-step detail provided elsewhere. However, this is appropriate since the primary pattern is escalation, not fallback execution.
Documentation avoids prescriptive enforcement of tool choice — users decide whether cross-model is worth the cost, aligning with the skill's emphasis on user agency in high-stakes decisions.

Model: claude-haiku-4-5-20251001Analyzed: May 9, 2026

Reviews

Add this skill to your library to leave a review.

No reviews yet

Be the first to share your experience.

doubt-driven-development

Doubt-Driven Development

Overview

When to Use

Loading Constraints

The Process

Step 1: CLAIM — Surface what stands

Step 2: EXTRACT — Smallest reviewable unit

Step 3: DOUBT — Invoke the fresh-context reviewer

Cross-model escalation

Step 4: RECONCILE — Fold findings back

Step 5: STOP — Bounded loop, not recursion

Common Rationalizations

Red Flags

Interaction with Other Skills

Verification

Summary

Detected Capabilities

Trigger Keywords

Risk Signals

Use Cases

Quality Notes

Reviews

Command Palette