Catalog
Yeachan-Heo/autoresearch

Yeachan-Heo

autoresearch

Stateful single-mission improvement loop with strict evaluator contract, markdown decision logs, and max-runtime stop behavior

global
0 installs · 0 uses · ~856
v1.0 · Saved Apr 20, 2026

<Use_When>

  • You already have a mission and evaluator from /deep-interview --autoresearch
  • You want persistent single-mission improvement with strict evaluation
  • You need durable experiment logs under .omc/autoresearch/
  • You want a supported path for periodic reruns via Claude Code native cron
</Use_When>

<Do_Not_Use_When>

  • You need evaluator generation at runtime — use /deep-interview --autoresearch first
  • You need multiple missions orchestrated together — v1 forbids that
  • You want the deprecated omc autoresearch CLI flow — it is no longer authoritative
</Do_Not_Use_When>

<Required_Artifacts>

Canonical persistent storage lives under .omc/autoresearch/<mission-slug>/ and/or .omc/logs/autoresearch/<run-id>/.

Minimum required artifacts:

  • mission spec
  • evaluator script or command reference
  • per-iteration evaluation JSON
  • markdown decision logs

Recommended canonical shape:

.omc/autoresearch/<mission-slug>/
  mission.md
  evaluator.json
  runs/<run-id>/
    evaluations/
      iteration-0001.json
      iteration-0002.json
    decision-log.md

Reuse existing runtime artifacts when available rather than duplicating them unnecessarily.
</Required_Artifacts>
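As a minimal sketch of the canonical layout above — the skill does not prescribe the exact JSON fields, so the record shape here is an assumption based on the evaluator contract (boolean pass, optional numeric score):

```python
import json
from pathlib import Path

def write_iteration_record(mission_dir: Path, run_id: str,
                           iteration: int, passed: bool, score=None) -> Path:
    """Write one per-iteration evaluation JSON under runs/<run-id>/evaluations/.

    The field names ("iteration", "pass", "score") are hypothetical; only the
    directory layout follows the recommended canonical shape.
    """
    eval_dir = mission_dir / "runs" / run_id / "evaluations"
    eval_dir.mkdir(parents=True, exist_ok=True)
    record = {"iteration": iteration, "pass": passed, "score": score}
    path = eval_dir / f"iteration-{iteration:04d}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

p = write_iteration_record(Path(".omc/autoresearch/demo-mission"), "run-0001",
                           iteration=1, passed=True, score=0.82)
```

Zero-padded iteration filenames (`iteration-0001.json`) keep lexicographic and chronological order aligned, which is why the canonical shape uses them.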

<Cron_Integration>

Claude Code native cron is a supported integration point for periodic mission enhancement. In v1, prefer documenting/configuring cron inputs over building a large scheduler UI.

If cron is used:

  • keep one mission per scheduled job
  • preserve the same mission/evaluator contract
  • append new run artifacts rather than overwriting prior experiments
</Cron_Integration>
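The append-not-overwrite rule can be enforced mechanically by minting a fresh run directory per scheduled job. This is a sketch under assumptions — the skill does not specify a run-id format, so the timestamp-based id here is illustrative:

```python
import time
from pathlib import Path

def new_run_dir(mission_dir: Path) -> Path:
    """Create a fresh runs/<run-id>/ directory for a scheduled rerun.

    The run-id format (timestamp-based) is an assumption. exist_ok=False makes
    a collision fail loudly instead of silently clobbering a prior experiment.
    """
    run_id = time.strftime("run-%Y%m%dT%H%M%S")
    run_dir = mission_dir / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir

d = new_run_dir(Path(".omc/autoresearch/demo-mission"))
```

A cron job that calls something like this on each tick preserves every prior run's artifacts by construction.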

<Execution_Policy>

  • Do not hand execution back to omc autoresearch
  • Do not create multi-mission orchestration
  • Prefer reusing src/autoresearch/* runtime/schema helpers where they already match the stricter contract
  • Keep logs useful to humans, not only machines
</Execution_Policy>
Files

1 file · 1.0 KB


Overall Score

72/100

Grade

B

Good

Safety

78

Quality

68

Clarity

75

Completeness

62

Summary

Autoresearch is a stateful skill that orchestrates bounded, single-mission iterative improvement loops with strict evaluator contracts. It maintains durable experiment logs in markdown and JSON formats, enforces max-runtime stop conditions, and supports periodic reruns via native cron scheduling.
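The bounded loop the summary describes — iterate, evaluate, and stop when the runtime budget is exhausted — can be sketched as follows. Names and signatures are illustrative, not the skill's actual API; only the control flow (max runtime as the strict stop condition, continuing past failed evaluations) reflects the documented behavior:

```python
import time

def improvement_loop(evaluate, improve, max_runtime_s: float, max_iterations: int = 100):
    """Single-mission improvement loop with a strict max-runtime stop.

    `evaluate` is assumed to return (passed: bool, score). The loop continues
    even when an iteration does not pass, as long as budget remains.
    """
    deadline = time.monotonic() + max_runtime_s
    history = []
    for i in range(1, max_iterations + 1):
        if time.monotonic() >= deadline:  # primary strict stop condition
            break
        improve()
        passed, score = evaluate()
        history.append({"iteration": i, "pass": passed, "score": score})
    return history

# Tiny stub mission: never passes, but the loop stays bounded regardless.
h = improvement_loop(lambda: (False, 0.5), lambda: None,
                     max_runtime_s=0.05, max_iterations=3)
```

Checking the deadline with `time.monotonic()` rather than wall-clock time keeps the stop condition immune to system clock changes mid-run.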

Detected Capabilities

  • State management for iterative missions (mission slug, evaluator reference, iteration tracking)
  • File I/O for mission specs, evaluator definitions, and evaluation results
  • Markdown and JSON artifact generation (decision logs, evaluation records)
  • Runtime lifecycle management (start, resume, stop conditions)
  • Timestamp tracking and run-id bookkeeping
  • Cron integration for periodic scheduling

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

  • iterative improvement loop
  • single-mission experiment
  • evaluator-driven optimization
  • bounded experiment run
  • resume experiment checkpoint

Risk Signals

  • WARNING: Skill references external evaluator scripts/commands without specifying how they are sourced or validated (Contract section: 'evaluator script or command reference')
  • INFO: File write operations to `.omc/autoresearch/` and `.omc/logs/` directories without explicit permission guards documented (Required_Artifacts and Workflow sections)
  • INFO: Max-runtime parameter described as 'primary strict stop hook' but no specification of duration format or validation logic provided (Argument-hint and Execution_Policy sections)
  • WARNING: Resume capability via `--resume <run-id>` referenced but no guardrails documented against resuming stale or compromised run states (Argument-hint and Workflow sections)

Use Cases

  • Run iterative improvement experiments on a single well-defined mission with structured evaluation
  • Maintain durable experiment logs and decision records for reproducibility and audit trails
  • Integrate periodic mission enhancement with native cron scheduling
  • Resume interrupted improvement runs from checkpoint state

Quality Notes

  • Positive: Clear contract boundaries — skill explicitly forbids multi-mission orchestration and multi-evaluator scenarios, reducing complexity.
  • Positive: Well-structured artifact model with canonical directory layout and versioned run artifacts supports reproducibility and audit trails.
  • Positive: Markdown decision logs paired with JSON evaluation records balance human readability and machine parsability.
  • Positive: Explicit `Do_Not_Use_When` section sets clear expectations about dependencies (requires prior `deep-interview --autoresearch` setup).
  • Negative: No specification of evaluator JSON schema validation — the skill requires 'boolean pass and optional numeric score' but does not document error handling for malformed evaluator output.
  • Negative: Max-runtime duration format is not specified (e.g., ISO 8601, seconds, human-readable like '2h30m'). The skill should document expected input format and validation.
  • Negative: No documented rollback or conflict resolution strategy when resuming runs — what happens if the mission spec or evaluator has changed since the last run?
  • Negative: Cron integration section is aspirational ('prefer documenting') rather than prescriptive — unclear how cron inputs map to CLI arguments or how scheduling state is persisted.
  • Negative: No guidance on handling evaluator timeouts, crashes, or partial failures — the skill says 'continue even when evaluation does not pass' but does not address when evaluation itself fails.
  • Negative: Missing examples of decision-log markdown format and iteration JSON schema — agents would benefit from concrete templates.
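To make the first negative note above concrete: a malformed-output guard for the 'boolean pass and optional numeric score' contract could look like the sketch below. This is a hypothetical validator, not part of the skill; it shows one way to reject bad evaluator output instead of silently coercing it:

```python
def validate_evaluator_output(raw: dict):
    """Validate the hypothesized evaluator contract: boolean 'pass',
    optional numeric 'score'. Raises ValueError on malformed output."""
    if not isinstance(raw.get("pass"), bool):
        raise ValueError(f"evaluator output missing boolean 'pass': {raw!r}")
    score = raw.get("score")
    if score is not None and not isinstance(score, (int, float)):
        raise ValueError(f"'score' must be numeric if present: {score!r}")
    return raw["pass"], score

ok = validate_evaluator_output({"pass": True, "score": 0.9})
```

Failing fast here means a crashed or misbehaving evaluator produces a logged error in the decision log rather than a bogus iteration record.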
Model: claude-haiku-4-5-20251001 · Analyzed: Apr 20, 2026

Reviews

No reviews yet