Benchmark Optimization Loop

Use this skill to convert requests like "make it 20x faster" or "try 50 recursive optimizations" into a bounded, measured loop that can actually improve a system.

Required Baseline

Do not optimize until these exist:

the operation being optimized;
the correctness gate that must stay green;
the metric, for example wall time, p95 latency, rows/sec, cost/run, memory, or error rate;
the current baseline;
the search budget: max variants, max time, max spend, max data impact.
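
As a minimal sketch, the prerequisites above could be captured in one record and checked before any optimization starts. The class names, field names, and the `npm test` gate used in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchBudget:
    max_variants: int
    max_seconds: float
    max_spend_usd: float
    max_rows_touched: int  # hypothetical stand-in for "max data impact"


@dataclass(frozen=True)
class BaselineSpec:
    operation: str                    # command or function being optimized
    correctness_gate: str             # command that must stay green
    metric: str                       # e.g. "wall_time_s"
    baseline_value: Optional[float]   # None until the baseline is measured
    budget: SearchBudget


def missing_fields(spec: BaselineSpec) -> list[str]:
    """Return the names of prerequisites that are not yet filled in."""
    missing = []
    if not spec.operation.strip():
        missing.append("operation")
    if not spec.correctness_gate.strip():
        missing.append("correctness_gate")
    if not spec.metric.strip():
        missing.append("metric")
    if spec.baseline_value is None:
        missing.append("baseline_value")
    if spec.budget.max_variants <= 0 or spec.budget.max_seconds <= 0:
        missing.append("budget")
    return missing
```

An empty list from `missing_fields` means the loop may start, for example with `BaselineSpec("npm run job", "npm test", "wall_time_s", 120.0, SearchBudget(10, 3600.0, 5.0, 0))`.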

If the user asks for an unrealistic target, keep the ambition but make the loop bounded and measurable.

Loop

Measure the baseline.
Identify bottlenecks from evidence.
Generate variants that test one hypothesis each.
Run variants with the same input shape.
Reject variants that fail correctness, safety, or reproducibility.
Promote the best safe variant on the chosen metric (the fastest, when the metric is time).
Codify the winning path in a script, command, test, config, or doc.
Rerun the baseline and winner to confirm the delta.
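
The measuring, rejecting, and promoting steps above can be sketched as follows, assuming the metric is wall time and the correctness gate is a function that inspects the output. `Result`, `measure`, and `pick_winner` are illustrative names, not part of any existing tool.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Result:
    name: str
    seconds: float
    correct: bool


def measure(name: str, run: Callable[[], Any],
            check: Callable[[Any], bool], repeats: int = 3) -> Result:
    """Time `run` several times, keep the median, and apply the gate each time."""
    timings = []
    correct = True
    for _ in range(repeats):
        start = time.perf_counter()
        output = run()
        timings.append(time.perf_counter() - start)
        correct = correct and check(output)
    return Result(name, statistics.median(timings), correct)


def pick_winner(baseline: Result, variants: list[Result]) -> Result:
    """Drop variants that failed the gate, then return the fastest survivor.

    The baseline is returned when no correct variant beats it.
    """
    safe = [v for v in variants if v.correct]
    best = min(safe, key=lambda v: v.seconds, default=baseline)
    return best if best.seconds < baseline.seconds else baseline
```

Taking the median of a few repeats is one simple way to reduce the effect of a single noisy run; rerunning the baseline and the winner afterwards remains a separate confirmation step.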

Variant Table

Track variants like this:

Variant | Hypothesis | Command | Time | Correct? | Notes
baseline | current path | npm run job | 120s | yes | stable
batch-500 | fewer round trips | npm run job -- --batch 500 | 42s | yes | winner
parallel-8 | more workers | npm run job -- --workers 8 | 31s | no | rate limited
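
A table like this can be produced by a small script rather than by hand. The sketch below assumes that a zero exit code is an acceptable stand-in for the correctness gate, which will not hold for every job; `run_variant` and `render_table` are hypothetical helpers.

```python
import shlex
import subprocess
import time

COLUMNS = ["Variant", "Hypothesis", "Command", "Time", "Correct?", "Notes"]


def run_variant(variant: str, hypothesis: str,
                command: list[str], notes: str = "") -> dict[str, str]:
    """Run one command, time it, and return a row for the variant table."""
    start = time.perf_counter()
    completed = subprocess.run(command, capture_output=True)
    elapsed = time.perf_counter() - start
    return {
        "Variant": variant,
        "Hypothesis": hypothesis,
        "Command": shlex.join(command),
        "Time": f"{elapsed:.1f}s",
        # Assumption: exit code 0 means the correctness gate passed.
        "Correct?": "yes" if completed.returncode == 0 else "no",
        "Notes": notes,
    }


def render_table(rows: list[dict[str, str]]) -> str:
    """Render rows in the same pipe-separated layout as the table above."""
    lines = [" | ".join(COLUMNS)]
    for row in rows:
        lines.append(" | ".join(row.get(column, "") for column in COLUMNS))
    return "\n".join(lines)
```

For example, `run_variant("batch-500", "fewer round trips", ["npm", "run", "job", "--", "--batch", "500"])` would time that command and yield one row.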

Recursive Search

For recursive or hyperparameter work:

persist every run to a ledger;
compare against the prior accepted winner, not only the previous run;
keep a holdout or replay check;
stop when improvement is within noise, correctness fails, cost exceeds the budget, or the search starts changing more variables than it can explain.
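
One possible shape for the ledger and the stop conditions is sketched below. It assumes a JSON Lines file, a time-based metric stored under `seconds`, and a noise threshold expressed as a fraction of the prior winner's time; the field names and function names are hypothetical. The holdout check and the "too many variables" condition are left to the caller.

```python
import json
from pathlib import Path
from typing import Optional


def append_run(ledger: Path, run: dict) -> None:
    """Append one run as a JSON line so earlier entries are never rewritten."""
    with ledger.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(run, sort_keys=True) + "\n")


def load_runs(ledger: Path) -> list[dict]:
    """Read every recorded run, oldest first."""
    if not ledger.exists():
        return []
    with ledger.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def prior_winner(runs: list[dict]) -> Optional[dict]:
    """Return the most recently accepted run, not merely the previous run."""
    accepted = [run for run in runs if run.get("accepted")]
    return accepted[-1] if accepted else None


def stop_reason(candidate: dict, winner: dict, noise_fraction: float,
                spent: float, budget: float) -> Optional[str]:
    """Return why the search should stop, or None to keep going."""
    if not candidate["correct"]:
        return "correctness failed"
    if spent > budget:
        return "cost exceeds budget"
    improvement = (winner["seconds"] - candidate["seconds"]) / winner["seconds"]
    if improvement <= noise_fraction:
        return "no improvement beyond noise"
    return None
```

An append-only file keeps rejected runs visible, so a later summary can state how many variants were tried and why each one was dropped.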

Use phrases like "best measured safe variant" instead of "global optimum" unless the search space was actually exhaustive.

Promotion Gate

A variant cannot become the new default until:

correctness tests pass;
the performance delta is reproduced on a rerun or its cause is explained;
rollback is obvious;
the change is encoded in source control or a durable runbook;
the final summary includes exact commands and measurements.
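
The gate above can be treated as an explicit checklist, so a blocked promotion names what is missing. This is a sketch with hypothetical field names; each flag would still be set by a human or by a separate check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PromotionChecklist:
    correctness_tests_pass: bool
    delta_repeated_or_explained: bool
    rollback_is_obvious: bool
    encoded_in_repo_or_runbook: bool
    summary_has_commands_and_measurements: bool


def blockers(checklist: PromotionChecklist) -> list[str]:
    """List the gate conditions that are still unmet, in declaration order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def can_promote(checklist: PromotionChecklist) -> bool:
    """A variant becomes the new default only when nothing blocks it."""
    return not blockers(checklist)
```

Reporting the unmet conditions by name, rather than a bare yes or no, makes it clear what remains to be done before the variant can become the default.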

Overall Score

87/100 (Grade A, Excellent)

Safety: 92
Quality: 85
Clarity: 88
Completeness: 80

Summary

This skill guides agents through a structured benchmark-and-optimize loop: measure a baseline, identify bottlenecks, generate and test variants, promote winners, and encode the result. It provides clear promotion gates (correctness, reproducibility, rollback clarity) and recursive search guidelines to prevent unbounded optimization loops.

Detected Capabilities

file read
bash execution
variant measurement and comparison
correctness gate validation
ledger tracking for recursive search

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

make faster
performance optimization
benchmark variants
latency reduction
recursive hyperparameter search
cost optimization loop
throughput improvement

Use Cases

Optimize application latency by testing batch size, parallelism, and caching variants
Reduce infrastructure cost through systematic benchmarking of different compute configurations
Improve database query throughput by iteratively testing indexes, query rewrites, and connection pooling
Recursive hyperparameter tuning with measured comparisons against prior winners
Performance regression detection and root-cause analysis using variant tables and baselines

Quality Notes

Excellent scope clarity: requires baseline, correctness gate, metric, budget, and evidence before optimization begins
Well-structured loop with promotion gates that enforce correctness, reproducibility, and rollback traceability
Variant table format and ledger pattern provide clear tracking and prevent unconstrained search drift
Recursive search guidelines prevent unbounded optimization by mandating budget limits and noise thresholds
Good use of terminology (e.g., 'best measured safe variant' vs. 'global optimum') to calibrate expectations
Practical examples (batch size, workers, rate limiting) ground abstract optimization concepts
Promotion gate explicitly requires source control/runbook encoding, ensuring results are durable
Minor: could include example commands for running a simple variant sweep or template for a ledger file

Model: claude-haiku-4-5-20251001
Analyzed: May 25, 2026
