affaan-m/enterprise-agent-ops

Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.

v1.1 · Saved May 11, 2026

Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

Operational Domains

  1. runtime lifecycle (start, pause, stop, restart)
  2. observability (logs, metrics, traces)
  3. safety controls (scopes, permissions, kill switches)
  4. change management (rollout, rollback, audit)
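The runtime lifecycle and safety-control domains can be made concrete as a small state machine. The sketch below is hypothetical. The AgentRuntime class, the State enum, the transition table, and the scope strings are illustrative assumptions, not part of any specific platform. It treats start, pause, stop, and restart as guarded transitions, adds a kill switch that forces a stop and refuses any later start or restart, checks actions against a least-privilege scope allow-list, and keeps an append-only audit trail.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Hypothetical transition table: command -> {current state: next state}.
TRANSITIONS = {
    "start": {State.STOPPED: State.RUNNING},
    "pause": {State.RUNNING: State.PAUSED},
    "stop": {State.RUNNING: State.STOPPED, State.PAUSED: State.STOPPED},
    "restart": {State.RUNNING: State.RUNNING, State.PAUSED: State.RUNNING},
}


class AgentRuntime:
    """Illustrative lifecycle controller for one long-lived agent workload."""

    def __init__(self, name, scopes=()):
        self.name = name
        self.state = State.STOPPED
        self.killed = False              # kill switch; once engaged, nothing may start
        self.scopes = frozenset(scopes)  # least-privilege allow-list of actions
        self.audit = []                  # append-only (actor, command, from, to) records

    def apply(self, command, actor):
        """Apply a lifecycle command, rejecting transitions the table does not allow."""
        if self.killed and command in ("start", "restart"):
            raise PermissionError(f"{self.name}: kill switch engaged, {command} refused")
        allowed = TRANSITIONS.get(command, {})
        if self.state not in allowed:
            raise ValueError(f"{self.name}: cannot {command} while {self.state.value}")
        previous, self.state = self.state, allowed[self.state]
        self.audit.append((actor, command, previous.value, self.state.value))
        return self.state

    def kill(self, actor):
        """Kill switch: stop from any state and refuse every later start or restart."""
        self.killed = True
        previous, self.state = self.state, State.STOPPED
        self.audit.append((actor, "kill", previous.value, self.state.value))

    def authorize(self, scope):
        """Permit an action only while running, not killed, and within granted scopes."""
        return self.state is State.RUNNING and not self.killed and scope in self.scopes
```

A real controller would also act on the underlying process (for example through PM2 or systemd); here the table only decides which commands are legal in which state, and every accepted command leaves an audit entry.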

Baseline Controls

  • immutable deployment artifacts
  • least-privilege credentials
  • environment-level secret injection
  • hard timeout and retry budgets
  • audit log for high-risk actions
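Three of these controls (hard timeouts, retry budgets, and environment-level secret injection) can be sketched together by running each agent task as a child process. In this hypothetical run_task helper, the timeout argument of Python's subprocess.run kills the child when it overruns, max_attempts caps retries with exponential backoff, and only explicitly named secrets (plus PATH) are copied into the child's environment. The function name, its parameters, and the backoff schedule are assumptions, and the sketch assumes a POSIX host.

```python
import os
import subprocess
import time
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    ok: bool
    attempts: int
    returncode: Optional[int]  # None means the last attempt hit the hard timeout
    stdout: str


def run_task(cmd: Sequence[str], *, timeout_s: float, max_attempts: int,
             secret_names: Sequence[str] = (), backoff_s: float = 0.5) -> RunResult:
    """Hypothetical runner: hard timeout per attempt, bounded retries, scoped secrets."""
    # Environment-level secret injection: pass only the named secrets, not os.environ.
    child_env = {"PATH": os.environ.get("PATH", "")}
    for name in secret_names:
        if name not in os.environ:
            raise KeyError(f"required secret {name!r} is not set")
        child_env[name] = os.environ[name]

    returncode: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run kills the child and raises TimeoutExpired.
            proc = subprocess.run(list(cmd), env=child_env, capture_output=True,
                                  text=True, timeout=timeout_s)
            returncode = proc.returncode
            if returncode == 0:
                return RunResult(True, attempt, 0, proc.stdout)
        except subprocess.TimeoutExpired:
            returncode = None
        if attempt < max_attempts:
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff between attempts
    return RunResult(False, max_attempts, returncode, "")
```

Running the task in a separate process is what makes the timeout hard: the child is killed rather than merely abandoned, which a thread-based timeout in Python cannot guarantee.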

Metrics to Track

  • success rate
  • mean retries per task
  • time to recovery
  • cost per successful task
  • failure class distribution
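All five metrics can be derived from per-task records. The sketch below assumes a hypothetical TaskRecord shape (success flag, retry count, cost, failure class) and a list of incidents given as (detected, recovered) timestamps in seconds. It also assumes that cost per successful task means total spend, including failed tasks, divided by the number of successes. Field names and units are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class TaskRecord:
    ok: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout", "tool_error"; None when ok


def summarize(records: List[TaskRecord], incidents: List[Tuple[float, float]]) -> dict:
    """Compute the tracked metrics from task records and (detected, recovered) pairs."""
    successes = [r for r in records if r.ok]
    return {
        "success_rate": len(successes) / len(records) if records else 0.0,
        "mean_retries_per_task": mean(r.retries for r in records) if records else 0.0,
        "mean_time_to_recovery_s": (
            mean(end - start for start, end in incidents) if incidents else None
        ),
        # Total spend, including failed tasks, divided by the number of successes.
        "cost_per_successful_task": (
            sum(r.cost_usd for r in records) / len(successes) if successes else None
        ),
        "failure_class_distribution": Counter(
            r.failure_class for r in records if not r.ok
        ),
    }
```

Charging failed work to the successful tasks keeps retries visible in the cost metric: a rising retry rate shows up as a higher cost per success even when the success rate holds steady.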

Incident Pattern

When the failure rate spikes:

  1. freeze new rollout
  2. capture representative traces
  3. isolate failing route
  4. patch with smallest safe change
  5. run regression + security checks
  6. resume gradually
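Steps 1, 3, 5, and 6 of this pattern can be sketched as a staged-rollout controller. Everything here is an illustrative assumption: the traffic stages, the rule that a route is failing when its failure rate exceeds twice the baseline, and the names failing_routes, Rollout, and deploy_patch. Capturing traces and writing the patch (steps 2 and 4) stay with the operator.

```python
from typing import Dict, List, Tuple

SPIKE_FACTOR = 2.0  # assumption: a route is "failing" above 2x the baseline failure rate


def failing_routes(stats: Dict[str, Tuple[int, int]], baseline: float,
                   factor: float = SPIKE_FACTOR) -> List[str]:
    """stats maps route -> (failures, total); returns routes whose failure rate spiked."""
    return sorted(route for route, (failures, total) in stats.items()
                  if total and failures / total > factor * baseline)


class Rollout:
    """Hypothetical staged rollout: share of traffic sent to the newest release."""

    def __init__(self, stages=(0.05, 0.25, 0.50, 1.00)):
        self.stages = stages
        self.stage = 0
        self.frozen = False
        self.isolated = set()  # routes pinned to the previous known-good release

    @property
    def traffic_share(self) -> float:
        return self.stages[self.stage]

    def observe(self, stats: Dict[str, Tuple[int, int]], baseline: float) -> List[str]:
        bad = failing_routes(stats, baseline)
        if bad:
            self.frozen = True         # step 1: freeze further promotion
            self.isolated.update(bad)  # step 3: isolate the failing routes
        return bad

    def advance(self) -> bool:
        """Promote one stage; refused while frozen or already at full traffic."""
        if self.frozen or self.stage == len(self.stages) - 1:
            return False
        self.stage += 1
        return True

    def deploy_patch(self, checks_passed: bool) -> bool:
        """Steps 5-6: accept a patch only if regression and security checks passed,
        then restart the rollout from the smallest stage."""
        if not checks_passed:
            return False
        self.frozen = False
        self.isolated.clear()
        self.stage = 0
        return True
```

Restarting a patched release at the smallest stage is one way to read "resume gradually": the fix earns full traffic one promotion at a time instead of inheriting the frozen release's exposure.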

Deployment Integrations

This skill pairs with:

  • PM2 workflows
  • systemd services
  • container orchestrators
  • CI/CD gates
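As one integration example, several baseline controls map onto standard systemd service directives: Restart= and RestartSec= for restarts, StartLimitIntervalSec= and StartLimitBurst= as a restart budget, EnvironmentFile= for secret injection, RuntimeMaxSec= as an optional hard runtime limit, and DynamicUser=, NoNewPrivileges=, and ProtectSystem= for least privilege. The render_systemd_unit helper, the example paths, and the default values are hypothetical placeholders.

```python
from typing import Optional


def render_systemd_unit(name: str, exec_start: str, env_file: str, *,
                        restart_sec: int = 5, start_limit_burst: int = 5,
                        start_limit_interval_sec: int = 300,
                        runtime_max_sec: Optional[int] = None) -> str:
    """Render a hypothetical systemd service unit for one long-lived agent."""
    service = [
        f"ExecStart={exec_start}",
        f"EnvironmentFile={env_file}",  # secrets come from the environment, not the artifact
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
        "DynamicUser=yes",              # ephemeral unprivileged user for the service
        "NoNewPrivileges=yes",
        "ProtectSystem=strict",         # mount the file system read-only for the service
        f"StateDirectory={name}",       # writable state directory under /var/lib
    ]
    if runtime_max_sec is not None:
        # systemd stops the service and marks it failed after this many seconds.
        service.append(f"RuntimeMaxSec={runtime_max_sec}")
    lines = [
        "[Unit]",
        f"Description=Agent workload {name}",
        f"StartLimitIntervalSec={start_limit_interval_sec}",  # restart budget window
        f"StartLimitBurst={start_limit_burst}",               # max starts within the window
        "",
        "[Service]",
        *service,
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"
```

With a unit like this installed, the start, stop, and restart lifecycle commands map onto systemctl start, systemctl stop, and systemctl restart for the named service.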
Files: 1 file · 1.0 KB


Overall Score: 48/100 · Grade: C (Adequate)

  • Safety: 55
  • Quality: 35
  • Clarity: 58
  • Completeness: 38

Summary

This skill provides operational guidance for long-lived, cloud-hosted agent workloads, emphasizing lifecycle management, observability, and safety controls. It outlines deployment patterns, incident response procedures, and best practices for running agents at enterprise scale, but lacks concrete implementation examples and tactical guidance for executing these operations.

Detected Capabilities

  • agent lifecycle management
  • logging and metrics collection
  • environment-based secret injection
  • deployment artifact management
  • incident response and rollback
  • audit logging
  • timeout and retry budget configuration

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

  • manage long-lived agents
  • agent lifecycle control
  • production observability setup
  • incident response workflow
  • agent deployment integration
  • operational monitoring

Risk Signals

  • WARNING: High-level guidance without concrete implementation steps; the agent lacks tactical instructions for executing lifecycle commands or configuring observability systems. (Operational Domains, Baseline Controls, Incident Pattern sections)
  • WARNING: Missing credential management specifics; 'least-privilege credentials' and 'environment-level secret injection' are mentioned but not detailed or demonstrated. (Baseline Controls section)
  • INFO: Incomplete incident procedure; 'isolate failing route' and 'patch with smallest safe change' lack concrete execution guidance. (Incident Pattern section)
  • WARNING: No documented guardrails or safety boundaries for agent actions; the skill names safety concepts but does not show how to implement kill switches or permission scoping. (Operational Domains section, safety controls item)
  • INFO: The deployment integration section lists tool names but provides no integration examples or workflows. (Deployment Integrations section)

Use Cases

  • Monitor and manage continuously running agent systems in production
  • Implement lifecycle controls (start, pause, stop, restart) for long-lived workloads
  • Establish observability practices (logs, metrics, traces) for agent operations
  • Execute incident response and rollback procedures when agent failures occur
  • Integrate agents with PM2, systemd, or container orchestration platforms
  • Track operational metrics to optimize agent performance and cost efficiency

Quality Notes

  • Skill provides high-level operational framework but lacks concrete implementation guidance — an agent cannot execute these instructions without supplementary documentation or code examples
  • Good conceptual organization: domains are clearly named (lifecycle, observability, safety, change management) and incident response is logically structured
  • Metrics section is well-chosen for operational assessment but lacks guidance on how to instrument or report these metrics
  • Missing tactical examples: no sample PM2 configs, systemd service definitions, or code snippets for timeout/retry budgets
  • No error handling patterns or edge cases documented (e.g., what happens if a freeze command fails, how to verify rollback success)
  • Security baseline is well-intentioned but underdeveloped: terms like 'immutable deployment artifacts' and 'hard timeout' are not explained in sufficient detail for implementation
Model: claude-haiku-4-5-20251001 · Analyzed: May 11, 2026

Reviews

No reviews yet

Version History

  • v1.1 (Latest), 2026-04-20: Content updated
  • v1.0, 2026-03-16: Seeded from github.com/affaan-m/everything-claude-code
