Catalog
affaan-m/healthcare-eval-harness

affaan-m

healthcare-eval-harness

Patient safety evaluation harness for healthcare application deployments. Automated test suites for CDSS accuracy, PHI exposure, clinical workflow integrity, and integration compliance. Blocks deployments on safety failures.

global
New~1.9k
v1.1Saved May 11, 2026

Healthcare Eval Harness — Patient Safety Verification

Automated verification system for healthcare application deployments. A single CRITICAL failure blocks deployment. Patient safety is non-negotiable.

Note: Examples use Jest as the reference test runner. Adapt commands for your framework (Vitest, pytest, PHPUnit, etc.) — the test categories and pass thresholds are framework-agnostic.

When to Use

  • Before any deployment of EMR/EHR applications
  • After modifying CDSS logic (drug interactions, dose validation, scoring)
  • After changing database schemas that touch patient data
  • After modifying authentication or access control
  • During CI/CD pipeline configuration for healthcare apps
  • After resolving merge conflicts in clinical modules

How It Works

The eval harness runs five test categories in order. The first three (CDSS Accuracy, PHI Exposure, Data Integrity) are CRITICAL gates requiring 100% pass rate — a single failure blocks deployment. The remaining two (Clinical Workflow, Integration) are HIGH gates requiring 95%+ pass rate.

Each category maps to a Jest test path pattern. The CI pipeline runs CRITICAL gates with --bail (stop on first failure) and enforces coverage thresholds with --coverage --coverageThreshold.

Eval Categories

1. CDSS Accuracy (CRITICAL — 100% required)

Tests all clinical decision support logic: drug interaction pairs (both directions), dose validation rules, clinical scoring vs published specs, no false negatives, no silent failures.

npx jest --testPathPattern='tests/cdss' --bail --ci --coverage

2. PHI Exposure (CRITICAL — 100% required)

Tests for protected health information leaks: API error responses, console output, URL parameters, browser storage, cross-facility isolation, unauthenticated access, service role key absence.

npx jest --testPathPattern='tests/security/phi' --bail --ci

3. Data Integrity (CRITICAL — 100% required)

Tests clinical data safety: locked encounters, audit trail entries, cascade delete protection, concurrent edit handling, no orphaned records.

npx jest --testPathPattern='tests/data-integrity' --bail --ci

4. Clinical Workflow (HIGH — 95%+ required)

Tests end-to-end flows: encounter lifecycle, template rendering, medication sets, drug/diagnosis search, prescription PDF, red flag alerts.

tmp_json=$(mktemp)
npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$tmp_json" || true
total=$(jq '.numTotalTests // 0' "$tmp_json")
passed=$(jq '.numPassedTests // 0' "$tmp_json")
if [ "$total" -eq 0 ]; then
  echo "No clinical tests found" >&2
  exit 1
fi
rate=$(echo "scale=2; $passed * 100 / $total" | bc)
echo "Clinical pass rate: ${rate}% ($passed/$total)"

5. Integration Compliance (HIGH — 95%+ required)

Tests external systems: HL7 message parsing (v2.x), FHIR validation, lab result mapping, malformed message handling.

tmp_json=$(mktemp)
npx jest --testPathPattern='tests/integration' --ci --json --outputFile="$tmp_json" || true
total=$(jq '.numTotalTests // 0' "$tmp_json")
passed=$(jq '.numPassedTests // 0' "$tmp_json")
if [ "$total" -eq 0 ]; then
  echo "No integration tests found" >&2
  exit 1
fi
rate=$(echo "scale=2; $passed * 100 / $total" | bc)
echo "Integration pass rate: ${rate}% ($passed/$total)"

Pass/Fail Matrix

Category Threshold On Failure
CDSS Accuracy 100% BLOCK deployment
PHI Exposure 100% BLOCK deployment
Data Integrity 100% BLOCK deployment
Clinical Workflow 95%+ WARN, allow with review
Integration 95%+ WARN, allow with review

CI/CD Integration

name: Healthcare Safety Gate
on: [push, pull_request]

jobs:
  safety-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci

      # CRITICAL gates — 100% required, bail on first failure
      - name: CDSS Accuracy
        run: npx jest --testPathPattern='tests/cdss' --bail --ci --coverage --coverageThreshold='{"global":{"branches":80,"functions":80,"lines":80}}'

      - name: PHI Exposure Check
        run: npx jest --testPathPattern='tests/security/phi' --bail --ci

      - name: Data Integrity
        run: npx jest --testPathPattern='tests/data-integrity' --bail --ci

      # HIGH gates — 95%+ required, custom threshold check
      # HIGH gates — 95%+ required
      - name: Clinical Workflows
        run: |
          TMP_JSON=$(mktemp)
          npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$TMP_JSON" || true
          TOTAL=$(jq '.numTotalTests // 0' "$TMP_JSON")
          PASSED=$(jq '.numPassedTests // 0' "$TMP_JSON")
          if [ "$TOTAL" -eq 0 ]; then
            echo "::error::No clinical tests found"; exit 1
          fi
          RATE=$(echo "scale=2; $PASSED * 100 / $TOTAL" | bc)
          echo "Pass rate: ${RATE}% ($PASSED/$TOTAL)"
          if (( $(echo "$RATE < 95" | bc -l) )); then
            echo "::warning::Clinical pass rate ${RATE}% below 95%"
          fi

      - name: Integration Compliance
        run: |
          TMP_JSON=$(mktemp)
          npx jest --testPathPattern='tests/integration' --ci --json --outputFile="$TMP_JSON" || true
          TOTAL=$(jq '.numTotalTests // 0' "$TMP_JSON")
          PASSED=$(jq '.numPassedTests // 0' "$TMP_JSON")
          if [ "$TOTAL" -eq 0 ]; then
            echo "::error::No integration tests found"; exit 1
          fi
          RATE=$(echo "scale=2; $PASSED * 100 / $TOTAL" | bc)
          echo "Pass rate: ${RATE}% ($PASSED/$TOTAL)"
          if (( $(echo "$RATE < 95" | bc -l) )); then
            echo "::warning::Integration pass rate ${RATE}% below 95%"
          fi

Anti-Patterns

  • Skipping CDSS tests "because they passed last time"
  • Setting CRITICAL thresholds below 100%
  • Using --no-bail on CRITICAL test suites
  • Mocking the CDSS engine in integration tests (must test real logic)
  • Allowing deployments when safety gate is red
  • Running tests without --coverage on CDSS suites

Examples

Example 1: Run All Critical Gates Locally

npx jest --testPathPattern='tests/cdss' --bail --ci --coverage && \
npx jest --testPathPattern='tests/security/phi' --bail --ci && \
npx jest --testPathPattern='tests/data-integrity' --bail --ci

Example 2: Check HIGH Gate Pass Rate

tmp_json=$(mktemp)
npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$tmp_json" || true
jq '{
  passed: (.numPassedTests // 0),
  total: (.numTotalTests // 0),
  rate: (if (.numTotalTests // 0) == 0 then 0 else ((.numPassedTests // 0) / (.numTotalTests // 1) * 100) end)
}' "$tmp_json"
# Expected: { "passed": 21, "total": 22, "rate": 95.45 }

Example 3: Eval Report

## Healthcare Eval: 2026-03-27 [commit abc1234]

### Patient Safety: PASS

| Category | Tests | Pass | Fail | Status |
|----------|-------|------|------|--------|
| CDSS Accuracy | 39 | 39 | 0 | PASS |
| PHI Exposure | 8 | 8 | 0 | PASS |
| Data Integrity | 12 | 12 | 0 | PASS |
| Clinical Workflow | 22 | 21 | 1 | 95.5% PASS |
| Integration | 6 | 6 | 0 | PASS |

### Coverage: 84% (target: 80%+)
### Verdict: SAFE TO DEPLOY
Files1
1 files · 1.0 KB

Select a file to preview

Overall Score

88/100

Grade

A

Excellent

Safety

92

Quality

87

Clarity

89

Completeness

83

Summary

A healthcare-specific CI/CD testing harness that enforces automated patient safety gates before deployments. The skill defines five test categories (CDSS Accuracy, PHI Exposure, Data Integrity, Clinical Workflow, Integration Compliance) with tiered pass thresholds: three CRITICAL gates requiring 100% pass rate that block deployment on failure, and two HIGH gates requiring 95%+ pass rate that warn but allow review-gated deployment. It provides CLI commands, GitHub Actions workflow templates, and reporting examples to verify clinical decision support systems, data privacy, and integration compliance.

Detected Capabilities

test execution (Jest, CLI)CI/CD workflow integration (GitHub Actions)JSON output parsing (jq)coverage reporting and thresholdspass rate calculation (bash arithmetic)deployment blocking logic

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

verify patient safetyhealthcare deployment gatecdss accuracy testphi exposure checkclinical data integrityhl7 fhir validationmedical app safety

Risk Signals

INFO

No credential access or data exfiltration patterns detected

entire skill
INFO

Bash arithmetic operations (bc, echo scale=2) used for test rate calculation — acceptable for CI context

Clinical Workflow and Integration Compliance sections
INFO

mktemp used for temporary JSON output files — appropriate for CI/CD pipelines

Example commands and GitHub Actions workflow

Use Cases

  • Verify CDSS logic before EMR deployment
  • Test for PHI leaks in error responses and APIs
  • Validate clinical data integrity and audit trails
  • Confirm HL7/FHIR integration compliance
  • Gate production deployments on safety test results
  • Report safety verification status to stakeholders

Quality Notes

  • Excellent domain specificity — all test categories directly address clinical safety concerns (CDSS accuracy, PHI exposure, data integrity)
  • Clear pass/fail matrix with explicit thresholds and deployment consequences documented for each category
  • Framework-agnostic design with explicit note that Jest is reference but adapts to Vitest, pytest, PHPUnit
  • Well-structured CI/CD integration example (GitHub Actions) with concrete workflow that can be directly adopted
  • Anti-patterns section proactively identifies risky deviations (skipping tests, reducing thresholds, mocking CDSS)
  • Examples include realistic output formats and expected test counts that help agents understand success criteria
  • Safety-first tone reinforced throughout ('Patient safety is non-negotiable', 'single CRITICAL failure blocks deployment')
  • Skill properly scoped to test orchestration and validation, does not attempt to write tests themselves or modify application code
Model: claude-haiku-4-5-20251001Analyzed: May 11, 2026

Reviews

Add this skill to your library to leave a review.

No reviews yet

Be the first to share your experience.

Version History

v1.1

Content updated

2026-04-20

Latest
v1.0

No changelog

2026-04-12

Add affaan-m/healthcare-eval-harness to your library

Command Palette

Search for a command to run...

affaan-m/healthcare-eval-harness | SkillRepo