Healthcare Eval Harness — Patient Safety Verification

Automated verification system for healthcare application deployments. A single CRITICAL failure blocks deployment. Patient safety is non-negotiable.

Note: Examples use Jest as the reference test runner. Adapt commands for your framework (Vitest, pytest, PHPUnit, etc.) — the test categories and pass thresholds are framework-agnostic.

When to Use

Before any deployment of EMR/EHR applications
After modifying CDSS logic (drug interactions, dose validation, scoring)
After changing database schemas that touch patient data
After modifying authentication or access control
During CI/CD pipeline configuration for healthcare apps
After resolving merge conflicts in clinical modules

How It Works

The eval harness runs five test categories in order. The first three (CDSS Accuracy, PHI Exposure, Data Integrity) are CRITICAL gates requiring 100% pass rate — a single failure blocks deployment. The remaining two (Clinical Workflow, Integration) are HIGH gates requiring 95%+ pass rate.

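Each category maps to a Jest test path pattern, selected with --testPathPattern (Jest 30 renames this flag to --testPathPatterns). The CI pipeline runs CRITICAL gates with --bail (stop after the first failing test suite) and enforces coverage thresholds on the CDSS suite with --coverage --coverageThreshold.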

Eval Categories

1. CDSS Accuracy (CRITICAL — 100% required)

Tests all clinical decision support logic: drug interaction pairs (both directions), dose validation rules, clinical scoring vs published specs, no false negatives, no silent failures.

npx jest --testPathPattern='tests/cdss' --bail --ci --coverage
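
The "both directions" requirement means an interaction recorded for (A, B) must also be found when looking up (B, A). As a rough illustration, assuming a hypothetical whitespace-separated table of drug pairs on stdin (not a format the harness defines) and a hypothetical helper named `missing_reverse_pairs`, a pre-test lint could list pairs whose reverse entry is missing:

```shell
#!/usr/bin/env bash
# Hypothetical lint (not part of the harness): reads "drugA drugB" pairs on
# stdin and prints every pair whose reverse (drugB drugA) is absent.
# Exit status is 1 if any asymmetric pair is found, 0 otherwise.
missing_reverse_pairs() {
  awk '
    NF >= 2 { seen[$1 " " $2] = 1; order[++n] = $1 " " $2 }
    END {
      bad = 0
      for (i = 1; i <= n; i++) {
        split(order[i], p, " ")
        key = p[2] " " p[1]
        if (!(key in seen)) { print order[i]; bad = 1 }
      }
      exit bad
    }
  '
}

# Usage with illustrative data: the third pair has no reverse entry.
printf '%s\n' 'warfarin aspirin' 'aspirin warfarin' 'simvastatin clarithromycin' \
  | missing_reverse_pairs || echo "asymmetric pairs found"
```

A lint like this only catches missing table entries. It does not replace the Jest tests that exercise the lookup logic itself.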

2. PHI Exposure (CRITICAL — 100% required)

Tests for protected health information leaks: API error responses, console output, URL parameters, browser storage, cross-facility isolation, unauthenticated access, service role key absence.

npx jest --testPathPattern='tests/security/phi' --bail --ci
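
At the shell level, one simple form of leak check is a pattern scan over captured output such as API error bodies or console logs. The sketch below assumes a hypothetical helper `scan_for_phi` and two illustrative patterns (a US SSN shape and the literal `MRN` followed by digits). Real PHI detection needs patterns for the identifiers your application actually stores.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the harness): reads text on stdin and
# returns 0 if no PHI-like pattern is found, 1 if one is.
# The two patterns are illustrative assumptions, not a full PHI definition.
scan_for_phi() {
  local ssn='[0-9]{3}-[0-9]{2}-[0-9]{4}'
  local mrn='MRN[:=# ]*[0-9]{6,}'
  if grep -Eq "$ssn|$mrn"; then
    return 1
  fi
  return 0
}

# Usage: an error body that echoes an SSN-shaped value is flagged.
printf '%s\n' '{"error":"Patient 123-45-6789 not found"}' | scan_for_phi \
  || echo "possible PHI in output"
```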

3. Data Integrity (CRITICAL — 100% required)

Tests clinical data safety: locked encounters, audit trail entries, cascade delete protection, concurrent edit handling, no orphaned records.

npx jest --testPathPattern='tests/data-integrity' --bail --ci
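
An orphaned record is a child row whose parent no longer exists. A minimal sketch of that check, assuming two hypothetical plain-text exports (one parent ID per line, and `child_id parent_id` per line) and a hypothetical helper `find_orphans`. A real suite would more likely query the database directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find_orphans PARENT_IDS_FILE CHILD_ROWS_FILE
# Prints the child ID of every row whose parent ID is not in the parent file.
# Exit status is 1 if any orphan is found, 0 otherwise.
find_orphans() {
  awk '
    BEGIN { bad = 0 }
    # First file: collect parent IDs (matching on FILENAME also works
    # when the parent file is empty).
    FILENAME == ARGV[1] { parents[$1] = 1; next }
    # Second file: report rows pointing at an unknown parent.
    NF >= 2 && !($2 in parents) { print $1; bad = 1 }
    END { exit bad }
  ' "$1" "$2"
}

# Usage with illustrative data: child 11 references missing parent 3.
parents_file=$(mktemp); children_file=$(mktemp)
printf '%s\n' 1 2 > "$parents_file"
printf '%s\n' '10 1' '11 3' > "$children_file"
find_orphans "$parents_file" "$children_file" || echo "orphaned records found"
rm -f "$parents_file" "$children_file"
```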

4. Clinical Workflow (HIGH — 95%+ required)

Tests end-to-end flows: encounter lifecycle, template rendering, medication sets, drug/diagnosis search, prescription PDF, red flag alerts.

tmp_json=$(mktemp)
npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$tmp_json" || true
total=$(jq '.numTotalTests // 0' "$tmp_json")
passed=$(jq '.numPassedTests // 0' "$tmp_json")
if [ "$total" -eq 0 ]; then
  echo "No clinical tests found" >&2
  exit 1
fi
rate=$(echo "scale=2; $passed * 100 / $total" | bc)
echo "Clinical pass rate: ${rate}% ($passed/$total)"

5. Integration Compliance (HIGH — 95%+ required)

Tests external systems: HL7 message parsing (v2.x), FHIR validation, lab result mapping, malformed message handling.

tmp_json=$(mktemp)
npx jest --testPathPattern='tests/integration' --ci --json --outputFile="$tmp_json" || true
total=$(jq '.numTotalTests // 0' "$tmp_json")
passed=$(jq '.numPassedTests // 0' "$tmp_json")
if [ "$total" -eq 0 ]; then
  echo "No integration tests found" >&2
  exit 1
fi
rate=$(echo "scale=2; $passed * 100 / $total" | bc)
echo "Integration pass rate: ${rate}% ($passed/$total)"
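
For malformed-message handling, the cheapest rejection is structural: an HL7 v2.x message begins with an `MSH` segment whose first characters declare the field separator and encoding characters, conventionally `MSH|^~\&`. The sketch below uses a hypothetical helper `looks_like_hl7_v2` that checks only that header shape. It is not a parser or a validator, and the sample message fields are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the harness): returns 0 if the input on
# stdin starts with the conventional HL7 v2 header "MSH|^~\&|", 1 otherwise.
looks_like_hl7_v2() {
  local first
  # Read up to the first newline; accept input with no trailing newline.
  IFS= read -r first || [ -n "$first" ] || return 1
  case "$first" in
    'MSH|^~\&|'*) return 0 ;;
    *)            return 1 ;;
  esac
}

# Usage: segments are separated by carriage returns in HL7 v2.
printf 'MSH|^~\\&|LAB|FAC|EMR|FAC|20260327||ORU^R01|1|P|2.5\rPID|1\r' \
  | looks_like_hl7_v2 && echo "header ok"
```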

Pass/Fail Matrix

Category	Threshold	On Failure
CDSS Accuracy	100%	BLOCK deployment
PHI Exposure	100%	BLOCK deployment
Data Integrity	100%	BLOCK deployment
Clinical Workflow	95%+	WARN, allow with review
Integration	95%+	WARN, allow with review
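
The matrix above can be sketched as a single decision function. This is an illustration only: `gate_action` and the short category keys are hypothetical names, and treating an empty suite as BLOCK is an assumption that mirrors the `exit 1` in the pass-rate scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gate_action CATEGORY PASSED TOTAL
# Prints PASS, WARN, or BLOCK following the pass/fail matrix.
gate_action() {
  local category=$1 passed=$2 total=$3
  # Assumption: a suite with no tests is never a pass.
  if [ "$total" -eq 0 ]; then
    echo BLOCK
    return 0
  fi
  case "$category" in
    cdss|phi|data-integrity)
      # CRITICAL: every test must pass.
      if [ "$passed" -eq "$total" ]; then echo PASS; else echo BLOCK; fi
      ;;
    clinical|integration)
      # HIGH: 95%+ required. Cross-multiply to stay in integer arithmetic.
      if [ $(( passed * 100 )) -ge $(( 95 * total )) ]; then
        echo PASS
      else
        echo WARN
      fi
      ;;
    *)
      echo "unknown category: $category" >&2
      return 64
      ;;
  esac
}

gate_action cdss 39 39        # prints PASS
gate_action clinical 21 22    # prints PASS (2100 >= 2090)
gate_action integration 5 6   # prints WARN (500 < 570)
```

Comparing `passed * 100` against `95 * total` avoids the truncation that `bc` with `scale=2` applies to the division.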

CI/CD Integration

name: Healthcare Safety Gate
on: [push, pull_request]

jobs:
  safety-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci

      # CRITICAL gates — 100% required, bail on first failure
      - name: CDSS Accuracy
        run: npx jest --testPathPattern='tests/cdss' --bail --ci --coverage --coverageThreshold='{"global":{"branches":80,"functions":80,"lines":80}}'

      - name: PHI Exposure Check
        run: npx jest --testPathPattern='tests/security/phi' --bail --ci

      - name: Data Integrity
        run: npx jest --testPathPattern='tests/data-integrity' --bail --ci

      # HIGH gates — 95%+ required, custom threshold check
      - name: Clinical Workflows
        run: |
          TMP_JSON=$(mktemp)
          npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$TMP_JSON" || true
          TOTAL=$(jq '.numTotalTests // 0' "$TMP_JSON")
          PASSED=$(jq '.numPassedTests // 0' "$TMP_JSON")
          if [ "$TOTAL" -eq 0 ]; then
            echo "::error::No clinical tests found"; exit 1
          fi
          RATE=$(echo "scale=2; $PASSED * 100 / $TOTAL" | bc)
          echo "Pass rate: ${RATE}% ($PASSED/$TOTAL)"
          if (( $(echo "$RATE < 95" | bc -l) )); then
            echo "::warning::Clinical pass rate ${RATE}% below 95%"
          fi

      - name: Integration Compliance
        run: |
          TMP_JSON=$(mktemp)
          npx jest --testPathPattern='tests/integration' --ci --json --outputFile="$TMP_JSON" || true
          TOTAL=$(jq '.numTotalTests // 0' "$TMP_JSON")
          PASSED=$(jq '.numPassedTests // 0' "$TMP_JSON")
          if [ "$TOTAL" -eq 0 ]; then
            echo "::error::No integration tests found"; exit 1
          fi
          RATE=$(echo "scale=2; $PASSED * 100 / $TOTAL" | bc)
          echo "Pass rate: ${RATE}% ($PASSED/$TOTAL)"
          if (( $(echo "$RATE < 95" | bc -l) )); then
            echo "::warning::Integration pass rate ${RATE}% below 95%"
          fi

Anti-Patterns

Skipping CDSS tests "because they passed last time"
Setting CRITICAL thresholds below 100%
Disabling or omitting --bail on CRITICAL test suites
Mocking the CDSS engine in integration tests (must test real logic)
Allowing deployments when safety gate is red
Running tests without --coverage on CDSS suites

Examples

Example 1: Run All Critical Gates Locally

npx jest --testPathPattern='tests/cdss' --bail --ci --coverage && \
npx jest --testPathPattern='tests/security/phi' --bail --ci && \
npx jest --testPathPattern='tests/data-integrity' --bail --ci

Example 2: Check HIGH Gate Pass Rate

tmp_json=$(mktemp)
npx jest --testPathPattern='tests/clinical' --ci --json --outputFile="$tmp_json" || true
jq '{
  passed: (.numPassedTests // 0),
  total: (.numTotalTests // 0),
  rate: (if (.numTotalTests // 0) == 0 then 0 else ((.numPassedTests // 0) / (.numTotalTests // 1) * 100) end)
}' "$tmp_json"
# Example for 21 of 22 passing: "passed": 21, "total": 22, "rate": 95.4545... (jq prints the unrounded value)

Example 3: Eval Report

## Healthcare Eval: 2026-03-27 [commit abc1234]

### Patient Safety: PASS

| Category | Tests | Pass | Fail | Status |
|----------|-------|------|------|--------|
| CDSS Accuracy | 39 | 39 | 0 | PASS |
| PHI Exposure | 8 | 8 | 0 | PASS |
| Data Integrity | 12 | 12 | 0 | PASS |
| Clinical Workflow | 22 | 21 | 1 | 95.5% PASS |
| Integration | 6 | 6 | 0 | PASS |

### Coverage: 84% (target: 80%+)
### Verdict: SAFE TO DEPLOY
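
The verdict line in a report like this could be derived mechanically from the per-gate outcomes. A minimal sketch, assuming a hypothetical helper `overall_verdict` that takes one PASS, WARN, or BLOCK label per gate. `REVIEW REQUIRED` and `BLOCKED` are illustrative labels; only `SAFE TO DEPLOY` appears in the report above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overall_verdict OUTCOME...
# Each argument is one gate outcome: PASS, WARN, or BLOCK.
# Any BLOCK wins; otherwise any WARN downgrades the verdict.
overall_verdict() {
  local verdict="SAFE TO DEPLOY" outcome
  if [ "$#" -eq 0 ]; then
    echo "no gate outcomes given" >&2
    return 64
  fi
  for outcome in "$@"; do
    case "$outcome" in
      BLOCK) echo "BLOCKED"; return 1 ;;
      WARN)  verdict="REVIEW REQUIRED" ;;
      PASS)  ;;
      *)     echo "unknown outcome: $outcome" >&2; return 64 ;;
    esac
  done
  echo "$verdict"
}

overall_verdict PASS PASS PASS PASS PASS   # prints SAFE TO DEPLOY
```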

Overall Score

89/100, Grade A (Excellent)

Safety: 88
Quality: 90
Clarity: 89
Completeness: 88

Summary

A structured patient safety evaluation harness for healthcare application deployments. It orchestrates five test categories (CDSS accuracy, PHI exposure, data integrity, clinical workflow, integration compliance) with strict pass thresholds — CRITICAL gates require 100% pass rate and block deployment on failure, while HIGH gates require 95%+ and issue warnings. The skill provides bash commands, Jest configuration, GitHub Actions CI/CD templates, and a clear pass/fail matrix to ensure clinical safety before production.

Detected Capabilities

test execution via npm/jest
bash scripting and conditionals
temporary file creation
json parsing and arithmetic
ci/cd pipeline configuration (github actions)
coverage threshold enforcement
pass rate calculation from test output

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

healthcare deployment safety
emr safety gate
patient safety testing
cdss accuracy verification
phi exposure check
clinical data integrity
healthcare ci/cd pipeline
healthcare test automation

Risk Signals

INFO: Temporary file creation using mktemp for JSON output, a standard safe practice for test output capture. Location: test categories 4 and 5 (Clinical Workflow, Integration Compliance).

INFO: Bash arithmetic on test metrics, used for pass rate calculation and threshold comparison. Location: Clinical Workflow and Integration Compliance test sections.

INFO: npm/jest execution in CI/CD context with coverage thresholds, expected behavior for healthcare validation. Location: CI/CD Integration section and test categories.

Use Cases

Block unsafe EMR/EHR deployments by enforcing 100% pass rate on CDSS accuracy, PHI exposure, and data integrity tests before production release
Catch drug interaction logic errors, dose validation bugs, and scoring mismatches against clinical specifications before patient harm occurs
Prevent protected health information leaks through API responses, console output, URL parameters, and unauthenticated access during deployment validation
Validate clinical data safety by testing locked encounters, audit trail completeness, cascade delete protection, and concurrent edit handling
Enforce end-to-end clinical workflow correctness including encounter lifecycle, medication sets, prescription PDF generation, and red flag alerts
Ensure external integration compliance with HL7 v2.x message parsing, FHIR validation, and lab result mapping before connecting to external systems
Configure healthcare application CI/CD pipelines with automated safety gates that prevent deployment when critical patient safety tests fail

Quality Notes

Excellent scope clarity: skill is narrowly focused on healthcare deployment safety with no cross-domain functionality
Strong error handling: explicit checks for empty test suites (total == 0) prevent division by zero and provide clear failure messages
Well-documented pass/fail thresholds: clear distinction between CRITICAL (100%, block) and HIGH (95%+, warn) gates with explicit consequences
Comprehensive examples: three practical examples demonstrate local execution, pass rate checking, and final eval report format
Clear anti-patterns section: explicitly lists six practices to avoid, helping users understand safety boundaries
Framework-agnostic design: skill uses Jest as reference but explicitly instructs users to adapt for Vitest, pytest, PHPUnit
Real-world medical context: references concrete healthcare standards (HL7 v2.x, FHIR) and clinical concepts (drug interactions, dose validation, audit trails)
GitHub Actions template is production-ready with proper steps for setup and error annotation (::error::, ::warning::)
Pass rate calculation uses bc with scale=2, which gives two-decimal fixed-point results (truncated, not rounded)
Comprehensive test categories map to actual healthcare risk domains

Model: claude-haiku-4-5-20251001
Analyzed: Jul 14, 2026

Version History

v1.2 (latest), 2026-07-14: Content updated
v1.1, 2026-04-20: Content updated
v1.0, 2026-04-12: No changelog
