affaan-m/ai-regression-testing

Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.

v1.1 · Saved Apr 20, 2026

AI Regression Testing

Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.

When to Activate

  • AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
  • A bug was found and fixed — need to prevent re-introduction
  • Project has a sandbox/mock mode that can be leveraged for DB-free testing
  • Running /bug-check or similar review commands after code changes
  • Multiple code paths exist (sandbox vs production, feature flags, etc.)

The Core Problem

When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:

AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists

Real-world example (observed in production):

Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)

Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue

Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)

Fix 4: Test caught it instantly on first run (PASS)

The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.

Sandbox-Mode API Testing

Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
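The route handlers in the patterns below branch on an `isSandboxMode()` helper. Its exact shape varies by project; a minimal sketch, assuming it reads the same `SANDBOX_MODE` flag the test setup forces on:

```typescript
// Minimal sandbox switch (a sketch — match this to your project's own env handling).
// Route handlers branch on this to return mock data instead of querying the database.
export function isSandboxMode(): boolean {
  return process.env.SANDBOX_MODE === "true";
}
```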

Setup (Vitest + Next.js App Router)

// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});

// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";

Test Helper for Next.js API Routes

// __tests__/helpers.ts
import { NextRequest } from "next/server";

export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };

  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }

  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };

  if (body) {
    init.body = JSON.stringify(body);
    reqHeaders["content-type"] = "application/json";
  }

  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}

Writing Regression Tests

The key principle: write tests for bugs that were found, not for code that works.

// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";

// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings",  // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);

    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test — this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);

    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});

Testing Sandbox/Production Parity

The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).

// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);

    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});

Integrating Tests into Bug-Check Workflow

Custom Command Definition

<!-- .claude/commands/bug-check.md -->
# Bug Check

## Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:

    npm run test       # Vitest test suite
    npm run build      # TypeScript type check + build

- If tests fail → report as highest priority bug
- If build fails → report type errors as highest priority
- Only proceed to Step 2 if both pass

## Step 2: Code Review (AI review)

1. Sandbox / production path consistency
2. API response shape matches frontend expectations
3. SELECT clause completeness
4. Error handling with rollback
5. Optimistic update race conditions

## Step 3: For each bug fixed, propose a regression test
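For the two commands in Step 1 to exist, the project's `package.json` needs matching scripts. A typical wiring for the Vitest + Next.js setup above (script bodies are assumptions — adjust to your project):

```json
{
  "scripts": {
    "test": "vitest run",
    "build": "next build"
  }
}
```

Using `vitest run` (rather than bare `vitest`) matters here: the command must exit after one pass instead of entering watch mode, or the bug-check workflow will hang.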

The Workflow

User: "バグチェックして" ("check for bugs", or "/bug-check")
  │
  ├─ Step 1: npm run test
  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │   └─ PASS → Continue
  │
  ├─ Step 2: npm run build
  │   ├─ FAIL → Type error found mechanically
  │   └─ PASS → Continue
  │
  ├─ Step 3: AI code review (with known blind spots in mind)
  │   └─ Findings reported
  │
  └─ Step 4: For each fix, write a regression test
      └─ Next bug-check catches if fix breaks

Common AI Regression Patterns

Pattern 1: Sandbox/Production Path Mismatch

Frequency: Most common (observed in 3 out of 4 regressions)

// FAIL: AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } };  // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// PASS: Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };

Test to catch it:

it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);

  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});

Pattern 2: SELECT Clause Omission

Frequency: Common with Supabase/Prisma when adding new columns

// FAIL: New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name")  // notification_settings not here
  .single();

return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// PASS: Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
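This class of omission can also be caught mechanically, without a database, by checking the SELECT string against the field contract. A sketch (the guard is not from the source; `REQUIRED_FIELDS` refers to the contract array defined earlier):

```typescript
// Hypothetical guard: fail fast if a SELECT string omits contract columns.
// Treats "*" as covering everything; otherwise compares column-by-column.
export function assertSelectCoversContract(
  select: string,
  required: string[],
): void {
  if (select.trim() === "*") return;
  const cols = select.split(",").map((c) => c.trim());
  const missing = required.filter((f) => !cols.includes(f));
  if (missing.length > 0) {
    throw new Error(`SELECT clause missing required columns: ${missing.join(", ")}`);
  }
}
```

A regression test can then call `assertSelectCoversContract("id, email, name", REQUIRED_FIELDS)` against the exact string used in the route, so adding a contract field without widening the query fails immediately.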

Pattern 3: Error State Leakage

Frequency: Moderate — when adding error handling to existing components

// FAIL: Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// PASS: Clear related state on error
catch (err) {
  setReservations([]);  // Clear stale data
  setError("Failed to load");
}
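The cleanup rule becomes unit-testable once the load logic is extracted from the component. A sketch under that assumption (`loadReservations` and its setter signatures are hypothetical, not from the source):

```typescript
// Hypothetical extraction: the fetch-and-set logic pulled out of the component
// so the error path can be exercised without rendering anything.
type Reservation = { id: string };
type Setter<T> = (value: T) => void;

export async function loadReservations(
  fetchFn: () => Promise<Reservation[]>,
  setReservations: Setter<Reservation[]>,
  setError: Setter<string | null>,
): Promise<void> {
  try {
    setReservations(await fetchFn());
    setError(null);
  } catch {
    setReservations([]); // clear stale data before surfacing the error
    setError("Failed to load");
  }
}
```

A Vitest case can then pass a rejecting `fetchFn` and assert that the reservations setter received `[]` — no React renderer required.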

Pattern 4: Optimistic Update Without Proper Rollback

// FAIL: No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// PASS: Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems);  // Rollback
    alert("削除に失敗しました");  // "Deletion failed"
  }
};
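As with Pattern 3, the rollback becomes testable once the handler takes its dependencies as parameters. A sketch (the name `removeItem` and its signature are assumptions for illustration):

```typescript
// Hypothetical dependency-injected remove handler. Returns true on success,
// false after rolling the optimistic update back.
type Item = { id: string };

export async function removeItem(
  items: Item[],
  id: string,
  deleteFn: (id: string) => Promise<{ ok: boolean }>,
  setItems: (items: Item[]) => void,
): Promise<boolean> {
  const prev = [...items]; // capture state before the optimistic update
  setItems(items.filter((i) => i.id !== id));
  try {
    const res = await deleteFn(id);
    if (!res.ok) throw new Error("API error");
    return true;
  } catch {
    setItems(prev); // rollback: UI returns to the pre-delete state
    return false;
  }
}
```

A test can inject a `deleteFn` that returns `{ ok: false }` and assert the final `setItems` call restores the original list — exactly the regression this pattern describes.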

Strategy: Test Where Bugs Were Found

Don't aim for 100% coverage. Instead:

Bug found in /api/user/profile     → Write test for profile API
Bug found in /api/user/messages    → Write test for messages API
Bug found in /api/user/favorites   → Write test for favorites API
No bug in /api/user/notifications  → Don't write test (yet)

Why this works with AI development:

  1. AI tends to make the same category of mistake repeatedly
  2. Bugs cluster in complex areas (auth, multi-path logic, state management)
  3. Once tested, that exact regression cannot happen again
  4. Test count grows organically with bug fixes — no wasted effort

Quick Reference

| AI Regression Pattern       | Test Strategy                              | Priority |
|-----------------------------|--------------------------------------------|----------|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | High     |
| SELECT clause omission      | Assert all required fields in response     | High     |
| Error state leakage         | Assert state cleanup on error              | Medium   |
| Missing rollback            | Assert state restored on API failure       | Medium   |
| Type cast masking null      | Assert field is not undefined              | Medium   |

DO / DON'T

DO:

  • Write tests immediately after finding a bug (before fixing it if possible)
  • Test the API response shape, not the implementation
  • Run tests as the first step of every bug-check
  • Keep tests fast (< 1 second total with sandbox mode)
  • Name tests after the bug they prevent (e.g., "BUG-R1 regression")

DON'T:

  • Write tests for code that has never had a bug
  • Trust AI self-review as a substitute for automated tests
  • Skip sandbox path testing because "it's just mock data"
  • Write integration tests when unit tests suffice
  • Aim for coverage percentage — aim for regression prevention
Files

1 file · 1.0 KB


Overall Score: 87/100
Grade: A (Excellent)

  • Safety: 90
  • Quality: 88
  • Clarity: 86
  • Completeness: 82

Summary

This skill teaches regression testing strategies specifically designed for AI-assisted development, where the same model writes and reviews code—creating systematic blind spots. It provides sandbox-mode API testing patterns, automated bug-check workflows, and guidance on catching common AI regressions like sandbox/production path mismatches, SELECT clause omissions, and state management errors.

Static Analysis Findings

1 finding (patterns detected by deterministic static analysis before AI scoring)

Credential Exposure

  • SEC-020: Direct .env file access (3x in SKILL.md)

Detected Capabilities

  • Vitest configuration for sandbox-mode testing
  • Next.js API route testing helpers and patterns
  • Test design for detecting sandbox/production parity issues
  • Workflow integration for automated bug-check commands
  • Code analysis of common AI blind spots and regression patterns
  • API response shape validation and contract testing
  • State management error detection and rollback testing

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

  • ai regression testing
  • sandbox mode api testing
  • bug-check workflow
  • prevent ai blind spots
  • sandbox production parity
  • regression test suite
  • api response validation

Risk Signals

INFO

Direct .env file access (SEC-020) — references to .env and environment variable setup

SKILL.md | Multiple locations: setup.ts code examples, test configuration descriptions

Use Cases

  • Catch bugs introduced by AI code generation before production
  • Prevent recurring regressions in sandbox vs. production code paths
  • Implement automated bug-check workflows triggered after AI code changes
  • Test API contracts and response shapes without database dependencies
  • Develop regression test suites for specific bugs found in AI-generated code

Quality Notes

  • Excellent real-world grounding: includes observed production bugs with specific examples (notification_settings omission repeated 4 times)
  • Well-structured problem definition: clearly explains the AI blind spot phenomenon and why it creates predictable failure patterns
  • Comprehensive code examples: includes working TypeScript/Vitest setup, helper functions, and test patterns that an agent can directly apply
  • Clear test design philosophy: teaches 'write tests for bugs that were found, not for code that works'—practical and efficient approach
  • Strong workflow integration: shows how to bind tests to `/bug-check` command and define clear priority ordering (tests → build → code review)
  • Excellent pattern documentation: 4 common AI regression patterns with before/after code examples showing exact failures and fixes
  • Pragmatic scope: focuses on high-frequency regressions (sandbox/production mismatch, SELECT clause issues) rather than trying to cover all possible bugs
  • Good format hierarchy: uses tables, code blocks, and markdown appropriately for readability
  • Minor: Could benefit from explicit error handling guidance (what to do if Vitest setup fails, how to debug test timeouts)
Model: claude-haiku-4-5-20251001 · Analyzed: Apr 20, 2026


Version History

  • v1.1 (2026-04-20, latest): Content updated
  • v1.0 (2026-04-12): No changelog
