Codebase Onboarding

Systematically analyze an unfamiliar codebase and produce a structured onboarding guide. Designed for developers joining a new project or setting up Claude Code in an existing repo for the first time.

When to Use

First time opening a project with Claude Code
Joining a new team or repository
User asks "help me understand this codebase"
User asks to generate a CLAUDE.md for a project
User says "onboard me" or "walk me through this repo"

How It Works

Phase 1: Reconnaissance

Gather raw signals about the project without reading every file. Run these checks in parallel:

1. Package manifest detection
   → package.json, go.mod, Cargo.toml, pyproject.toml, pom.xml, build.gradle,
     Gemfile, composer.json, mix.exs, pubspec.yaml

2. Framework fingerprinting
   → next.config.*, nuxt.config.*, angular.json, vite.config.*,
     django settings, flask app factory, fastapi main, rails config

3. Entry point identification
   → main.*, index.*, app.*, server.*, cmd/, src/main/

4. Directory structure snapshot
   → Top 2 levels of the directory tree, ignoring node_modules, vendor,
     .git, dist, build, __pycache__, .next

5. Config and tooling detection
   → .eslintrc*, .prettierrc*, tsconfig.json, Makefile, Dockerfile,
     docker-compose*, .github/workflows/, .env.example, CI configs

6. Test structure detection
   → tests/, test/, __tests__/, *_test.go, *.spec.ts, *.test.js,
     pytest.ini, jest.config.*, vitest.config.*
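The checks above can be sketched as a single name-based scan. This is a minimal illustration, not the skill's actual implementation: `scan_signals`, `SIGNALS`, and the `max_depth` default are hypothetical names and values, and only patterns that match on file name alone are included (directory-based signals such as `.github/workflows/` or `tests/` would need a separate check).

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Directories skipped during the scan (the ignore list from step 4).
IGNORED_DIRS = {"node_modules", "vendor", ".git", "dist", "build", "__pycache__", ".next"}

# A subset of the file-name patterns listed above, grouped by check.
SIGNALS = {
    "manifests": ["package.json", "go.mod", "Cargo.toml", "pyproject.toml", "pom.xml",
                  "build.gradle", "Gemfile", "composer.json", "mix.exs", "pubspec.yaml"],
    "frameworks": ["next.config.*", "nuxt.config.*", "angular.json", "vite.config.*"],
    "tooling": [".eslintrc*", ".prettierrc*", "tsconfig.json", "Makefile", "Dockerfile",
                "docker-compose*", ".env.example"],
    "tests": ["*_test.go", "*.spec.ts", "*.test.js", "pytest.ini", "jest.config.*",
              "vitest.config.*"],
}


def scan_signals(root, max_depth=3):
    """Return {category: sorted relative paths} for files whose names match a pattern."""
    root = Path(root)
    found = {category: [] for category in SIGNALS}

    def walk(directory, depth):
        for entry in directory.iterdir():
            if entry.is_dir():
                # Descend only into non-ignored directories, up to max_depth levels.
                if entry.name not in IGNORED_DIRS and depth < max_depth:
                    walk(entry, depth + 1)
                continue
            for category, patterns in SIGNALS.items():
                if any(fnmatchcase(entry.name, pattern) for pattern in patterns):
                    found[category].append(entry.relative_to(root).as_posix())

    walk(root, 0)
    return {category: sorted(paths) for category, paths in found.items()}
```

Only file names are inspected, never file contents, which keeps the scan cheap and means a file such as `.env.example` is noted as present without being opened.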

Phase 2: Architecture Mapping

From the reconnaissance data, identify:

Tech Stack

Language(s) and version constraints
Framework(s) and major libraries
Database(s) and ORMs
Build tools and bundlers
CI/CD platform

Architecture Pattern

Monolith, monorepo, microservices, or serverless
Frontend/backend split or full-stack
API style: REST, GraphQL, gRPC, tRPC
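One way to gather evidence for the monolith-versus-monorepo question is to look for workspace markers. The sketch below is an assumption-laden heuristic rather than part of the skill: the function name `monorepo_evidence`, the marker list, and the "two or more nested manifests" rule are all illustrative choices, and an empty result does not prove the repo is a monolith.

```python
import json
from pathlib import Path

# Workspace files used by common monorepo tools (illustrative, not exhaustive).
WORKSPACE_MARKERS = ("pnpm-workspace.yaml", "lerna.json", "nx.json", "turbo.json", "go.work")
MANIFESTS = ("package.json", "go.mod", "Cargo.toml", "pyproject.toml")
SKIPPED = {"node_modules", "vendor"}


def monorepo_evidence(root):
    """Return a list of human-readable hints that the repo hosts several packages."""
    root = Path(root)
    evidence = [name for name in WORKSPACE_MARKERS if (root / name).is_file()]

    # npm/yarn workspaces are declared in the root package.json.
    package_json = root / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            data = {}
        if isinstance(data, dict) and data.get("workspaces"):
            evidence.append("package.json workspaces")

    # Several direct subdirectories that each carry their own manifest.
    nested = sorted(
        child.name for child in root.iterdir()
        if child.is_dir() and child.name not in SKIPPED
        and any((child / manifest).is_file() for manifest in MANIFESTS)
    )
    if len(nested) >= 2:
        evidence.append("nested manifests: " + ", ".join(nested))
    return evidence
```

The function returns evidence instead of a verdict so that the final classification can still be checked against the code, in line with the "verify, don't guess" guidance.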

Key Directories

Map the top-level directories to their purpose:

src/components/  → React UI components
src/api/         → API route handlers
src/lib/         → Shared utilities
src/db/          → Database models and migrations
tests/           → Test suites
scripts/         → Build and deployment scripts
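The raw material for a map like this is the two-level directory snapshot from Phase 1. A minimal sketch follows, assuming a hypothetical helper named `snapshot`; assigning a purpose to each directory still requires judgment and is not automated here.

```python
from pathlib import Path

# Same ignore list as the Phase 1 directory snapshot.
IGNORED_DIRS = {"node_modules", "vendor", ".git", "dist", "build", "__pycache__", ".next"}


def snapshot(root, max_depth=2):
    """Return indented directory names for the top max_depth levels below root."""
    lines = []

    def walk(directory, depth):
        if depth > max_depth:
            return
        children = sorted(
            path for path in directory.iterdir()
            if path.is_dir() and path.name not in IGNORED_DIRS
        )
        for child in children:
            # Two spaces of indentation per level below the top.
            lines.append("  " * (depth - 1) + child.name + "/")
            walk(child, depth + 1)

    walk(Path(root), 1)
    return lines
```

Calling `print("\n".join(snapshot(".")))` from the project root prints the tree that the directory-to-purpose mapping is then written against.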

Data Flow

Trace one request from entry to response:

Where does a request enter? (router, handler, controller)
How is it validated? (middleware, schemas, guards)
Where is business logic? (services, models, use cases)
How does it reach the database? (ORM, raw queries, repositories)

Phase 3: Convention Detection

Identify patterns the codebase already follows:

Naming Conventions

File naming: kebab-case, camelCase, PascalCase, snake_case
Component/class naming patterns
Test file naming: *.test.ts, *.spec.ts, *_test.go
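File naming can be estimated by classifying each file's stem and taking the majority. The sketch below is one possible approach under stated assumptions: `name_style` and `dominant_style` are hypothetical helpers, the regular expressions are deliberately strict, and single lowercase words such as `index` are treated as ambiguous and ignored.

```python
import re
from collections import Counter
from pathlib import PurePath

STYLES = {
    "kebab-case": re.compile(r"^[a-z0-9]+(-[a-z0-9]+)+$"),
    "snake_case": re.compile(r"^[a-z0-9]+(_[a-z0-9]+)+$"),
    "camelCase": re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]+)+$"),
}


def name_style(path):
    """Classify the part of the file name before the first dot, or return None."""
    stem = PurePath(path).name.split(".", 1)[0]
    for style, pattern in STYLES.items():
        if pattern.match(stem):
            return style
    return None  # ambiguous, e.g. a single lowercase word


def dominant_style(paths):
    """Return (style, share of classified files), or None when nothing is classifiable."""
    votes = Counter(style for style in map(name_style, paths) if style)
    if not votes:
        return None
    style, count = votes.most_common(1)[0]
    return style, count / sum(votes.values())
```

A low share is a signal to report the convention as mixed or undetermined instead of naming a winner.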

Code Patterns

Error handling style: try/catch, Result types, error codes
Dependency injection or direct imports
State management approach
Async patterns: callbacks, promises, async/await, channels

Git Conventions

Branch naming from recent branches
Commit message style from recent commits
PR workflow (squash, merge, rebase)
If the repo has no commits yet or only a shallow history (e.g. git clone --depth 1), skip this section and note "Git history unavailable or too shallow to detect conventions"
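The shallow-history guard and a commit-style check can be sketched as follows. The helper names, the 50-commit sample, the 0.6 threshold, and the choice to test only for Conventional Commits are assumptions made for illustration; `git rev-parse --is-shallow-repository` is available in recent Git versions, and any failure is treated here as "history unavailable".

```python
import re
import subprocess

# Subject-line shape used by Conventional Commits, e.g. "feat(api): add login".
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: .+"
)
UNAVAILABLE = "Git history unavailable or too shallow to detect conventions"


def git(args, cwd="."):
    """Run a git command and return stripped stdout, or None on any failure."""
    try:
        result = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:  # git is not installed
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def recent_subjects(cwd=".", limit=50):
    """Return recent commit subjects, or None for shallow, empty, or non-git directories."""
    if git(["rev-parse", "--is-shallow-repository"], cwd) != "false":
        return None
    log = git(["log", f"--max-count={limit}", "--format=%s"], cwd)
    return log.splitlines() if log else None


def commit_style(subjects, threshold=0.6):
    """Summarize the commit message style from a list of subject lines."""
    if not subjects:
        return UNAVAILABLE
    share = sum(bool(CONVENTIONAL.match(subject)) for subject in subjects) / len(subjects)
    return "Conventional Commits" if share >= threshold else "no dominant commit style detected"
```

Usage is `commit_style(recent_subjects())`; keeping the classifier separate from the git calls makes it easy to test on plain lists of subjects.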

Phase 4: Generate Onboarding Artifacts

Produce two outputs:

Output 1: Onboarding Guide

# Onboarding Guide: [Project Name]

## Overview
[2-3 sentences: what this project does and who it serves]

## Tech Stack
<!-- Example for a Next.js project — replace with detected stack -->
| Layer | Technology | Version |
|-------|-----------|---------|
| Language | TypeScript | 5.x |
| Framework | Next.js | 14.x |
| Database | PostgreSQL | 16 |
| ORM | Prisma | 5.x |
| Testing | Jest + Playwright | - |

## Architecture
[Diagram or description of how components connect]

## Key Entry Points
<!-- Example for a Next.js project — replace with detected paths -->
- **API routes**: `src/app/api/` — Next.js route handlers
- **UI pages**: `src/app/(dashboard)/` — authenticated pages
- **Database**: `prisma/schema.prisma` — data model source of truth
- **Config**: `next.config.ts` — build and runtime config

## Directory Map
[Top-level directory → purpose mapping]

## Request Lifecycle
[Trace one API request from entry to response]

## Conventions
- [File naming pattern]
- [Error handling approach]
- [Testing patterns]
- [Git workflow]

## Common Tasks
<!-- Example for a Node.js project — replace with detected commands -->
- **Run dev server**: `npm run dev`
- **Run tests**: `npm test`
- **Run linter**: `npm run lint`
- **Database migrations**: `npx prisma migrate dev`
- **Build for production**: `npm run build`

## Where to Look
<!-- Example for a Next.js project — replace with detected paths -->
| I want to... | Look at... |
|--------------|-----------|
| Add an API endpoint | `src/app/api/` |
| Add a UI page | `src/app/(dashboard)/` |
| Add a database table | `prisma/schema.prisma` |
| Add a test | `tests/` matching the source path |
| Change build config | `next.config.ts` |

Output 2: Starter CLAUDE.md

Generate or update a project-specific CLAUDE.md based on detected conventions. If CLAUDE.md already exists, read it first and enhance it — preserve existing project-specific instructions and clearly call out what was added or changed.

# Project Instructions

## Tech Stack
[Detected stack summary]

## Code Style
- [Detected naming conventions]
- [Detected patterns to follow]

## Testing
- Run tests: `[detected test command]`
- Test pattern: [detected test file convention]
- Coverage: [if configured, the coverage command]

## Build & Run
- Dev: `[detected dev command]`
- Build: `[detected build command]`
- Lint: `[detected lint command]`

## Project Structure
[Key directory → purpose map]

## Conventions
- [Commit style if detectable]
- [PR workflow if detectable]
- [Error handling patterns]
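The "enhance, don't replace" rule for an existing CLAUDE.md can be sketched as an append-only merge keyed on `##` headings. This is a simplified illustration under stated assumptions: `merge_claude_md` is a hypothetical helper, sections are matched by exact heading text, and headings that appear inside fenced code blocks are not handled.

```python
import re


def merge_claude_md(existing, detected):
    """Append detected sections whose headings are missing; never touch existing text.

    existing: current CLAUDE.md contents ("" if the file does not exist)
    detected: {heading: body} built from Phases 1-3
    Returns (merged_text, list_of_added_headings).
    """
    present = set(re.findall(r"^## +(.+?)[ \t]*$", existing, flags=re.MULTILINE))
    added = [heading for heading in detected if heading not in present]

    parts = [existing.rstrip("\n")] if existing.strip() else []
    parts += [f"## {heading}\n{detected[heading].strip()}" for heading in added]
    return "\n\n".join(parts) + "\n", added
```

The returned list of added headings is what gets reported back to the user as "new", while sections that already exist are left exactly as written, even when the detected value differs.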

Best Practices

Don't read everything — reconnaissance should use Glob and Grep, not Read on every file. Read selectively only for ambiguous signals.
Verify, don't guess — if a framework is detected from config but the actual code uses something different, trust the code.
Respect existing CLAUDE.md — if one already exists, enhance it rather than replacing it. Call out what's new vs existing.
Stay concise — the onboarding guide should be scannable in 2 minutes. Details belong in the code, not the guide.
Flag unknowns — if a convention can't be confidently detected, say so rather than guessing. "Could not determine test runner" is better than a wrong answer.

Anti-Patterns to Avoid

Generating a CLAUDE.md that's longer than 100 lines — keep it focused
Listing every dependency — highlight only the ones that shape how you write code
Describing obvious directory names — src/ doesn't need an explanation
Copying the README — the onboarding guide adds structural insight the README lacks

Examples

Example 1: First time in a new repo

User: "Onboard me to this codebase"
Action: Run full 4-phase workflow → produce Onboarding Guide + Starter CLAUDE.md
Output: Onboarding Guide printed directly to the conversation, plus a CLAUDE.md written to the project root

Example 2: Generate CLAUDE.md for existing project

User: "Generate a CLAUDE.md for this project"
Action: Run Phases 1-3, skip Onboarding Guide, produce only CLAUDE.md
Output: Project-specific CLAUDE.md with detected conventions

Example 3: Enhance existing CLAUDE.md

User: "Update the CLAUDE.md with current project conventions"
Action: Read existing CLAUDE.md, run Phases 1-3, merge new findings
Output: Updated CLAUDE.md with additions clearly marked

Overall Score

88/100 (Grade A, Excellent)

Safety: 92
Quality: 87
Clarity: 88
Completeness: 82

Summary

Analyze an unfamiliar codebase and generate structured onboarding artifacts including architecture maps, entry points, conventions, and a starter CLAUDE.md file. The skill performs read-only reconnaissance across package manifests, framework configs, directory structures, and git history to understand a project's tech stack and patterns, then synthesizes findings into a scannable onboarding guide and project instructions.

Static Analysis Findings

1 finding

Patterns detected by deterministic static analysis before AI scoring.

Credential Exposure

SEC-020: Direct .env File Access

SKILL.md.env

85% confidence
CWE-538: Sensitive Info in Externally-Accessible File

Detected Capabilities

Package manifest detection (package.json, go.mod, Cargo.toml, etc.)
Framework fingerprinting (Next.js, Django, FastAPI, Rails, etc.)
Directory tree mapping and structure analysis
Config and tooling detection (ESLint, TypeScript, Docker, CI/CD)
Test structure identification
Git convention analysis (branch naming, commit style, PR workflow)
Architecture pattern classification (monolith, monorepo, microservices)
Data flow tracing (request lifecycle)
File generation (Onboarding Guide markdown, CLAUDE.md)
Existing file preservation and enhancement

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

onboard to codebase
understand project structure
generate CLAUDE.md
first time in repo
map architecture
detect conventions
project setup

Risk Signals

INFO

SEC-020: Direct .env file access mentioned in reconnaissance phase (config detection step lists '.env.example')

SKILL.md | Phase 1, Config and tooling detection section

Use Cases

First-time setup of Claude Code in an existing repository
Onboarding a new developer to a project
Understanding an unfamiliar codebase structure and conventions
Generating or updating project-specific CLAUDE.md instructions
Documenting tech stack, architecture patterns, and key entry points

Quality Notes

Excellent structure with clear four-phase workflow (Reconnaissance, Architecture Mapping, Convention Detection, Artifact Generation) that is easy to follow and parallelize.
Strong emphasis on avoiding anti-patterns: explicitly discourages over-reading files, guessing at conventions, and generating bloated documentation.
Well-documented best practices section provides actionable guidance on verification, respecting existing files, and staying concise.
Three concrete examples (new repo, generate CLAUDE.md, enhance existing CLAUDE.md) cover common user intents clearly.
Excellent use of templated outputs — provides markdown structure with field placeholders, enabling agents to populate detected values systematically.
Phase 1 reconnaissance is well-designed to use efficient glob/grep patterns rather than exhaustive file reading, reducing scope and computational cost.
Good defensive notes on shallow git history and missing test structures — skill acknowledges limits rather than guessing.
Output format includes clear tables (tech stack, directory map, common tasks, where to look) that make onboarding guides scannable and actionable.
Skill correctly handles the case of existing CLAUDE.md — preserves existing content and calls out additions, avoiding destructive overwrites.
Clear boundaries on output scope — specifies max 100 lines for generated CLAUDE.md, enforcing focus.
Minor: Convention detection relies on reading git history (recent commits, branches), which could be unavailable in shallow clones — skill handles this gracefully.

Model: claude-haiku-4-5-20251001
Analyzed: Apr 20, 2026

Version History

v1.1 (2026-04-20, latest): Content updated
v1.0 (2026-04-12): No changelog

codebase-onboarding