obra/finding-duplicate-functions


Use when auditing a codebase for semantic duplication - functions that do the same thing but have different names or implementations. Especially useful for LLM-generated codebases where new functions are often created rather than reusing existing ones.

v1.0 · Saved May 2, 2026

Finding Duplicate-Intent Functions

Overview

LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors such as jscpd find syntactic duplicates but miss "same intent, different implementation."

This skill uses a two-phase approach: classical extraction followed by LLM-powered intent clustering.

When to Use

  • Codebase has grown organically with multiple contributors (human or LLM)
  • You suspect utility functions have been reimplemented multiple times
  • Before major refactoring to identify consolidation opportunities
  • After jscpd has been run and syntactic duplicates are already handled

Quick Reference

Phase | Tool | Model | Output
1. Extract | scripts/extract-functions.sh | - | catalog.json
2. Categorize | scripts/categorize-prompt.md | haiku | categorized.json
3. Split | scripts/prepare-category-analysis.sh | - | categories/*.json
4. Detect | scripts/find-duplicates-prompt.md | opus | duplicates/*.json
5. Report | scripts/generate-report.sh | - | report.md

Process

digraph duplicate_detection {
  rankdir=TB;
  node [shape=box];

  extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
  categorize [label="2. Categorize by domain\n(haiku subagent)"];
  split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
  detect [label="4. Find duplicates per category\n(opus subagent per category)"];
  report [label="5. Generate report\n./scripts/generate-report.sh"];
  review [label="6. Human review & consolidate"];

  extract -> categorize -> split -> detect -> report -> review;
}

Phase 1: Extract Function Catalog

./scripts/extract-functions.sh src/ -o catalog.json

Options:

  • -o FILE: Output file (default: stdout)
  • -c N: Lines of context to capture (default: 15)
  • -t GLOB: File types (default: *.ts,*.tsx,*.js,*.jsx)
  • --include-tests: Include test files (excluded by default)

Test files (*.test.*, *.spec.*, __tests__/**) are excluded by default since test utilities are less likely to be consolidation candidates.
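The default exclusion can be sketched as a plain shell filter. This is an illustration of the glob logic, not the script's actual implementation, which may differ:

```shell
# Sketch: the default test-file exclusion expressed as shell case patterns.
# Reads paths on stdin, drops anything matching the default test globs.
filter_tests() {
  while IFS= read -r f; do
    case "$f" in
      *.test.*|*.spec.*|*__tests__/*) ;;   # excluded by default
      *) printf '%s\n' "$f" ;;
    esac
  done
}

printf '%s\n' src/util.ts src/util.test.ts src/__tests__/a.ts src/b.spec.tsx \
  | filter_tests
# Prints only src/util.ts
```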

Phase 2: Categorize by Domain

Dispatch a haiku subagent using the prompt in scripts/categorize-prompt.md.

Insert the contents of catalog.json where indicated in the prompt template. Save output as categorized.json.
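One way to splice the catalog into the template is the classic sed read-and-delete idiom. The marker name `CATALOG_JSON_HERE` below is hypothetical; check the actual template for its placeholder:

```shell
# Fixture files for illustration only.
printf '[{"name":"slugify","file":"src/str.ts"}]\n' > /tmp/catalog.json
cat > /tmp/prompt.md <<'EOF'
Categorize each function below by domain.

CATALOG_JSON_HERE
EOF

# `r` appends the catalog file after the marker line; `d` drops the marker itself.
sed -e '/CATALOG_JSON_HERE/r /tmp/catalog.json' \
    -e '/CATALOG_JSON_HERE/d' /tmp/prompt.md
```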

Phase 3: Split into Categories

./scripts/prepare-category-analysis.sh categorized.json ./categories

Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.
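The split can be approximated with jq. The `categorized.json` shape shown here (a flat array with a `category` field per function) is an assumption; the real script may use a different schema:

```shell
mkdir -p /tmp/categories
cat > /tmp/categorized.json <<'EOF'
[{"name":"slugify","category":"string-formatting"},
 {"name":"toSlug","category":"string-formatting"},
 {"name":"isEmail","category":"validation"}]
EOF

# One output file per distinct category value.
for c in $(jq -r '.[].category' /tmp/categorized.json | sort -u); do
  jq --arg c "$c" '[.[] | select(.category == $c)]' /tmp/categorized.json \
    > "/tmp/categories/$c.json"
done
ls /tmp/categories
```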

Phase 4: Find Duplicates (Per Category)

For each category file in ./categories/, dispatch an opus subagent using the prompt in scripts/find-duplicates-prompt.md.

Save each output as ./duplicates/{category}.json.
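The fan-out is a plain loop over category files. `dispatch_opus` below is a hypothetical stand-in for however you invoke an opus subagent with the prompt template; the stub here only illustrates the plumbing:

```shell
mkdir -p /tmp/cats /tmp/duplicates
printf '[]' > /tmp/cats/validation.json   # fixture category file

# Stub for illustration; a real dispatch would return the model's JSON output.
dispatch_opus() {
  printf '{"category":"%s","duplicates":[]}\n' "$(basename "$2" .json)"
}

for f in /tmp/cats/*.json; do
  dispatch_opus scripts/find-duplicates-prompt.md "$f" \
    > "/tmp/duplicates/$(basename "$f")"
done
cat /tmp/duplicates/validation.json
```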

Phase 5: Generate Report

./scripts/generate-report.sh ./duplicates ./duplicates-report.md

Produces a prioritized markdown report grouped by confidence level.

Phase 6: Human Review

Review the report. For HIGH confidence duplicates:

  1. Verify the recommended survivor has tests
  2. Update callers to use the survivor
  3. Delete the duplicates
  4. Run tests
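Steps 2-3 for a single HIGH-confidence pair can be done mechanically with grep and sed. The names here (survivor `slugify`, duplicate `toSlug`) are hypothetical, and the sketch runs on a throwaway fixture:

```shell
mkdir -p /tmp/demo/src   # throwaway fixture
printf 'import { toSlug } from "./toSlug";\nexport const s = toSlug("Hello");\n' \
  > /tmp/demo/src/page.ts

# Step 2: point every caller at the survivor (GNU sed in-place edit).
grep -rl toSlug /tmp/demo/src | xargs sed -i 's/toSlug/slugify/g'

grep slugify /tmp/demo/src/page.ts
```

After this, step 3 is deleting the duplicate's source file, and step 4 is running the test suite to confirm nothing broke.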

High-Risk Duplicate Zones

Focus extraction on these areas first - they accumulate duplicates fastest:

Zone | Common Duplicates
utils/, helpers/, lib/ | General utilities reimplemented
Validation code | Same checks written multiple ways
Error formatting | Error-to-string conversions
Path manipulation | Joining, resolving, normalizing paths
String formatting | Case conversion, truncation, escaping
Date formatting | Same formats implemented repeatedly
API response shaping | Similar transformations for different endpoints
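A quick triage pass over these zones can estimate where the pipeline will pay off. This sketch assumes TypeScript-style `export` syntax and runs against a fixture; the directory names mirror the table above:

```shell
mkdir -p /tmp/proj/src/utils   # fixture directory for illustration
printf 'export function slugify(s) {}\nexport const pad = () => {};\n' \
  > /tmp/proj/src/utils/str.ts

# Count exported symbols per high-risk directory (rough heuristic).
for d in /tmp/proj/src/utils /tmp/proj/src/helpers /tmp/proj/src/lib; do
  [ -d "$d" ] || continue
  n=$(grep -rE '^export (function|const)' "$d" | wc -l)
  echo "$d: $n exported symbols"
done
```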

Common Mistakes

Extracting too much: Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.

Skipping the categorization step: Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.

Using haiku for duplicate detection: Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use Opus for the actual duplicate analysis.

Consolidating without tests: Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.

Files

6 files · 21.1 KB


Overall Score: 86/100
Grade: A (Excellent)

Safety: 88 · Quality: 88 · Clarity: 87 · Completeness: 82

Summary

This skill guides AI agents through a systematic 5-phase process to detect semantic duplicate functions in a codebase—functions that serve the same purpose but have different names or implementations. It uses shell scripts for extraction and splitting, plus LLM-powered subagents (haiku for categorization, opus for duplicate detection) to identify consolidation opportunities, which is especially valuable in LLM-generated codebases where reimplementation is common.

Detected Capabilities

  • Shell script execution for function extraction and file splitting
  • JSON parsing and transformation using jq
  • LLM subagent dispatch and prompt templating (haiku and opus models)
  • Markdown report generation from structured duplicate analysis
  • Bash glob patterns and ripgrep-based code scanning
  • Multi-phase pipeline orchestration with intermediate state files

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

  • detect duplicate functions
  • semantic code duplication
  • consolidate utilities
  • codebase refactoring audit
  • remove reimplemented functions

Risk Signals

  • INFO: Bash script uses ripgrep for code pattern matching across project (scripts/extract-functions.sh, lines ~80-95)
  • INFO: Scripts create and read intermediate JSON files in current working directory (scripts/extract-functions.sh, scripts/prepare-category-analysis.sh, scripts/generate-report.sh)
  • INFO: generate-report.sh uses jq to process JSON and write markdown output (scripts/generate-report.sh, lines ~40-90)
  • INFO: Skill requires two external LLM model calls (haiku and opus subagents) (SKILL.md, Phase 2 and Phase 4 sections)
  • INFO: extract-functions.sh invokes ripgrep with multiple glob patterns to scan source tree (scripts/extract-functions.sh, lines ~58-67)

Use Cases

  • Audit LLM-generated codebases for semantic duplication before refactoring
  • Identify utility function consolidation opportunities across a project
  • Prepare codebase cleanup after syntactic duplicate detection has been completed
  • Reduce maintenance burden by finding and merging intent-equivalent functions
  • Validate that functions with different names don't implement the same logic

Quality Notes

  • Excellent documentation with clear phase diagram and quick reference table
  • Well-structured process broken into discrete, testable phases with specific outputs
  • Good error handling in shell scripts (set -euo pipefail, input validation, helpful error messages)
  • Clear guidance on when to use each model (haiku for categorization efficiency, opus for accuracy)
  • Comprehensive 'Common Mistakes' section helps users avoid pitfalls (e.g., skipping categorization, using wrong model)
  • High-risk zones table provides practical guidance on where duplicates accumulate
  • Prompt templates are detailed and include explicit output format specifications
  • Scripts include usage documentation and example invocations
  • Test files are correctly excluded by default from extraction phase
  • Output confidence levels (HIGH/MEDIUM/LOW) are clearly defined with examples
  • Recommendation system (CONSOLIDATE/INVESTIGATE/KEEP_SEPARATE) is well-motivated
  • Process includes human review step (Phase 6) rather than fully automated consolidation
  • Missing: explicit guidance on handling large codebases (performance implications of ripgrep)
  • Missing: error recovery if a phase fails (e.g., what to do if Opus output is malformed)
Model: claude-haiku-4-5-20251001 · Analyzed: May 2, 2026

Reviews

No reviews yet