openai/transcribe
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

global · 0 installs · 0 uses · v1.0 · Saved Apr 5, 2026

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under output/transcribe/ when working in this repo.

Decision rules

  • Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
  • If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
  • If audio is longer than ~30 seconds, keep --chunking-strategy auto.
  • Prompting is not supported for gpt-4o-transcribe-diarize.
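The decision rules above can be encoded as a small helper. The model and format names are taken verbatim from this document; the function itself is a sketch, not part of the bundled CLI.

```python
# Sketch of the model/format selection rules above; illustrative only.
def pick_settings(want_diarization: bool) -> dict:
    if want_diarization:
        # Diarization requires the diarize model and diarized_json output.
        # Note: prompting is not supported with this model.
        return {"model": "gpt-4o-transcribe-diarize",
                "response_format": "diarized_json"}
    # Default: fast plain-text transcription.
    return {"model": "gpt-4o-mini-transcribe", "response_format": "text"}
```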

Output conventions

  • Use output/transcribe/<job-id>/ for evaluation runs.
  • Use --out-dir for multiple files to avoid overwriting.
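A sketch of the output convention above: derive one path per input file under output/transcribe/<job-id>/ so batch runs never overwrite each other. This helper is illustrative; the bundled CLI's --out-dir flag already handles this.

```python
from pathlib import Path

# Illustrative: one output path per input under output/transcribe/<job-id>/.
# Not the bundled CLI's actual implementation.
def output_path(job_id: str, audio_file: str, ext: str = ".txt") -> Path:
    out_dir = Path("output/transcribe") / job_id
    return out_dir / (Path(audio_file).stem + ext)
```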

Dependencies (install if missing)

Prefer uv for dependency management.

uv pip install openai

If uv is unavailable:

python3 -m pip install openai

Environment

  • OPENAI_API_KEY must be set for live API calls.
  • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
  • Never ask the user to paste the full key in chat.
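A minimal check for key presence that never echoes the key's value, consistent with the guidance above (hypothetical helper, similar in spirit to the CLI's own validation):

```python
import os

def ensure_api_key() -> None:
    # Check presence only; never log or print the key's value.
    if not os.environ.get("OPENAI_API_KEY"):
        raise SystemExit(
            "OPENAI_API_KEY is not set. Create a key in the OpenAI platform UI "
            "and export it in your shell."
        )
```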

Skill path (set once)

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
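The same path resolution in Python, mirroring the CODEX_HOME fallback in the exports above (a sketch, for scripts that need the CLI path programmatically):

```python
import os
from pathlib import Path

# Resolve the skill's CLI path with the same CODEX_HOME fallback as the
# shell exports above (sketch only).
def transcribe_cli_path() -> Path:
    codex_home = Path(os.environ.get("CODEX_HOME", str(Path.home() / ".codex")))
    return codex_home / "skills/transcribe/scripts/transcribe_diarize.py"
```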

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
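When you request diarized_json, a post-processing step like the following can turn segments into a readable transcript. The segment fields used here (speaker, start, text) are an assumption about the output shape, not a documented schema; check references/api.md for the actual structure.

```python
# Assumed segment shape: {"speaker": ..., "start": ..., "text": ...}.
# Verify against references/api.md before relying on these field names.
def format_diarized(payload: dict) -> str:
    lines = []
    for seg in payload.get("segments", []):
        speaker = seg.get("speaker", "Unknown")
        start = seg.get("start", 0.0)
        lines.append(f"[{start:7.2f}s] {speaker}: {seg.get('text', '').strip()}")
    return "\n".join(lines)

# Example with a hand-made payload (not real API output):
sample = {"segments": [
    {"speaker": "Alice", "start": 0.0, "text": "Hello everyone."},
    {"speaker": "Bob", "start": 3.2, "text": "Hi Alice."},
]}
print(format_diarized(sample))
```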

Reference map

  • references/api.md: supported formats, limits, response formats, and known-speaker notes.
Files

6 files · 21.8 KB

Overall Score

88/100 · Grade A (Excellent)

  • Safety: 92
  • Quality: 85
  • Clarity: 88
  • Completeness: 82

Summary

This skill guides agents to transcribe audio files to text using OpenAI's audio models, with optional speaker diarization and known-speaker hints. It provides a bundled Python CLI (`transcribe_diarize.py`) that handles input validation, API calls, and output formatting, with clear decision rules for model selection and response formats.

Detected Capabilities

  • Read audio files from filesystem (mp3, mp4, m4a, wav, webm, etc.)
  • Call OpenAI audio transcription API with configurable models and response formats
  • Perform speaker diarization with optional known-speaker reference matching
  • Write transcripts to local filesystem in text or JSON formats
  • Validate file sizes and audio formats before API calls
  • Parse command-line arguments for flexible workflow control
  • Encode audio files as data URLs for known-speaker reference submission

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

  • transcribe audio
  • convert speech to text
  • speaker diarization
  • interview transcript
  • identify speakers
  • audio to text
  • meeting transcription

Risk Signals

  • INFO: OPENAI_API_KEY environment variable required for API calls (SKILL.md, Environment section; scripts/transcribe_diarize.py _ensure_api_key())
  • INFO: File write operations to local filesystem, output/transcribe/ and user-specified directories (scripts/transcribe_diarize.py _build_output_path(), main() output write)
  • INFO: API call to OpenAI with audio file content (scripts/transcribe_diarize.py _run_one())
  • INFO: Base64 encoding of audio files for data URL generation, used for known-speaker references (scripts/transcribe_diarize.py _encode_data_url())
  • WARNING: Reads audio files from arbitrary filesystem paths provided by user (scripts/transcribe_diarize.py main() audio_paths argument)

Referenced Domains

External domains referenced in skill content, detected by static analysis.

www.apache.org

Use Cases

  • Transcribe speech from recorded interviews or meetings into plain text
  • Extract and label speakers in multi-person audio recordings using diarization
  • Convert video audio tracks to searchable transcripts with speaker identification
  • Process batch audio files with consistent formatting and output structure

Quality Notes

  • ✓ Clear, well-structured workflow with 5 numbered steps guiding agent through the task
  • ✓ Explicit decision rules for model selection based on user intent (fast vs. diarized)
  • ✓ Comprehensive CLI examples covering three common scenarios (fast, diarized, explicit format)
  • ✓ Good error handling in Python CLI: validates API key, file existence, file size, speaker limits, format constraints
  • ✓ Safety guardrails: prompt explicitly unsupported for diarize model; diarized_json requires diarize model; warnings for unsupported feature combinations
  • ✓ Dependencies clearly documented with fallback installation method (uv → pip)
  • ✓ Output conventions specify directory structure to avoid overwrites
  • ✓ Reference map documents API limits and constraints
  • ✓ Python script is well-commented, uses type hints, and follows PEP 8 style
  • ✓ Defensive validation on all user inputs (response format, chunking strategy, known speakers, file paths)
  • ✓ Dry-run mode for testing without API consumption
  • ⚠ SKILL.md does not document all CLI flags (--stdout, --dry-run, --out-dir edge cases not explicitly mentioned)
  • ⚠ No guidance on what to do if transcription quality is poor or contains errors
  • ⚠ No mention of cost implications (OpenAI API billing) for long audio files or batch operations
  • ⚠ `references/api.md` is minimal; could benefit from examples of diarized_json output structure
Model: claude-haiku-4-5-20251001 · Analyzed: Apr 5, 2026

Reviews

No reviews yet.