Catalog › affaan-m/video-editing
AI-assisted video editing workflows for cutting, structuring, and augmenting real footage. Covers the full pipeline from raw capture through FFmpeg, Remotion, ElevenLabs, fal.ai, and final polish in Descript or CapCut. Use when the user wants to edit video, cut footage, create vlogs, or build video content.

global
0 installs · 0 uses · ~2.4k
v1.1 · Saved Apr 20, 2026

Video Editing

AI-assisted editing for real footage. Not generation from prompts. Editing existing video fast.

When to Activate

  • User wants to edit, cut, or structure video footage
  • Turning long recordings into short-form content
  • Building vlogs, tutorials, or demo videos from raw capture
  • Adding overlays, subtitles, music, or voiceover to existing video
  • Reframing video for different platforms (YouTube, TikTok, Instagram)
  • User says "edit video", "cut this footage", "make a vlog", or "video workflow"

Core Thesis

AI video editing is useful when you stop asking it to create the whole video and start using it to compress, structure, and augment real footage. The value is not generation. The value is compression.

The Pipeline

Screen Studio / raw footage
  → Claude / Codex
  → FFmpeg
  → Remotion
  → ElevenLabs / fal.ai
  → Descript or CapCut

Each layer has a specific job. Do not skip layers. Do not try to make one tool do everything.

Layer 1: Capture (Screen Studio / Raw Footage)

Collect the source material:

  • Screen Studio: polished screen recordings for app demos, coding sessions, browser workflows
  • Raw camera footage: vlog footage, interviews, event recordings
  • Desktop capture via VideoDB: session recording with real-time context (see videodb skill)

Output: raw files ready for organization.

Layer 2: Organization (Claude / Codex)

Use Claude Code or Codex to:

  • Transcribe and label: generate transcript, identify topics and themes
  • Plan structure: decide what stays, what gets cut, what order works
  • Identify dead sections: find pauses, tangents, repeated takes
  • Generate edit decision list: timestamps for cuts, segments to keep
  • Scaffold FFmpeg and Remotion code: generate the commands and compositions

Example prompt:

"Here's the transcript of a 4-hour recording. Identify the 8 strongest segments
for a 24-minute vlog. Give me FFmpeg cut commands for each segment."

This layer is about structure, not final creative taste.
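
The edit decision list from this layer needs to land in a format Layer 3 can consume. A minimal sketch, assuming the LLM output has been parsed into a list of dicts (the `start`/`end`/`label` schema is illustrative, matching the `cuts.txt` format used below):

```python
# Sketch: turn an LLM-produced edit decision list into the cuts.txt
# consumed by the Layer 3 batch script. The EDL shape here (dicts with
# start/end/label) is an assumption, not a fixed schema.
import csv

def write_cuts(edl, path="cuts.txt"):
    """Write start,end,label rows for the ffmpeg batch loop."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for seg in edl:
            writer.writerow([seg["start"], seg["end"], seg["label"]])

edl = [
    {"start": "00:12:30", "end": "00:15:45", "label": "segment_01"},
    {"start": "01:02:10", "end": "01:05:00", "label": "segment_02"},
]
write_cuts(edl)
```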

Layer 3: Deterministic Cuts (FFmpeg)

FFmpeg handles the boring but critical work: splitting, trimming, concatenating, and preprocessing.

Extract segment by timestamp

ffmpeg -i raw.mp4 -ss 00:12:30 -to 00:15:45 -c copy segment_01.mp4

Batch cut from edit decision list

#!/bin/bash
# cuts.txt: start,end,label
mkdir -p segments
while IFS=, read -r start end label; do
  # -nostdin stops ffmpeg from consuming the loop's stdin (cuts.txt)
  ffmpeg -nostdin -i raw.mp4 -ss "$start" -to "$end" -c copy "segments/${label}.mp4"
done < cuts.txt

Concatenate segments

# Create file list
for f in segments/*.mp4; do echo "file '$f'"; done > concat.txt
ffmpeg -f concat -safe 0 -i concat.txt -c copy assembled.mp4
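
The shell one-liner above breaks on filenames containing a single quote. A hedged sketch of writing `concat.txt` safely; the escaping rule (`'` becomes `'\''`) follows the concat demuxer's quoting, but verify against your ffmpeg version:

```python
# Sketch: write concat.txt with quoted, escaped paths for ffmpeg's
# concat demuxer, instead of interpolating raw filenames.
from pathlib import Path

def concat_entry(path):
    """Return a safely quoted `file` line for the concat demuxer."""
    escaped = str(path).replace("'", r"'\''")
    return f"file '{escaped}'"

def write_concat_list(segment_dir, out_path="concat.txt"):
    files = sorted(Path(segment_dir).glob("*.mp4"))
    Path(out_path).write_text("\n".join(concat_entry(f) for f in files) + "\n")
```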

Create proxy for faster editing

ffmpeg -i raw.mp4 -vf "scale=960:-2" -c:v libx264 -preset ultrafast -crf 28 proxy.mp4

Extract audio for transcription

# 16 kHz mono PCM, the format most speech-to-text models expect
ffmpeg -i raw.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav

Normalize audio levels

ffmpeg -i segment.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 -c:v copy normalized.mp4
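
When an agent drives these commands, it should check ffmpeg's exit status rather than assume success. A minimal sketch wrapping the cut step; the helper names and paths are illustrative:

```python
# Sketch: run a stream-copy cut and surface failures instead of
# silently producing a missing or truncated segment.
import subprocess

def build_cut_cmd(src, start, end, out):
    """Argv for a stream-copy cut; -nostdin avoids stdin issues in loops."""
    return ["ffmpeg", "-nostdin", "-y", "-i", src,
            "-ss", start, "-to", end, "-c", "copy", out]

def run_cut(src, start, end, out):
    result = subprocess.run(build_cut_cmd(src, start, end, out),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Pass stderr upward so the agent can decide: retry, or re-encode
        raise RuntimeError(f"ffmpeg failed for {out}: {result.stderr[-500:]}")
    return out
```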

Layer 4: Programmable Composition (Remotion)

Remotion turns editing problems into composable code. Use it for things that traditional editors make painful:

When to use Remotion

  • Overlays: text, images, branding, lower thirds
  • Data visualizations: charts, stats, animated numbers
  • Motion graphics: transitions, explainer animations
  • Composable scenes: reusable templates across videos
  • Product demos: annotated screenshots, UI highlights

Basic Remotion composition

import React from "react";
import { AbsoluteFill, Sequence, Video } from "remotion";

export const VlogComposition: React.FC = () => {
  return (
    <AbsoluteFill>
      {/* Main footage */}
      <Sequence from={0} durationInFrames={300}>
        <Video src="/segments/intro.mp4" />
      </Sequence>

      {/* Title overlay */}
      <Sequence from={30} durationInFrames={90}>
        <AbsoluteFill style={{
          justifyContent: "center",
          alignItems: "center",
        }}>
          <h1 style={{
            fontSize: 72,
            color: "white",
            textShadow: "2px 2px 8px rgba(0,0,0,0.8)",
          }}>
            The AI Editing Stack
          </h1>
        </AbsoluteFill>
      </Sequence>

      {/* Next segment */}
      <Sequence from={300} durationInFrames={450}>
        <Video src="/segments/demo.mp4" />
      </Sequence>
    </AbsoluteFill>
  );
};

Render output

npx remotion render src/index.ts VlogComposition output.mp4

See the Remotion docs for detailed patterns and API reference.
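
Sequence timings are in frames, so the timestamped EDL from Layer 2 has to be converted. A small sketch, assuming a 30 fps composition (adjust `fps` to whatever your composition declares):

```python
# Sketch: map a [start, end] window in seconds to the `from` and
# `durationInFrames` props used by Remotion's <Sequence>.
def to_frames(seconds, fps=30):
    return round(seconds * fps)

def sequence_props(start_s, end_s, fps=30):
    """Frame-based props for a segment defined in seconds."""
    return {"from": to_frames(start_s, fps),
            "durationInFrames": to_frames(end_s - start_s, fps)}
```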

Layer 5: Generated Assets (ElevenLabs / fal.ai)

Generate only what you need. Do not generate the whole video.

Voiceover with ElevenLabs

import os
import requests

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your narration text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    }
)
with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)
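
Text-to-speech calls can hit rate limits. A hedged sketch of a retry wrapper with exponential backoff on HTTP 429; the retry policy is an assumption, not ElevenLabs' documented behavior, and `send` is any zero-argument callable returning a response-like object (e.g. a lambda around the `requests.post` call above):

```python
# Sketch: retry a rate-limited API call with exponential backoff.
# `sleep` is injectable so the policy can be tested without waiting.
import time

def post_with_retry(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    for attempt in range(max_retries + 1):
        resp = send()
        if resp.status_code == 429 and attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()
        return resp
```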

Music and SFX with fal.ai

Use the fal-ai-media skill for:

  • Background music generation
  • Sound effects (ThinkSound model for video-to-audio)
  • Transition sounds

Generated visuals with fal.ai

Use for insert shots, thumbnails, or b-roll that doesn't exist:

generate(app_id: "fal-ai/nano-banana-pro", input_data: {
  "prompt": "professional thumbnail for tech vlog, dark background, code on screen",
  "image_size": "landscape_16_9"
})

VideoDB generative audio

If VideoDB is configured:

voiceover = coll.generate_voice(text="Narration here", voice="alloy")
music = coll.generate_music(prompt="lo-fi background for coding vlog", duration=120)
sfx = coll.generate_sound_effect(prompt="subtle whoosh transition")

Layer 6: Final Polish (Descript / CapCut)

The last layer is human. Use a traditional editor for:

  • Pacing: adjust cuts that feel too fast or slow
  • Captions: auto-generated, then manually cleaned
  • Color grading: basic correction and mood
  • Final audio mix: balance voice, music, and SFX levels
  • Export: platform-specific formats and quality settings

This is where taste lives. AI clears the repetitive work. You make the final calls.

Social Media Reframing

Different platforms need different aspect ratios:

Platform         Aspect Ratio   Resolution
YouTube          16:9           1920x1080
TikTok / Reels   9:16           1080x1920
Instagram Feed   1:1            1080x1080
X / Twitter      16:9 or 1:1    1280x720 or 720x720
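
The table above can be encoded as a small lookup so an agent picks the right filter chain per platform. A sketch mirroring the center-crop commands below (target names are illustrative):

```python
# Sketch: build the -vf filter string for a target aspect ratio,
# matching the center-crop FFmpeg commands in this section.
TARGETS = {
    "9:16": ("crop=ih*9/16:ih", "1080:1920"),
    "1:1":  ("crop=ih:ih", "1080:1080"),
}

def reframe_filter(target):
    crop, size = TARGETS[target]
    return f"{crop},scale={size}"
```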

Reframe with FFmpeg

# 16:9 to 9:16 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih,scale=1080:1920" vertical.mp4

# 16:9 to 1:1 (center crop)
ffmpeg -i input.mp4 -vf "crop=ih:ih,scale=1080:1080" square.mp4

Reframe with VideoDB

from videodb import ReframeMode

# Smart reframe (AI-guided subject tracking)
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

Scene Detection and Auto-Cut

FFmpeg scene detection

# Detect scene changes (threshold 0.3 = moderate sensitivity)
ffmpeg -i input.mp4 -vf "select='gt(scene,0.3)',showinfo" -vsync vfr -f null - 2>&1 | grep showinfo

Silence detection for auto-cut

# Find silent segments (useful for cutting dead air)
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=2 -f null - 2>&1 | grep silence
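
The silencedetect output lands in stderr as log lines; parsing them back into timestamps lets an agent feed dead air directly into the cut list. A sketch based on the filter's `silence_start:` / `silence_end:` log format:

```python
# Sketch: extract (start, end) silence windows from silencedetect's
# stderr log so they can be inverted into keep-segments.
import re

def parse_silences(log):
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", log)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", log)]
    return list(zip(starts, ends))
```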

Highlight extraction

Use Claude to analyze transcript + scene timestamps:

"Given this transcript with timestamps and these scene change points,
identify the 5 most engaging 30-second clips for social media."

What Each Tool Does Best

Tool                 Strength                                                  Weakness
Claude / Codex       Organization, planning, code generation                   Not the creative taste layer
FFmpeg               Deterministic cuts, batch processing, format conversion   No visual editing UI
Remotion             Programmable overlays, composable scenes, templates       Learning curve for non-devs
Screen Studio        Polished screen recordings immediately                    Only screen capture
ElevenLabs / fal.ai  Voice, narration, music, SFX                              Not the center of the workflow
Descript / CapCut    Final pacing, captions, polish                            Manual, not automatable

Key Principles

  1. Edit, don't generate. This workflow is for cutting real footage, not creating from prompts.
  2. Structure before style. Get the story right in Layer 2 before touching anything visual.
  3. FFmpeg is the backbone. Boring but critical. Where long footage becomes manageable.
  4. Remotion for repeatability. If you'll do it more than once, make it a Remotion component.
  5. Generate selectively. Only use AI generation for assets that don't exist, not for everything.
  6. Taste is the last layer. AI clears repetitive work. You make the final creative calls.

Related Skills

  • fal-ai-media — AI image, video, and audio generation
  • videodb — Server-side video processing, indexing, and streaming
  • content-engine — Platform-native content distribution

Overall Score

87/100

Grade

A

Excellent

Safety

88

Quality

89

Clarity

92

Completeness

78

Summary

This skill guides AI agents through a structured video editing pipeline for cutting, organizing, and augmenting real footage. It layers six specialized stages—capture, transcription/planning (Claude/Codex), FFmpeg, Remotion, ElevenLabs/fal.ai, and Descript/CapCut—to compress and structure existing video rather than generate it from scratch. The workflow emphasizes deterministic cuts and reusable templates over manual editing.

Detected Capabilities

  • FFmpeg batch cutting and segment extraction
  • Audio extraction, normalization, and silence detection
  • Remotion programmatic composition and overlay rendering
  • ElevenLabs voiceover generation
  • Scene detection and auto-cut analysis
  • Video transcription and structural planning
  • Social media aspect ratio conversion
  • VideoDB smart reframing and generative audio

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

cut video footage · create vlog · edit screen recording · extract video clips · add voiceover · social media reframe · video batch processing · transcribe and edit

Risk Signals

INFO

ElevenLabs API key accessed via environment variable (os.environ["ELEVENLABS_API_KEY"])

Layer 5: Generated Assets, Voiceover with ElevenLabs code block
INFO

Network request to api.elevenlabs.io for voiceover generation

Layer 5: Generated Assets, ElevenLabs voiceover code
INFO

Bash script with while loop reading from cuts.txt file without input validation

Layer 3: Deterministic Cuts, Batch cut section
WARNING

FFmpeg command uses user-supplied filenames without explicit escaping in concat.txt

Layer 3: Deterministic Cuts, Concatenate segments section

Referenced Domains

External domains referenced in skill content, detected by static analysis.

api.elevenlabs.io · www.remotion.dev

Use Cases

  • Cut long recordings into short-form content (vlogs, tutorials, demos)
  • Extract and organize clips from raw footage using transcripts and timestamps
  • Add overlays, subtitles, and voiceovers to existing video
  • Reframe video for different social platforms (YouTube, TikTok, Instagram)
  • Generate edit decision lists and automate batch cutting with FFmpeg
  • Build reusable video templates with Remotion for branded content

Quality Notes

  • Excellent structure: six-layer pipeline is clearly documented with specific tool responsibilities
  • Each layer has concrete code examples and bash commands that agents can execute directly
  • Principles section at the end reinforces the skill's core thesis (edit, don't generate) and prevents misuse
  • Comparison table shows strengths/weaknesses of each tool, helping agents decide when to use what
  • Well-scoped to video editing workflows—explicitly excludes video generation from prompts
  • References related skills (fal-ai-media, videodb, content-engine) for context and avoiding duplication
  • Examples include both simple operations (extract segment) and complex patterns (Remotion composition)
  • Social media reframing section addresses platform-specific requirements with concrete FFmpeg and VideoDB patterns
  • Scene detection and highlight extraction guidance bridges AI analysis (Claude) with deterministic processing (FFmpeg)
  • Could improve error handling guidance—what should agents do if FFmpeg fails or API rate limits are hit?
Model: claude-haiku-4-5-20251001 · Analyzed: Apr 20, 2026


Version History

v1.1

Content updated

2026-04-20

Latest
v1.0

Seeded from github.com/affaan-m/everything-claude-code

2026-03-16
