affaan-m/fal-ai-media

Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.

v1.1 · Saved May 11, 2026

fal.ai Media Generation

Generate images, videos, and audio using fal.ai models via MCP.

When to Activate

  • User wants to generate images from text prompts
  • Creating videos from text or images
  • Generating speech, music, or sound effects
  • Any media generation task
  • User says "generate image", "create video", "text to speech", "make a thumbnail", or similar

MCP Requirement

The fal.ai MCP server must be configured. Add this entry to the mcpServers object in ~/.claude.json:

"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}

Get an API key at fal.ai.
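
If you prefer to script the configuration step, the following is a minimal sketch. It assumes the entry belongs under an "mcpServers" key in the JSON file; the helper names add_fal_server and update_config_file are hypothetical and not part of any fal.ai tooling.

```python
import json
from pathlib import Path


def add_fal_server(config: dict, fal_key: str) -> dict:
    """Return a copy of `config` with a "fal-ai" entry under "mcpServers".

    Assumption: MCP servers are registered under the "mcpServers" key.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["fal-ai"] = {
        "command": "npx",
        "args": ["-y", "fal-ai-mcp-server"],
        "env": {"FAL_KEY": fal_key},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, fal_key: str) -> None:
    """Read the JSON config at `path` (if present), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_fal_server(config, fal_key), indent=2))
```

A call such as update_config_file(Path.home() / ".claude.json", "YOUR_FAL_KEY_HERE") would apply it; existing servers in the file are left in place.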

MCP Tools

The fal.ai MCP provides these tools:

  • search — Find available models by keyword
  • find — Get model details and parameters
  • generate — Run a model with parameters
  • result — Check async generation status
  • status — Check job status
  • cancel — Cancel a running job
  • estimate_cost — Estimate generation cost
  • models — List popular models
  • upload — Upload files for use as inputs
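
For long-running jobs, the generate, status, and result tools are typically combined into a polling loop. The sketch below shows one way to structure that loop in Python. The callables get_status and get_result stand in for whatever invokes the MCP tools in your client, and the status strings "COMPLETED" and "FAILED" are assumptions for illustration, not documented values.

```python
import time
from typing import Any, Callable


def wait_for_result(
    get_status: Callable[[], str],
    get_result: Callable[[], Any],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a job until it completes, then fetch and return its result.

    Assumption: the status call returns "COMPLETED" or "FAILED" when the
    job is done; any other value means the job is still queued or running.
    """
    for _ in range(max_polls):
        state = get_status()
        if state == "COMPLETED":
            return get_result()
        if state == "FAILED":
            raise RuntimeError("generation job failed")
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job did not finish after {max_polls} polls")
```

Passing sleep as a parameter keeps the loop easy to test without real delays.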

Image Generation

Nano Banana 2 (Fast)

Best for: quick iterations, drafts, text-to-image, image editing.

generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)

Nano Banana Pro (High Fidelity)

Best for: production images, realism, typography, detailed prompts.

generate(
  app_id: "fal-ai/nano-banana-pro",
  input_data: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
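
The "Best for" notes on the two image models suggest a draft-then-final workflow: iterate on Nano Banana 2, then rerun the chosen prompt on Nano Banana Pro. A minimal sketch of that switch follows; image_request is a hypothetical helper that only builds the arguments for generate, using the parameters shown in the examples above.

```python
DRAFT_MODEL = "fal-ai/nano-banana-2"
FINAL_MODEL = "fal-ai/nano-banana-pro"


def image_request(prompt: str, final: bool = False, seed: int = 42) -> dict:
    """Build generate() arguments for a draft or final image run.

    The same seed is reused across runs so prompt changes are easier to compare.
    """
    return {
        "app_id": FINAL_MODEL if final else DRAFT_MODEL,
        "input_data": {
            "prompt": prompt,
            "image_size": "landscape_16_9",
            "num_images": 1,
            "seed": seed,
        },
    }
```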

Common Image Parameters

Param | Type | Options | Notes
prompt | string | required | Describe what you want
image_size | string | square, portrait_4_3, landscape_16_9, portrait_16_9, landscape_4_3 | Aspect ratio
num_images | number | 1-4 | How many to generate
seed | number | any integer | Reproducibility
guidance_scale | number | 1-20 | How closely to follow the prompt (higher = more literal)
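
The ranges listed above can be checked locally before spending credits on a call. The following is a minimal sketch; validate_image_input is a hypothetical helper, and the limits are taken from the parameter list above rather than from any individual model's schema, which may differ.

```python
IMAGE_SIZES = {
    "square", "portrait_4_3", "landscape_16_9", "portrait_16_9", "landscape_4_3",
}


def validate_image_input(input_data: dict) -> list[str]:
    """Return a list of problems found in `input_data` (empty if none)."""
    problems = []
    prompt = input_data.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")
    size = input_data.get("image_size")
    if size is not None and size not in IMAGE_SIZES:
        problems.append(f"image_size must be one of {sorted(IMAGE_SIZES)}")
    num = input_data.get("num_images")
    if num is not None and not (isinstance(num, int) and 1 <= num <= 4):
        problems.append("num_images must be an integer from 1 to 4")
    seed = input_data.get("seed")
    if seed is not None and not isinstance(seed, int):
        problems.append("seed must be an integer")
    scale = input_data.get("guidance_scale")
    if scale is not None and not (isinstance(scale, (int, float)) and 1 <= scale <= 20):
        problems.append("guidance_scale must be a number from 1 to 20")
    return problems
```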

Image Editing

Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:

# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate with image input
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)

Video Generation

Seedance 1.0 Pro (ByteDance)

Best for: text-to-video, image-to-video with high motion quality.

generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)

Kling Video v3 Pro

Best for: text/image-to-video with native audio generation.

generate(
  app_id: "fal-ai/kling-video/v3/pro",
  input_data: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)

Veo 3 (Google DeepMind)

Best for: video with generated sound, high visual quality.

generate(
  app_id: "fal-ai/veo-3",
  input_data: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)

Image-to-Video

Start from an existing image:

generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)

Video Parameters

Param | Type | Options | Notes
prompt | string | required | Describe the video
duration | string | "5s", "10s" | Video length
aspect_ratio | string | "16:9", "9:16", "1:1" | Frame ratio
seed | number | any integer | Reproducibility
image_url | string | URL | Source image for image-to-video
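
As a sketch of how these parameters fit together, the helper below assembles an input_data dictionary for either text-to-video or image-to-video. build_video_input is a hypothetical name, and the allowed values are copied from the list above; individual models may accept a different set (the Veo 3 example above, for instance, passes no duration).

```python
from typing import Optional

DURATIONS = ("5s", "10s")
ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_video_input(
    prompt: str,
    duration: str = "5s",
    aspect_ratio: str = "16:9",
    seed: Optional[int] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Build input_data for a video model, rejecting values outside the listed options."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    data = {"prompt": prompt, "duration": duration, "aspect_ratio": aspect_ratio}
    if seed is not None:
        data["seed"] = seed  # fixed seed for reproducible iterations
    if image_url is not None:
        data["image_url"] = image_url  # presence switches to image-to-video
    return data
```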

Audio Generation

CSM-1B (Conversational Speech)

Text-to-speech with natural, conversational quality.

generate(
  app_id: "fal-ai/csm-1b",
  input_data: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)

ThinkSound (Video-to-Audio)

Generate matching audio from video content.

generate(
  app_id: "fal-ai/thinksound",
  input_data: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)

ElevenLabs (via API, no MCP)

For professional voice synthesis, use ElevenLabs directly:

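import os
import requests

# Replace <voice_id> with the ID of an ElevenLabs voice available to your account
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    },
    timeout=60
)
# Fail loudly on HTTP errors instead of writing an error body to the .mp3 file
resp.raise_for_status()

# The response body is the encoded audio
with open("output.mp3", "wb") as f:
    f.write(resp.content)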

VideoDB Generative Audio

If VideoDB is configured, use its generative audio:

import videodb

# Assumes the VIDEO_DB_API_KEY environment variable is set
conn = videodb.connect()
coll = conn.get_collection()

# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")

Cost Estimation

Before generating, check estimated cost:

estimate_cost(
  estimate_type: "unit_price",
  endpoints: {
    "fal-ai/nano-banana-pro": {
      "unit_quantity": 1
    }
  }
)
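
When comparing several models, it can help to build the estimate_cost arguments programmatically. The sketch below mirrors the argument shape shown above; build_estimate_request is a hypothetical helper, and whether the tool accepts several endpoints in one call is an assumption based on endpoints being a mapping.

```python
def build_estimate_request(quantities: dict[str, int]) -> dict:
    """Build estimate_cost arguments from a mapping of endpoint ID to unit count."""
    for endpoint, quantity in quantities.items():
        if quantity < 1:
            raise ValueError(f"unit_quantity for {endpoint} must be at least 1")
    return {
        "estimate_type": "unit_price",
        "endpoints": {
            endpoint: {"unit_quantity": quantity}
            for endpoint, quantity in quantities.items()
        },
    }
```

For example, build_estimate_request({"fal-ai/nano-banana-pro": 1}) reproduces the arguments of the call above.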

Model Discovery

Find models for specific tasks:

search(query: "text to video")
find(endpoint_ids: ["fal-ai/seedance-1-0-pro"])
models()

Tips

  • Use seed for reproducible results when iterating on prompts
  • Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
  • For video, keep prompts descriptive but concise — focus on motion and scene
  • Image-to-video produces more controlled results than pure text-to-video
  • Check estimate_cost before running expensive video generations

Related Skills

  • videodb — Video processing, editing, and streaming
  • video-editing — AI-powered video editing workflows
  • content-engine — Content creation for social platforms

Overall Score: 78/100 (Grade B, Good)

Safety: 75 · Quality: 82 · Clarity: 80 · Completeness: 72

Summary

This skill enables unified media generation (images, videos, and audio) using fal.ai models through an MCP server. It documents text-to-image (Nano Banana 2/Pro), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound) capabilities, plus fallback integrations with ElevenLabs and VideoDB. The skill assumes the fal.ai MCP is configured and provides detailed model parameters, cost estimation, and workflow examples.

Detected Capabilities

  • MCP tool invocation (generate, search, find, upload, estimate_cost, status, result, cancel)
  • API requests to external services (fal.ai, ElevenLabs)
  • File upload for media inputs
  • File write (saving output audio/media)
  • Environment variable access (FAL_KEY, ELEVENLABS_API_KEY)

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

  • generate image text prompt
  • create video ai
  • text to speech
  • image to video
  • media generation fal
  • video audio synthesis
  • thumbnail creation ai
  • speech synthesis

Risk Signals

  • INFO: FAL_KEY placeholder in ~/.claude.json configuration example. Location: MCP Requirement section
  • INFO: ELEVENLABS_API_KEY accessed from environment. Location: ElevenLabs audio generation section
  • INFO: File write to local paths (output.mp3). Location: ElevenLabs example code
  • INFO: Outbound requests to api.elevenlabs.io. Location: ElevenLabs integration section

Referenced Domains

External domains referenced in skill content, detected by static analysis.

  • api.elevenlabs.io
  • fal.ai

Use Cases

  • User requests AI-generated images from text descriptions
  • Creating videos from text or image inputs with multiple model options
  • Generating natural speech from text or ambient audio from video content
  • Iterating on creative media projects with cost estimation before generation
  • Switching between fast (Nano Banana 2) and high-fidelity (Pro) models based on project stage

Quality Notes

  • Clear activation conditions and use cases at the top
  • Comprehensive model reference with parameter tables (image, video, audio)
  • Well-organized by media type with separate sections
  • Practical examples provided for each generation task
  • Cost estimation workflow documented upfront
  • Model discovery tools explained
  • Related skills cross-referenced
  • Tips section with practical guidance (seed for reproducibility, cost iteration, prompt tuning)
  • ElevenLabs and VideoDB fallbacks documented for users without native fal.ai support
  • Workflow examples show both MCP tool calls and Python/direct API code, accommodating different integration styles

Model: claude-haiku-4-5-20251001 · Analyzed: May 11, 2026

Version History

  • v1.1 (2026-04-20, latest): Content updated
  • v1.0 (2026-03-16): Seeded from github.com/affaan-m/everything-claude-code
