fal.ai Media Generation

Drift-prone skill. fal.ai model IDs, pricing, inputs, and MCP tool names change quickly. Search or fetch the current model metadata before promising a specific model, parameter, output format, or cost.

Generate images, videos, and audio using fal.ai models via MCP.

When to Activate

The user wants to generate images from text prompts
The user wants to create videos from text or images
The user wants to generate speech, music, or sound effects
The task involves any other kind of media generation
The user says "generate image", "create video", "text to speech", "make a thumbnail", or similar

MCP Requirement

The fal.ai MCP server must be configured. Add this entry under "mcpServers" in ~/.claude.json:

"fal-ai": {
  "command": "npx",
  "args": ["-y", "fal-ai-mcp-server"],
  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
}

Get an API key at fal.ai.
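
The entry above can also be merged into an existing config programmatically. The following is a minimal sketch, assuming the entry belongs under a top-level "mcpServers" key and that the key is supplied by the caller; the helper name add_fal_server is hypothetical and not part of any fal.ai tooling.

```python
import copy


def add_fal_server(config: dict, fal_key: str) -> dict:
    """Return a copy of `config` with the fal-ai MCP server entry added.

    Assumption: MCP servers live under a top-level "mcpServers" key.
    Existing server entries are preserved; the input dict is not mutated.
    """
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["fal-ai"] = {
        "command": "npx",
        "args": ["-y", "fal-ai-mcp-server"],
        "env": {"FAL_KEY": fal_key},
    }
    return updated
```

In practice the key would be read from the environment rather than hard-coded, and the result written back with json.dump.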

MCP Tools

The fal.ai MCP provides these tools:

search — Find available models by keyword
find — Get model details and parameters
generate — Run a model with parameters
result — Retrieve the output of an async generation
status — Check job status
cancel — Cancel a running job
estimate_cost — Estimate generation cost
models — List popular models
upload — Upload files for use as inputs
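
For long-running jobs, the status and result tools suggest a poll-then-fetch loop. The sketch below shows that pattern in plain Python. The get_status and get_result callables are hypothetical stand-ins for the MCP tools, and the status strings ("COMPLETED", "FAILED", "CANCELLED") are assumptions; check the current tool output before relying on them.

```python
import time
from typing import Any, Callable


def wait_for_result(
    job_id: str,
    get_status: Callable[[str], str],
    get_result: Callable[[str], Any],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
) -> Any:
    """Poll a job until it completes, then return its result.

    `get_status` and `get_result` are hypothetical wrappers around the
    MCP `status` and `result` tools; the status strings are assumed.
    """
    for _ in range(max_polls):
        state = get_status(job_id)
        if state == "COMPLETED":
            return get_result(job_id)
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"job {job_id} ended with status {state}")
        # Any other state is treated as "still running".
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps a stuck job from blocking the session; at that point the cancel tool is the natural follow-up.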

Image Generation

Nano Banana 2 (Fast)

Best for: quick iterations, drafts, text-to-image, image editing.

generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "seed": 42
  }
)

Nano Banana Pro (High Fidelity)

Best for: production images, realism, typography, detailed prompts.

generate(
  app_id: "fal-ai/nano-banana-pro",
  input_data: {
    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
    "image_size": "square",
    "num_images": 1,
    "guidance_scale": 7.5
  }
)
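
The two image models above lend themselves to a draft-then-final workflow. The following sketch builds the generate arguments for either stage. The function name image_request and the final flag are hypothetical, and whether the Pro model accepts seed should be confirmed with find before use.

```python
def image_request(prompt: str, final: bool = False, seed: int = 42) -> dict:
    """Build generate() arguments, picking the model by workflow stage.

    Drafts go to the fast model; finals go to the high-fidelity model.
    """
    app_id = "fal-ai/nano-banana-pro" if final else "fal-ai/nano-banana-2"
    return {
        "app_id": app_id,
        "input_data": {
            "prompt": prompt,
            "image_size": "landscape_16_9",
            "num_images": 1,
            # A fixed seed keeps drafts comparable while the prompt changes.
            "seed": seed,
        },
    }
```

Keeping the prompt and seed identical between the draft and final calls makes it easier to attribute differences to the model rather than to the inputs.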

Common Image Parameters

Param	Type	Options	Notes
`prompt`	string	required	Describe what you want
`image_size`	string	`square`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`, `landscape_4_3`	Aspect ratio
`num_images`	number	1-4	How many to generate
`seed`	number	any integer	Reproducibility
`guidance_scale`	number	1-20	How closely to follow the prompt (higher = more literal)
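
The table above can be turned into a quick pre-flight check. This is a minimal sketch that validates input_data against the options listed in the table only. The helper name check_image_input is hypothetical, and because this skill is drift-prone, the live model schema remains the authority.

```python
IMAGE_SIZES = {
    "square",
    "portrait_4_3",
    "landscape_16_9",
    "portrait_16_9",
    "landscape_4_3",
}


def check_image_input(input_data: dict) -> list:
    """Return a list of problems found in `input_data` (empty if none)."""
    problems = []

    prompt = input_data.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")

    size = input_data.get("image_size")
    if size is not None and size not in IMAGE_SIZES:
        problems.append(f"image_size {size!r} is not a listed option")

    num = input_data.get("num_images")
    if num is not None and not (isinstance(num, int) and 1 <= num <= 4):
        problems.append("num_images must be an integer from 1 to 4")

    seed = input_data.get("seed")
    if seed is not None and not isinstance(seed, int):
        problems.append("seed must be an integer")

    scale = input_data.get("guidance_scale")
    if scale is not None and not (
        isinstance(scale, (int, float)) and 1 <= scale <= 20
    ):
        problems.append("guidance_scale must be a number from 1 to 20")

    return problems
```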

Image Editing

Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:

# First upload the source image
upload(file_path: "/path/to/image.png")

# Then generate, passing the URL returned by upload as image_url
generate(
  app_id: "fal-ai/nano-banana-2",
  input_data: {
    "prompt": "same scene but in watercolor style",
    "image_url": "<uploaded_url>",
    "image_size": "landscape_16_9"
  }
)

Video Generation

Seedance 1.0 Pro (ByteDance)

Best for: text-to-video, image-to-video with high motion quality.

generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
    "duration": "5s",
    "aspect_ratio": "16:9",
    "seed": 42
  }
)

Kling Video v3 Pro

Best for: text/image-to-video with native audio generation.

generate(
  app_id: "fal-ai/kling-video/v3/pro",
  input_data: {
    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
    "duration": "5s",
    "aspect_ratio": "16:9"
  }
)

Veo 3 (Google DeepMind)

Best for: video with generated sound, high visual quality.

generate(
  app_id: "fal-ai/veo-3",
  input_data: {
    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
    "aspect_ratio": "16:9"
  }
)

Image-to-Video

Start from an existing image:

generate(
  app_id: "fal-ai/seedance-1-0-pro",
  input_data: {
    "prompt": "camera slowly zooms out, gentle wind moves the trees",
    "image_url": "<uploaded_image_url>",
    "duration": "5s"
  }
)

Video Parameters

Param	Type	Options	Notes
`prompt`	string	required	Describe the video
`duration`	string	`"5s"`, `"10s"`	Video length
`aspect_ratio`	string	`"16:9"`, `"9:16"`, `"1:1"`	Frame ratio
`seed`	number	any integer	Reproducibility
`image_url`	string	URL	Source image for image-to-video
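
The video parameters above can be assembled with a small builder that rejects values outside the table. This is a sketch under the assumption that the table's options apply to the chosen model; the name video_input is hypothetical. Passing duration=None omits the field, which mirrors the Veo 3 example that sets no duration.

```python
from typing import Optional

DURATIONS = ("5s", "10s")
ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def video_input(
    prompt: str,
    duration: Optional[str] = "5s",
    aspect_ratio: str = "16:9",
    seed: Optional[int] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Build an input_data dict for a video model from the table's options."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration is not None and duration not in DURATIONS:
        raise ValueError(f"duration must be one of {DURATIONS}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")

    data = {"prompt": prompt, "aspect_ratio": aspect_ratio}
    if duration is not None:
        data["duration"] = duration
    if seed is not None:
        data["seed"] = seed
    # Supplying image_url switches the request to image-to-video.
    if image_url is not None:
        data["image_url"] = image_url
    return data
```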

Audio Generation

CSM-1B (Conversational Speech)

Text-to-speech with natural, conversational quality.

generate(
  app_id: "fal-ai/csm-1b",
  input_data: {
    "text": "Hello, welcome to the demo. Let me show you how this works.",
    "speaker_id": 0
  }
)

ThinkSound (Video-to-Audio)

Generate matching audio from video content.

generate(
  app_id: "fal-ai/thinksound",
  input_data: {
    "video_url": "<video_url>",
    "prompt": "ambient forest sounds with birds chirping"
  }
)

ElevenLabs (via API, no MCP)

For professional voice synthesis, use ElevenLabs directly:

import os
import requests

# Replace <voice_id> with a voice ID from your ElevenLabs account.
resp = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json"
    },
    json={
        "text": "Your text here",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
    },
    timeout=60,
)
# Fail on HTTP errors so an error body is never saved as audio.
resp.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(resp.content)

VideoDB Generative Audio

If VideoDB is configured, use its generative audio:

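import videodb

# Connect using the VIDEO_DB_API_KEY environment variable
# and open the default collection.
conn = videodb.connect()
coll = conn.get_collection()

# Voice generation
audio = coll.generate_voice(text="Your narration here", voice="alloy")

# Music generation
music = coll.generate_music(prompt="upbeat electronic background music", duration=30)

# Sound effects
sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")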

Cost Estimation

Before generating, check estimated cost:

estimate_cost(
  estimate_type: "unit_price",
  endpoints: {
    "fal-ai/nano-banana-pro": {
      "unit_quantity": 1
    }
  }
)
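
When several generations are planned, the same request shape can be built for more than one endpoint. The sketch below only constructs the arguments shown above. The helper name estimate_request is hypothetical, and whether the tool accepts multiple endpoints in one call is an assumption suggested by the map-shaped endpoints field.

```python
def estimate_request(quantities: dict) -> dict:
    """Build estimate_cost arguments from {endpoint_id: unit_quantity}."""
    if not quantities:
        raise ValueError("at least one endpoint is required")

    endpoints = {}
    for endpoint_id, quantity in quantities.items():
        if not isinstance(quantity, int) or quantity < 1:
            raise ValueError(f"unit_quantity for {endpoint_id} must be >= 1")
        endpoints[endpoint_id] = {"unit_quantity": quantity}

    return {"estimate_type": "unit_price", "endpoints": endpoints}
```

For example, estimate_request({"fal-ai/nano-banana-pro": 4}) yields the request above with unit_quantity set to 4.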

Model Discovery

Find models for specific tasks:

search(query: "text to video")
find(endpoint_ids: ["fal-ai/seedance-1-0-pro"])
models()

Tips

Use seed for reproducible results when iterating on prompts
Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
For video, keep prompts descriptive but concise — focus on motion and scene
Image-to-video produces more controlled results than pure text-to-video
Check estimate_cost before running expensive video generations

Related Skills

videodb — Video processing, editing, and streaming
video-editing — AI-powered video editing workflows
content-engine — Content creation for social platforms

Overall Score

79/100 (Grade: B, Good)

Safety: 78
Quality: 82
Clarity: 88
Completeness: 72

Summary

This skill teaches agents how to generate images, videos, and audio using fal.ai models via an MCP server. It covers text-to-image (Nano Banana 2/Pro), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound), with supplementary guidance on ElevenLabs and VideoDB alternatives. The skill includes practical examples, parameter documentation, cost estimation patterns, and model discovery workflows.

Detected Capabilities

MCP server configuration
Model parameter documentation
API calls via MCP tools (generate, search, find, estimate_cost)
File upload for media inputs
Cost estimation
Async job status checking
Python HTTP requests to external APIs (ElevenLabs example)
File write (MP3 output in Python example)

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

generate image
create video
text to speech
video generation
media synthesis
text-to-video
image from text
ai audio generation

Risk Signals

WARNING: ElevenLabs API key access via environment variable (ELEVENLABS_API_KEY)
Location: Audio Generation / ElevenLabs section

WARNING: HTTP POST to api.elevenlabs.io with credentials in headers
Location: Audio Generation / ElevenLabs section, Python example

INFO: Skill marked as 'Drift-prone'; model IDs, pricing, and parameters change frequently
Location: Skill header note

Referenced Domains

External domains referenced in skill content, detected by static analysis.

api.elevenlabs.io
fal.ai

Use Cases

Generate images from text descriptions using Nano Banana 2 or Pro models
Create videos from text prompts or source images with Seedance, Kling, or Veo 3
Convert text to speech with conversational quality using CSM-1B
Generate matching audio or sound effects from video content with ThinkSound
Estimate generation costs before running expensive media operations
Discover and compare available fal.ai models for specific tasks
Integrate AI media generation into content creation workflows

Quality Notes

Skill clearly marks itself as drift-prone, warning users to verify model metadata before deployment — excellent transparency about maintenance burden
Comprehensive parameter tables for images and videos with clear descriptions and valid options
Well-organized by media type (image, video, audio) with subsections for each model
Practical examples use realistic prompts and show complete MCP tool invocations with parameter values
Cost estimation guidance and model discovery section help users avoid expensive mistakes
MCP configuration instructions are clear and include where to obtain API keys
Related skills section links to complementary tools (videodb, video-editing, content-engine)
Alternative approaches documented (ElevenLabs, VideoDB) provide flexibility
Tips section offers actionable guidance on reproducibility, cost optimization, and prompt engineering
ElevenLabs section includes a complete working Python example, making it easy to adapt

Model: claude-haiku-4-5-20251001
Analyzed: Jul 14, 2026

Version History

v1.2 (Latest), 2026-07-14: Content updated

v1.1