
Affordable SD 2 API — ByteDance Audio-Video Generation with Director Control

Live · 37 variants

SD 2 (ByteDance Seedance 2) is a unified audio-video generation model that produces cinematic clips from text or images with native audio, phoneme-accurate lip-sync in 8+ languages, and up to 1080p resolution. Choose from the 37 variants listed below, spanning the VIP, Global, and Chinese tiers plus Face Training. Per-second pricing runs from $0.09/sec for draft-quality 480p to $0.4725/sec (Fast) or $0.675/sec (standard) for full-HD 1080p.

⚡ VIP · Fast queue + low censorship · 17 models
T2V · VIP
⚡ VIP

SD 2 VIP Text to Video

VIP variant — fast queue and low censorship. Priority compute for text-to-video generation.

720p
$0.30/sec
T2V · VIP
⚡ VIP

SD 2 VIP Text to Video Fast

VIP fast variant — quickest T2V with low censorship and priority queue.

720p
Fast
$0.21/sec
I2V · VIP
⚡ VIP

SD 2 VIP Image to Video

VIP variant — fast queue and low censorship. Priority image-to-video for high-volume workflows.

720p
$0.30/sec
I2V · VIP
⚡ VIP

SD 2 VIP Image to Video Fast

VIP fast variant — fastest image-to-video available with low censorship.

720p
Fast
$0.21/sec
FLF · VIP
⚡ VIP

SD 2 VIP First & Last Frame

VIP variant — fast queue and low censorship. Priority first/last-frame interpolation.

720p
$0.30/sec
FLF · VIP
⚡ VIP

SD 2 VIP First & Last Frame Fast

VIP fast variant — fastest first/last-frame interpolation with low censorship.

720p
Fast
$0.21/sec
Reference · VIP
⚡ VIP

SD 2 VIP Omni Reference

VIP variant — fast queue and low censorship. Priority reference-guided video generation.

720p
$0.30/sec
Reference · VIP
⚡ VIP

SD 2 VIP Omni Reference Fast

VIP fast variant — quickest reference-guided option with low censorship.

720p
Fast
$0.21/sec
T2V · VIP
⚡ VIP

SD 2 VIP Text to Video 1080p

VIP 1080p variant — full HD text-to-video with priority queue and low censorship.

1080p
$0.675/sec
T2V · VIP
⚡ VIP

SD 2 VIP Text to Video 1080p Fast

VIP 1080p fast variant — high-speed full HD text-to-video with priority queue and low censorship.

1080p
Fast
$0.4725/sec
I2V · VIP
⚡ VIP

SD 2 VIP Image to Video 1080p

VIP 1080p variant — animate images to full HD video with priority queue and low censorship.

1080p
$0.675/sec
I2V · VIP
⚡ VIP

SD 2 VIP Image to Video 1080p Fast

VIP 1080p fast variant — fastest full HD image animation with priority queue and low censorship.

1080p
Fast
$0.4725/sec
Reference · VIP
⚡ VIP

SD 2 VIP Omni Reference 1080p

VIP 1080p variant — full HD reference-guided generation with up to 9 images, 3 videos, and 3 audio clips. Priority queue and low censorship.

1080p
$0.675/sec
Reference · VIP
⚡ VIP

SD 2 VIP Omni Reference 1080p Fast

VIP 1080p fast variant — fastest full HD reference-guided generation with priority queue and low censorship.

1080p
Fast
$0.4725/sec
FLF · VIP
⚡ VIP

SD 2 VIP First & Last Frame 1080p

VIP 1080p variant — full HD first/last-frame interpolation with priority queue and low censorship.

1080p
$0.675/sec
Edit · VIP
⚡ VIP

SD 2 VIP Extend Video

VIP extend variant — fast queue and low censorship. Continues an existing SD 2.0 video at 720p while preserving visual style, motion, characters, and audio consistency.

720p
$0.21–$0.30/sec
Edit · VIP
⚡ VIP

SD 2 VIP Extend Video 1080p

VIP 1080p extend variant — full HD continuation of an existing SD 2.0 video with priority queue and low censorship. Preserves style, motion, and audio consistency.

1080p
$0.4725–$0.675/sec
🌐 Global · Slightly higher censorship · 10 models
T2V · New
🌐 Global

SD 2 Text to Video

Global variant. Generate cinematic videos from text prompts with precise motion and photorealistic quality.

720p
$0.25/sec
T2V · Fast
🌐 Global

SD 2 Text to Video Fast

Global fast variant. Reduced-latency text-to-video, ideal for rapid iteration.

720p
Fast
$0.15/sec
I2V · New
🌐 Global

SD 2 Image to Video

Global variant. Animate any image into a smooth, realistic video clip with full motion control.

720p
$0.25/sec
I2V · Fast
🌐 Global

SD 2 Image to Video Fast

Global fast variant. High-throughput image animation without sacrificing visual fidelity.

720p
Fast
$0.15/sec
FLF · New
🌐 Global

SD 2 First & Last Frame

Global variant. Provide a start and end frame, and the model interpolates a fluid video between them.

720p
$0.25/sec
FLF · Fast
🌐 Global

SD 2 First & Last Frame Fast

Global fast variant. Same smooth first-to-last transitions at significantly lower latency.

720p
Fast
$0.15/sec
Reference · New
🌐 Global

SD 2 Omni Reference

Global variant. Use a reference image to guide character appearance across the generated video. No source video required.

720p
$0.30/sec
Reference · Fast
🌐 Global

SD 2 Omni Reference Fast

Global fast variant. Maintain consistent character identity across scenes at high speed.

720p
Fast
$0.21/sec
Edit · Utility
🌐 Global

SD 2.0 Watermark Remover

Remove SD 2.0 watermarks via AI inpainting. Flat $0.025 for clips up to 5s, then $0.005/sec beyond that.

Native
Fast
$0.025 / $0.005/sec
Edit · New
🌐 Global

SD 2 Watermark Remover Pro

Pro watermark removal for SD 2.0 videos. Flat $0.065 for clips up to 5s, then $0.013/sec beyond that.

Native
Fast
$0.065 / $0.013/sec
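The pricing of the two watermark removers above can be worked through as a short calculation. This is a minimal sketch under one reading of the rule (flat fee covers the first 5 seconds, each further second is billed at the per-second rate). The function name, the tier labels, and the proportional handling of fractional seconds are assumptions for illustration, not documented billing behaviour.

```python
from decimal import Decimal

# Prices quoted on this page (USD): (flat fee for the first 5 s, rate per extra second).
# The tier labels "standard" and "pro" are hypothetical keys for this sketch.
REMOVERS = {
    "standard": (Decimal("0.025"), Decimal("0.005")),
    "pro": (Decimal("0.065"), Decimal("0.013")),
}
FLAT_SECONDS = 5


def remover_cost(seconds, tier="standard"):
    """Estimate the charge for one clip: flat fee plus per-second overage."""
    flat, per_sec = REMOVERS[tier]
    # Only the seconds beyond the flat window are billed per second.
    extra = max(Decimal(0), Decimal(str(seconds)) - FLAT_SECONDS)
    return flat + extra * per_sec


print(remover_cost(3))          # clip inside the flat window
print(remover_cost(12))         # 7 seconds of overage
print(remover_cost(12, "pro"))
```

Under this reading the flat fee equals five seconds at the per-second rate (5 × $0.005 = $0.025 and 5 × $0.013 = $0.065), so any clip longer than 5 seconds costs its duration multiplied by the per-second rate.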
🇨🇳 Chinese · Low censorship · 8 models
T2V · New
🇨🇳 Chinese

SD 2.0 Text to Video 480p

Chinese SD 2 text-to-video at 480p resolution. Lower cost with low censorship — ideal for draft generation and rapid iteration.

480p
$0.09–$0.15/sec
I2V · New
🇨🇳 Chinese

SD 2.0 Image to Video 480p

Chinese SD 2 image-to-video at 480p resolution. Animate images at lower cost with low censorship.

480p
$0.09–$0.15/sec
Reference · New
🇨🇳 Chinese

SD 2.0 Omni Reference 480p

Chinese SD 2 omni reference at 480p. Reference-guided generation at lower cost with low censorship.

480p
$0.18/sec
T2V · New
🇨🇳 Chinese

SD v2.0 Text to Video

Chinese SD 2 text-to-video with low censorship. Generate cinematic videos from text prompts.

720p
$0.15–$0.25/sec
I2V · New
🇨🇳 Chinese

SD v2.0 Image to Video

Chinese SD 2 image-to-video with low censorship. Animate any image into a smooth, realistic video clip.

720p
$0.15–$0.25/sec
Reference · New
🇨🇳 Chinese

SD v2.0 Omni Reference

Chinese variant with low censorship. Reference-guided generation using a source video + image for strong character consistency.

720p
$0.30/sec
Edit · New
🇨🇳 Chinese

SD v2.0 Extend Video

Seamlessly continue an existing SD 2.0 video. Preserves visual style, motion, characters, and audio consistency across the new segment.

720p
$0.15–$0.25/sec
Edit · New
🇨🇳 Chinese

SD v2.0 Video Edit

Edit existing videos using text prompts and optional reference images. Billing based on input video duration (max 15s).

720p
$0.25–$0.375/sec
🎭 Face Training · Character & reference training · 2 models
Face Training · New
🎭 Face Training

SD 2 Character

Generate a reusable character sheet from reference images. Use the returned character ID in any SD 2 Omni Reference prompt.

720p
$0.18 flat
Face Training · New
🎭 Face Training

SD 2 Omni Reference Train

Train a custom omni reference model on your subject. Returns a character ID for consistent identity across any scene or prompt.

720p
$0.50 flat

What is ByteDance SD 2?

ByteDance SD 2 — officially called Seedance 2 — is a next-generation unified audio-video generation model released by ByteDance. Unlike earlier video models that generate silent clips, SD 2 natively synthesises audio alongside every video: ambient soundscapes, foley, and phoneme-accurate lip-sync in over eight languages are baked into the generation pipeline. The model supports text-to-video, image-to-video, first/last-frame interpolation, omni reference (multi-image character consistency), and multi-shot director scripting — all from a single model family spanning 26+ API variants.

Muapi exposes the full SD 2 variant catalogue through a single REST API. You get one unified endpoint pattern, one API key, and pay-as-you-go pricing starting from $0.09/sec — no separate accounts for Chinese vs. Global vs. VIP tiers, no infrastructure to manage. Submit a request, poll for the result, and download your video.
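The submit, poll, and download flow described above might look roughly like the following. This is a sketch only: the base URL, paths, header name, and JSON field names (`request_id`, `status`, `video_url`) are placeholders invented for illustration and are not taken from the Muapi documentation, which should be consulted for the real values. The HTTP transport is passed in as a parameter so the polling logic can be exercised without network access.

```python
import json
import time
import urllib.request

# ASSUMPTION: placeholder base URL; replace with the documented endpoint.
BASE_URL = "https://api.example.invalid/v1"


def http_json(method, url, api_key, payload=None):
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    req.add_header("x-api-key", api_key)  # hypothetical header name
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate_video(variant, prompt, api_key, send=http_json,
                   poll_interval=5.0, max_polls=120, sleep=time.sleep):
    """Submit a job, poll until it finishes, and return the video URL."""
    # Step 1: submit the request (path and body shape are assumptions).
    job = send("POST", f"{BASE_URL}/{variant}", api_key, {"prompt": prompt})
    job_id = job["request_id"]  # hypothetical field name

    # Step 2: poll for the result, giving up after max_polls attempts.
    for _ in range(max_polls):
        status = send("GET", f"{BASE_URL}/results/{job_id}", api_key)
        if status["status"] == "completed":
            return status["video_url"]  # hypothetical field name
        if status["status"] == "failed":
            raise RuntimeError(f"generation failed: {status.get('error')}")
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Step 3, the download, can then be done with `urllib.request.urlretrieve(video_url, "clip.mp4")`. Capping the number of polls keeps a stuck job from blocking the caller indefinitely.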

Key Capabilities

Multi-Modal Reference

Up to 9 reference images, 3 reference videos, and 3 audio clips for Omni Reference generation — maintain character identity and scene style across every shot.

Native Audio Generation

Phoneme-accurate lip-sync in 8+ languages and ambient audio generation built in — no separate TTS or audio-sync step required.

First & Last Frame Control

Define your start and end frames; SD 2 interpolates a seamless, motion-consistent video between them for precise narrative control.

Director-Mode Scripting

Multi-shot storyboarding with scene transitions and camera directions — compose complex sequences from a single structured prompt.

1080p VIP Resolution

Full HD output with priority queue and low censorship on VIP variants, suited to production-ready content, from $0.4725/sec (Fast) to $0.675/sec (standard).

7 Aspect Ratios

Portrait, landscape, and square outputs for any platform or screen — vertical for Reels/Shorts, widescreen for YouTube, square for social feeds.

Which SD 2 Tier Is Right for You?

| Tier | Censorship | Resolution | Speed | Starting Price | Best For |
| --- | --- | --- | --- | --- | --- |
| 🇨🇳 Chinese | Low | 480p – 720p | Standard | $0.09/sec | Cost-efficient drafts and creative iteration |
| 🌐 Global | Moderate | 720p | Standard / Fast | $0.15/sec | Balanced quality for international audiences |
| ⚡ VIP 720p | Low | 720p | Standard / Fast | $0.21/sec | Priority queue for high-volume production |
| ⚡ VIP 1080p | Low | 1080p | Standard / Fast | $0.4725/sec | Full HD deliverables with priority throughput |
| 🎭 Face Training | - | - | - | $0.18 flat | Reusable character sheets for consistent identity |

Frequently Asked Questions

What is SD 2 (Seedance 2)?

SD 2, also known as Seedance 2 or ByteDance SD 2.0, is ByteDance's unified audio-video generation model. It produces cinematic video from text prompts or images with native audio, phoneme-accurate lip-sync in 8+ languages, multi-shot director control, and up to 1080p resolution. Muapi offers 26+ SD 2 variants across Global, Chinese, and VIP tiers.

What is the difference between SD 2 Global, Chinese, and VIP tiers?

Global variants offer balanced quality with moderate content filtering and are priced from $0.15/sec. Chinese variants have lower censorship thresholds and are priced from $0.09/sec. VIP variants provide a fast priority queue, lower censorship, and 1080p resolution support, priced from $0.21/sec. All three tiers list text-to-video, image-to-video, and omni reference variants; first/last-frame variants are listed under the Global and VIP tiers only.

What is SD 2 Omni Reference?

SD 2 Omni Reference lets you provide up to 9 reference images, 3 reference videos, and 3 audio clips to guide character appearance, scene style, and audio in the generated video. It produces highly consistent results across multiple shots without fine-tuning.
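The reference limits quoted above (9 images, 3 videos, 3 audio clips) can be checked on the client before a request is sent. The sketch below is illustrative: the function name and argument names are hypothetical, and it only counts items. It does not know what file formats or sizes the API accepts.

```python
# Limits quoted on this page for Omni Reference inputs.
MAX_REFERENCES = {"images": 9, "videos": 3, "audio": 3}


def check_references(images=(), videos=(), audio=()):
    """Return a list of human-readable problems; an empty list means within limits."""
    supplied = {"images": images, "videos": videos, "audio": audio}
    problems = []
    for kind, items in supplied.items():
        limit = MAX_REFERENCES[kind]
        if len(items) > limit:
            problems.append(f"{kind}: {len(items)} supplied, limit is {limit}")
    return problems


# Ten images exceed the limit of nine, so one problem is reported.
print(check_references(images=["ref.png"] * 10, videos=["shot.mp4"]))
```

Returning a list of problems rather than raising on the first one lets a caller report every over-limit input in a single pass.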

How much does the SD 2 API cost?

SD 2 API pricing on Muapi starts from $0.09/sec for Chinese 480p variants. Global variants start at $0.15/sec, VIP 720p variants at $0.21/sec, and VIP 1080p variants at $0.4725/sec. Face training is billed per request: a flat $0.18 for character sheet generation (SD 2 Character) and a flat $0.50 for SD 2 Omni Reference Train.
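Because generation is billed per second, a lower bound on spend is the starting rate multiplied by clip length and clip count. The sketch below uses the starting prices quoted in the answer above. The dictionary keys are hypothetical labels rather than API identifiers, and standard (non-Fast) variants cost more than these starting rates.

```python
from decimal import Decimal

# Per-second starting prices quoted above (USD). Keys are illustrative labels.
STARTING_RATE = {
    "chinese-480p": Decimal("0.09"),
    "global": Decimal("0.15"),
    "vip-720p": Decimal("0.21"),
    "vip-1080p": Decimal("0.4725"),
}


def minimum_cost(tier, seconds, clips=1):
    """Lower bound on spend: starting rate x duration x number of clips."""
    return STARTING_RATE[tier] * Decimal(str(seconds)) * clips


# Twenty 5-second drafts at 480p versus one 10-second 1080p clip.
print(minimum_cost("chinese-480p", 5, clips=20))
print(minimum_cost("vip-1080p", 10))
```

`Decimal` is used instead of `float` so that prices such as $0.4725 multiply without binary rounding error.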

Can I get an API key for SD 2 right now?

Yes. SD 2 API variants are live on Muapi. Sign up at muapi.ai, create an API key in your dashboard, and start generating with the SD 2 text-to-video, image-to-video, or omni reference endpoints immediately.