muapi/omnihuman-1-5

Audio to Video

Generate a realistic talking-head video from a portrait image and an audio track using KIE OmniHuman 1.5.

Price is charged per second of input audio and depends on the output resolution (at most 60s is billed).

Resolution	Audio duration	Cost (USD)
720p	5s	$0.22
720p	10s	$0.45
720p	30s	$1.35
720p	60s	$2.70
1080p	5s	$0.30
1080p	10s	$0.60
1080p	30s	$1.80
1080p	60s	$3.60
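
The rows above follow from the per-second rates quoted elsewhere on this page ($0.045 at 720p, $0.060 at 1080p). The following is a minimal sketch of that arithmetic, assuming billing is linear in audio length and capped at 60 seconds. The function name `estimate_cost` is illustrative, and the provider's rounding rule is not stated here: the 5-second 720p row shows $0.22, while the unrounded product is $0.225, so the sketch returns unrounded values.

```python
from decimal import Decimal

# Per-second rates in USD, taken from the pricing information on this page.
RATE_PER_SECOND = {"720": Decimal("0.045"), "1080": Decimal("0.060")}
MAX_BILLED_SECONDS = 60  # "max 60s billed"


def estimate_cost(resolution: str, audio_seconds: int) -> Decimal:
    """Return the unrounded estimated cost in USD for one generation."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    # Assumption: audio longer than the cap is billed as exactly 60 seconds.
    billed = min(audio_seconds, MAX_BILLED_SECONDS)
    return RATE_PER_SECOND[resolution] * billed


for res in ("720", "1080"):
    for secs in (5, 10, 30, 60):
        print(res, secs, estimate_cost(res, secs))
```

Decimal arithmetic is used instead of floats so that values such as 0.045 × 5 stay exact and any rounding can be applied explicitly once the billing rule is known.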

🚀Related Models

creatify-lipsync

Realistic lipsync video - optimized for speed, quality, and consistency.

Audio to Video

kling-v1-avatar-standard

Kling AI Avatar Standard creates talking avatar videos from a single image + audio input. It supports realistic humans, animals, or stylized characters, producing lip-synced avatar videos easily.

Audio to Video

veed-lipsync

Generate realistic lipsync from any audio using VEED's latest model

Audio to Video

infinitetalk-image-to-video

InfiniteTalk Image-to-Video brings still portraits and character photos to life by generating natural, realistic talking videos. You provide a single face image and a dialogue script, and the model animates lip movement, facial expressions, and subtle head gestures to match the speech.

Audio to Video

kling-v2-avatar-pro

AI-Avatar v2 Pro takes a reference image of a person/character and an audio dialogue clip, then generates a realistic talking-avatar video. It preserves identity, lip syncs accurately to the audio, adds natural head movement, eye motion, expressions, and cinematic lighting.

Audio to Video

latent-sync

LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.

Audio to Video

kling-v1-avatar-pro

Kling AI Avatar Pro is the premium tier for making high-quality talking avatars. You upload a character image plus an audio file, and the model generates a realistic avatar video with lip-sync.

Audio to Video

ltx-2-19b-lipsync

LTX-2-19B LipSync generates a realistic talking video by synchronizing a person’s mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency. Ideal for avatars, dubbing, dialogue replacement, and character narration.

Audio to Video

ltx-2.3-lipsync

LTX-2.3 LipSync generates a realistic talking video by synchronizing mouth movements to an input audio clip. It preserves facial identity, head position, lighting, and natural expressions while producing accurate lip motion, subtle blinking, and stable temporal consistency—powered by the upgraded LTX-2.3 architecture.

Audio to Video

sync-lipsync

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.

Audio to Video

wan2.2-speech-to-video

WAN2.2 Speech-to-Video transforms a static image into a talking video by synchronizing lip movements and facial expressions with an audio input. Simply provide a character image along with a speech dialogue, and the model generates a natural, expressive video where the subject speaks your lines.

Audio to Video

kling-v2-avatar-standard

AI-Avatar v2 Standard generates a talking-avatar video from a reference image and an audio dialogue. It performs accurate lip-sync, natural facial expressions, subtle head motion, blinking, and light emotional cues based on voice tone. This Standard version focuses on speed and natural realism.

Audio to Video

📝

Overview

About this model

OmniHuman 1.5 is a lip-sync and talking-head model that animates a portrait image using an input audio track. It combines high-fidelity lip-sync, natural facial expressions, and fluid head movements to produce realistic speaking or singing videos.

1. Digital Avatars: Create realistic talking avatars for virtual presentations and customer engagement.

2. Entertainment: Animate portrait photos and artwork to sing or speak with natural lip sync.

3. Social Media: Generate engaging video clips with animated characters matching voiceovers.

💰

Pricing & Value

Cost analysis

Provider	Cost	Notes
muapiapp	$0.045/sec (720p) / $0.060/sec (1080p)	Dynamic per-second billing based on audio duration (5s to 60s).
Fal.ai	Not available	Not available
Replicate	Not available	Not available

* Competitor pricing is estimated based on similar model architectures and usage tiers.

⚙️

Technical Details

Configuration schema

Parameter	Type	Description	Default
Prompt	string	Optional prompt to guide lipsync style.	`Make her sing confidently into a microphone with natural lip sync`
Image URL	string	URL of the input portrait image.	`https://cdn.muapi.ai/assets/omnihuman-1-5.jpg`
Audio URL	string	URL of the input audio track.	`https://cdn.muapi.ai/assets/omnihuman-1-5.mp3`
Output Resolution	Enum (2 options)	Output video resolution.	`1080`
Fast Mode	boolean	Enable fast generation mode.	`false`
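
For reference, the table can be mirrored as a typed request object. This is a sketch only: `image_url` and `audio_url` are the field names used in the FAQ on this page and `pe_fast_mode` is mentioned in the implementation guide, but `prompt` and `resolution` are guessed wire names for the "Prompt" and "Output Resolution" rows, and the two enum values are assumed to be "720" and "1080" based on the pricing information. The class name `OmniHumanRequest` is hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Literal


@dataclass
class OmniHumanRequest:
    # Names mentioned on this page: image_url, audio_url, pe_fast_mode.
    # Assumed names (not confirmed): prompt, resolution.
    image_url: str = "https://cdn.muapi.ai/assets/omnihuman-1-5.jpg"
    audio_url: str = "https://cdn.muapi.ai/assets/omnihuman-1-5.mp3"
    prompt: str = "Make her sing confidently into a microphone with natural lip sync"
    resolution: Literal["720", "1080"] = "1080"
    pe_fast_mode: bool = False

    def to_payload(self) -> dict:
        """Return a plain dict suitable for JSON serialization."""
        return asdict(self)


# Override one default and keep the rest.
payload = OmniHumanRequest(resolution="720").to_payload()
print(payload["resolution"], payload["pe_fast_mode"])
```

Keeping the defaults in one place makes it easy to see which values a request actually overrides.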

📖

Implementation Guide

Developer documentation

How to Use OmniHuman 1.5

Prepare Your Inputs:
- Image URL: Select a clear, high-quality portrait image. Upload or provide its URL.
- Audio URL: Provide an audio file containing the speech or song that will drive the avatar's lip-sync.
Configure Parameters:
- Prompt: Optionally add a prompt to guide the style of the lip sync (e.g., 'Make her sing confidently with natural lip sync').
- Output Resolution: Choose either 720p or 1080p. The default is 1080p.
- Fast Mode: Enable fast generation mode (pe_fast_mode) for quicker iterations.
- Seed: Use a custom seed for reproducible results, or set to -1 for random.
Submit Your Request:
- Send your prepared inputs to the omnihuman-1-5 endpoint as defined in the technical schema.
Receive and Review the Output:
- Once processing completes, retrieve the output video URL. Review the generated animation and iterate as needed.
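
The submit step above can be sketched as building a JSON POST request. Everything about the transport here is an assumption: this page does not give the base URL, the path, or the authentication header, so `ENDPOINT_URL` and `API_KEY_HEADER` are placeholders to be replaced with the values from the provider's API reference. Only `image_url`, `audio_url`, and `pe_fast_mode` are names that appear on this page. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Placeholder values: the real base URL and auth header are not given on this page.
ENDPOINT_URL = "https://api.example.com/v1/omnihuman-1-5"  # hypothetical
API_KEY_HEADER = "x-api-key"  # hypothetical


def build_request(image_url: str, audio_url: str, api_key: str, **options) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one generation."""
    body = {"image_url": image_url, "audio_url": audio_url, **options}
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


req = build_request(
    "https://cdn.muapi.ai/assets/omnihuman-1-5.jpg",
    "https://cdn.muapi.ai/assets/omnihuman-1-5.mp3",
    api_key="YOUR_API_KEY",
    pe_fast_mode=True,
)
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` and retrieving the output video URL depend on the actual API contract, which this page does not describe.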

❓

Common Questions

Frequently asked

What is the maximum duration of generation?

The generation duration is determined by the input audio track length, with a minimum of 5 seconds and a maximum of 60 seconds.

What parameters are required?

Both `image_url` and `audio_url` are mandatory parameters to generate a talking head video.
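
A client-side pre-check for the two constraints stated above (both URLs present, audio between 5 and 60 seconds) might look like the following. The function name `validate_inputs` is illustrative, and rejecting out-of-range audio is a conservative assumption; this page does not say whether the service trims, rejects, or accepts such input.

```python
from urllib.parse import urlparse


def validate_inputs(image_url: str, audio_url: str, audio_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    for name, value in (("image_url", image_url), ("audio_url", audio_url)):
        parsed = urlparse(value or "")
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{name} must be an http(s) URL")
    # Assumption: audio outside the documented 5-60 second range is rejected up front.
    if not 5 <= audio_seconds <= 60:
        problems.append("audio must be between 5 and 60 seconds long")
    return problems


print(validate_inputs("https://cdn.muapi.ai/assets/omnihuman-1-5.jpg", "", 3))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.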

How is the credit cost calculated?

Cost is calculated dynamically per second of the input audio's duration: $0.045 per second for 720p resolution and $0.06 per second for 1080p resolution.
