
Gemini Omni API

5 variants

Natively multimodal any-to-any model. The Gemini Omni API on Muapi delivers text-to-video, image-to-video, and video-edit with synchronized audio generated in the same forward pass — plus custom voice profiles and character profiles for consistent character-driven generation.

⚡ Gemini Omni Flash
Natively multimodal · any-to-any · synchronized audio · voice & character profiles
5 models

T2V · New · ⚡ Flash

Gemini Omni Text to Video

Natively multimodal any-to-any model. Generate cinematic video with synchronized dialogue, ambient audio, and music from a single text prompt — all in one forward pass.

Up to 4K
from $1.50
I2V · New · ⚡ Flash

Gemini Omni Image to Video

Animate up to 5 reference images with a text prompt. Gemini Omni preserves subject identity across frames and generates synchronized audio natively in the same forward pass.

Up to 4K
from $1.50
V2V · New · ⚡ Flash

Gemini Omni Video Edit

Source-driven video editing with the Gemini Omni any-to-any model. Restyle, relight, swap subjects, or rewrite dialogue while preserving original motion and timing.

Up to 4K
from $2.40
Audio · New · ⚡ Flash

Gemini Omni Audio

Create a reusable voice profile from any of 30 preset voices. Describe timbre and style, then pass the returned audioId in audio_ids when generating Gemini Omni video.

Voice profile
Free
Char · New · ⚡ Flash

Gemini Omni Character

Create a reusable character profile from one reference image. Describe the character, optionally attach voice profiles, and pass the returned characterId in character_ids for consistent character-driven video generation.

Character profile
Free

Why use the Gemini Omni API on Muapi?

Any-to-any in one pass

Text, image, audio, and video reasoned together — no chained pipelines, no cross-model drift.

Native synchronized audio

Dialogue, ambient sound, and music generated in the same forward pass as the visuals.

T2V, I2V, and V2V

Generate from text, animate up to 5 reference images, or edit existing clips — three modes, one API surface.

Custom voice profiles

Create reusable voice profiles from 30 preset voices. Attach up to 3 per generation via audio_ids.

Character-consistent generation

Create character profiles from a reference image. Reuse the characterId across generations for a consistent visual identity.

Drop-in for Veo or Sora workflows

Same submit-then-poll pattern as every other Muapi model — swap the endpoint and ship.
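
The submit-then-poll pattern mentioned above can be sketched generically. This is a minimal illustration, not the Muapi client: the endpoint URLs, the "status" field, and the "completed" and "failed" values are assumptions, so the two network calls are passed in as plain functions rather than hard-coded.

```python
import time
from typing import Any, Callable, Dict


def submit_then_poll(
    submit: Callable[[], str],
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit one job, then poll until it finishes, fails, or times out."""
    request_id = submit()  # assumed: submission returns a job identifier
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(request_id)
        status = result.get("status")  # assumed field name
        if status == "completed":  # assumed terminal value
            return result
        if status == "failed":  # assumed terminal value
            raise RuntimeError(f"job {request_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {request_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

In real use, submit and fetch_status would wrap the HTTP requests for whichever model endpoint is chosen, which is what makes swapping endpoints a one-line change in this sketch.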

Gemini Omni API — Frequently Asked Questions

What is the Gemini Omni API?

Gemini Omni is a natively multimodal any-to-any model. The Gemini Omni API on Muapi exposes text-to-video, image-to-video, and video-edit capabilities with synchronized audio generated in the same forward pass.

How is Gemini Omni different from Veo or Sora?

Gemini Omni reasons across text, image, audio, and video in one forward pass instead of relaying through specialized models. The result is native synchronized audio, fewer cross-modality artifacts, and cleaner edits than a chained pipeline can produce.

How much does the Gemini Omni API cost on Muapi?

Text-to-video and image-to-video are priced by duration and resolution: from $1.50 for an 8-second clip at 720p up to $2.70 for an 8-second clip at 4K. Video-edit is a flat $2.40 (720p/1080p) or $3.60 (4K) per generation. Synchronized audio is included at no extra charge.
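
For budgeting a batch, the quoted prices can be put into a small lookup. This sketch covers only the figures stated in the answer above; the mode labels ("t2v", "i2v", "v2v") and the function name are illustrative, and any combination the FAQ does not quote raises an error rather than guessing.

```python
from decimal import Decimal

# Prices quoted in the FAQ, in USD. Combinations the FAQ does not quote
# (for example 1080p text-to-video) are intentionally absent.
GENERATION_PRICES = {
    (8, "720p"): Decimal("1.50"),
    (8, "4k"): Decimal("2.70"),
}
VIDEO_EDIT_PRICES = {
    "720p": Decimal("2.40"),
    "1080p": Decimal("2.40"),
    "4k": Decimal("3.60"),
}


def estimate_cost(mode: str, resolution: str, duration_s: int = 8, count: int = 1) -> Decimal:
    """Return the total quoted price for `count` generations."""
    resolution = resolution.lower()
    if mode in ("t2v", "i2v"):
        price = GENERATION_PRICES.get((duration_s, resolution))
    elif mode == "v2v":
        price = VIDEO_EDIT_PRICES.get(resolution)  # flat per generation
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if price is None:
        raise ValueError(f"no quoted price for {mode} at {duration_s}s/{resolution}")
    return price * count
```

For example, estimate_cost("t2v", "720p", count=10) totals ten 8-second 720p clips at the quoted $1.50 each.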

What inputs and outputs does Gemini Omni support?

Text-to-video takes a text prompt. Image-to-video takes up to 5 reference images plus a prompt. Video-edit takes a source clip and a prompt describing the edit to apply. All three produce video with natively synchronized audio. You can also attach up to 3 voice profile IDs (audio_ids) and up to 3 character IDs (character_ids) to any generation.
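
The limits above are easy to check on the client before submitting. In this sketch only audio_ids and character_ids are field names taken from this page; "prompt" and "images" are hypothetical keys, and the function itself is illustrative rather than part of any Muapi SDK.

```python
from typing import Any, Dict, List, Optional

MAX_REFERENCE_IMAGES = 5  # image-to-video limit stated in the FAQ
MAX_AUDIO_IDS = 3         # voice profiles per generation
MAX_CHARACTER_IDS = 3     # character profiles per generation


def build_generation_payload(
    prompt: str,
    image_urls: Optional[List[str]] = None,
    audio_ids: Optional[List[str]] = None,
    character_ids: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate the documented limits and assemble a request body."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    image_urls = list(image_urls or [])
    audio_ids = list(audio_ids or [])
    character_ids = list(character_ids or [])
    if len(image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if len(audio_ids) > MAX_AUDIO_IDS:
        raise ValueError(f"at most {MAX_AUDIO_IDS} audio_ids")
    if len(character_ids) > MAX_CHARACTER_IDS:
        raise ValueError(f"at most {MAX_CHARACTER_IDS} character_ids")

    payload: Dict[str, Any] = {"prompt": prompt}  # hypothetical key
    if image_urls:
        payload["images"] = image_urls  # hypothetical key
    if audio_ids:
        payload["audio_ids"] = audio_ids
    if character_ids:
        payload["character_ids"] = character_ids
    return payload
```

Failing fast on the client avoids spending a paid generation on a request that would be rejected for exceeding a limit.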

What is Gemini Omni Audio?

Gemini Omni Audio lets you create a reusable voice profile by picking one of 30 preset voices, giving it a name, and optionally describing the timbre, pacing, and style. The API returns an audioId that you pass in the audio_ids field when generating Gemini Omni video — up to 3 voice profiles per generation.

What is Gemini Omni Character?

Gemini Omni Character creates a reusable character profile from a single reference image and a text description. The returned characterId can be passed in the character_ids field (up to 3 per generation) to anchor the character's visual identity across multiple video generations.

Can I generate vertical or square video with Gemini Omni?

Vertical, yes: all three Gemini Omni video variants support 16:9 and 9:16 aspect ratios, so the same API powers cinematic widescreen and TikTok-style vertical clips. A square (1:1) ratio is not among the listed options.

Is Gemini Omni available on all plans?

Gemini Omni is available on the Pro and Business plans. Upgrade at muapi.ai/topup to get access.