muapi/vidu-q2-reference-to-image

Image to Image

VIDU Reference-to-Image Q2 generates new high-quality images based on one or more reference images. It preserves the key identity, structure, or style of the reference while creating a new scene, variation, or enhanced composition. Ideal for character consistency, object re-interpretation, stylized redesigns, and cinematic recreations guided by reference inputs.

Related Models

vidu-q2-turbo-image-to-video (Image to Video)

Vidu Q2 Turbo Image-to-Video animates a starting image into a fast, prompt-guided clip while preserving subject identity. Built for speed and cost efficiency.

vidu-q2-reference (Image to Video)

Vidu Q2 Reference Video generates breathtaking cinematic clips from text prompts guided by multiple reference images. Each image refines the model’s understanding of subject, environment, and visual tone, ensuring perfect consistency in appearance and motion across every frame.

vidu-q2-turbo-text-to-video (Text to Video)

Vidu Q2 Turbo Text-to-Video is the fast, affordable Q2 tier for prompt-only generation. Use it for storyboards, social cuts, and high-volume work where speed and cost matter.

vidu-q2-pro-text-to-video (Text to Video)

Vidu Q2 Pro Text-to-Video generates cinematic, prompt-faithful clips from text alone with strong temporal consistency and rich detail at up to 1080p. Pick this when you need polished output without a reference frame.

vidu-q2-turbo-start-end-video (Image to Video)

Vidu Q2 Turbo Start–End Video creates highly detailed cinematic sequences by interpolating between two visual states: your start frame and end frame. Built for story moments, cinematic transformations, product reveals, and artistic transitions, it captures smooth motion, realistic lighting shifts, and dynamic camera movements while maintaining fidelity and emotional tone.

vidu-q2-pro-start-end-video (Image to Video)

Vidu Q2 Pro Start–End Video is a professional-grade model built for cinematic transformation storytelling. It evolves a scene, subject, or concept from one moment to another through smooth visual interpolation, natural lighting transitions, and dynamic motion.

vidu-q2-text-to-image (Text to Image)

VIDU Text-to-Image Q2 is a high-quality generative model focused on producing vivid, dynamic, and cinematic still images using natural language prompts. It excels at atmospheric depth, expressive lighting, surreal concepts, and motion-infused compositions typical of VIDU’s visual identity.

vidu-q2-pro-image-to-video (Image to Video)

Vidu Q2 Pro Image-to-Video animates a single starting image into a smooth, prompt-guided clip up to 1080p while preserving subject identity, lighting, and composition.

Overview

About this model

VIDU Reference-to-Image Q2 is an image-to-image generation model that turns one or more reference images into new compositions. It keeps the key identity, structure, and style of the inputs while producing enhanced variations, fresh scenes, or cinematic recreations. Fine details and stylistic nuances from the reference images carry over to the output, which makes the model well suited to work that needs both consistency and creativity.

Because it generates stylistically coherent images from multiple inputs, the model supports use cases ranging from character consistency in entertainment to object reinterpretation in product design. It is priced at $0.032 per generation.

1. Generate cinematic recreations of popular scenes with enhanced details.
2. Reinterpret characters or objects while preserving their essential traits for animation or gaming.
3. Create consistent visual assets for branding and marketing purposes.
4. Design stylized and artistic variations of existing images for creative projects.
5. Develop detailed visual concepts based on storyboard references in film production.

Pricing & Value

Cost analysis

muapiapp: $0.032 per generation

muapiapp is positioned as 20-50% more affordable than competitors while delivering comparable or superior output quality.

Fal.ai: $0.045 per generation*

Against Fal.ai's estimated price, muapiapp costs about 29% less per generation for similar high-quality results.

Replicate: $0.045 per generation*

Replicate's estimated price matches Fal.ai's, so the saving is the same: about 29% per generation, within the 20-50% range quoted above.

* Competitor pricing is estimated based on similar model architectures and usage tiers.
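
As a quick check on these figures, the saving implied by the listed prices can be worked out directly. The sketch below uses only the per-generation prices from this section; the function name `savings_percent` is illustrative, and the competitor price is the page's own estimate rather than a confirmed rate.

```python
def savings_percent(price: float, competitor_price: float) -> float:
    """Return how much cheaper `price` is than `competitor_price`, in percent."""
    if competitor_price <= 0:
        raise ValueError("competitor_price must be positive")
    return (competitor_price - price) / competitor_price * 100


MUAPIAPP = 0.032    # USD per generation, as listed on this page
COMPETITOR = 0.045  # USD per generation, estimated (Fal.ai and Replicate)

saving = round(savings_percent(MUAPIAPP, COMPETITOR), 1)
print(f"Saving per generation: {saving}%")
```

With the listed prices this comes to roughly 29%, which sits inside the quoted 20-50% range.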


Technical Details

Configuration schema

Prompt (string)

Text prompt describing the image.

Default Value: Create a new scene where the masked wanderer stands inside an ancient stone observatory illuminated by rotating celestial beams; preserve the character’s clothing style and silhouette while adding glowing runes carved into the walls, mist swirling across the floor, and a dramatic cosmic light shaft from above; cinematic composition, high detail.

Image URLs (array)

Upload or provide reference images. Used for image-to-image generation.

Default Value: https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/vidu-q2-reference-to-image-in.jpg

Aspect Ratio (enum, 9 options)

Aspect ratio of the output image.

Default Value: 1:1

Resolution (enum, 3 options)

The target resolution of the generated image.

Default Value: 1k
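
The schema above can be mirrored by a small client-side check before anything is sent. This is a minimal sketch, not the service's own validation: the key names `prompt`, `images_list`, `aspect_ratio`, and `resolution` are assumptions (the page gives display labels, not field names), and because the nine aspect-ratio options are not listed here, only the `W:H` shape is checked.

```python
import re

MAX_REFERENCE_IMAGES = 7                          # limit stated on this page
RESOLUTIONS = {"1k", "2k", "4k"}                  # the three options named on this page
ASPECT_RATIO_PATTERN = re.compile(r"^\d+:\d+$")   # shape only, e.g. "1:1" or "16:9"


def build_payload(prompt, image_urls, aspect_ratio="1:1", resolution="1k"):
    """Validate the four inputs and return them as a plain dict.

    The dict keys are hypothetical field names, not confirmed by the page.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"provide 1 to {MAX_REFERENCE_IMAGES} reference images")
    if not ASPECT_RATIO_PATTERN.match(aspect_ratio):
        raise ValueError("aspect_ratio must look like 'W:H'")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    return {
        "prompt": prompt.strip(),
        "images_list": list(image_urls),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }


payload = build_payload(
    "The masked wanderer inside an ancient stone observatory, cinematic composition",
    ["https://d3adwkbyhxyrtq.cloudfront.net/webassets/videomodels/vidu-q2-reference-to-image-in.jpg"],
)
print(payload["aspect_ratio"], payload["resolution"])
```

The defaults for aspect ratio and resolution follow the default values listed in the schema.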

Implementation Guide

Developer documentation

How to Use VIDU Reference-to-Image Q2

  1. Prepare Your Inputs

    • Gather the reference images that capture the key identity or style you want to preserve.
    • Craft a detailed text prompt that describes the new scene or desired variation. Include specific artistic directions where needed.
  2. Configure the Technical Settings

    • Prompt: Enter a clear and descriptive prompt.
    • Image URLs: Provide one or more image URLs as your reference inputs (maximum of 7 images).
    • Aspect Ratio: Select the desired aspect ratio (for example, 1:1 or 16:9) to fit your project needs.
    • Resolution: Choose the output resolution (1k, 2k, or 4k) based on your quality requirements.
  3. Initiate Generation

    • Submit your inputs through the provided endpoint (vidu-q2-reference-to-image).
    • Wait for the model to process and generate new image variations while preserving core visual elements of the references.
  4. Review and Interpret Results

    • Once the images are generated, review the outputs to ensure they meet your visual expectations.
    • Use the resulting images directly or as a basis for further editing in your creative workflow.
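
The submission step might look roughly like the sketch below, which only builds the HTTP request and does not send it. Apart from the endpoint name `vidu-q2-reference-to-image`, everything here is an assumption for illustration: the base URL is a placeholder, the `x-api-key` header name is hypothetical, and the JSON field names are guesses based on the parameter labels. Check the provider's API reference for the real values before use.

```python
import json
import urllib.request

MODEL_ENDPOINT = "vidu-q2-reference-to-image"  # endpoint name given on this page


def build_request(base_url, api_key, payload):
    """Build (but do not send) a JSON POST request for the model endpoint."""
    url = f"{base_url.rstrip('/')}/{MODEL_ENDPOINT}"
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,  # hypothetical header name
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_request(
    "https://api.example.invalid/v1",  # placeholder host, not the real API
    "YOUR_API_KEY",
    {
        # Hypothetical field names mirroring the four settings in step 2.
        "prompt": "The masked wanderer inside an ancient stone observatory",
        "images_list": ["https://example.com/reference.jpg"],
        "aspect_ratio": "1:1",
        "resolution": "1k",
    },
)
print(request.get_method(), request.full_url)
```

Once the real base URL and authentication scheme are known, the request can be sent with `urllib.request.urlopen(request)`. How results are returned (directly or through a status lookup) is not described on this page.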

Common Questions

Frequently asked

How does VIDU Reference-to-Image Q2 maintain consistency with reference images?

The model captures key characteristics of the reference inputs, such as structure and style, and carries them into the generated image, so the output reflects the core identity of the original.

What types of prompts work best with this model?

Detailed and descriptive text prompts yield the best results. Including specific directions regarding style, composition, and desired variations helps the model align the final output with your vision.

Are there any limitations on the number of reference images?

Yes, you can provide up to 7 reference images. This allows you to combine multiple sources of inspiration while ensuring the model can process them effectively.

What output resolutions are available?

You can choose from 1k, 2k, or 4k resolutions. The selection depends on your project's quality requirements and output medium.