Question 1

How many reference images can I provide?

Accepted Answer

Exactly one reference image is required, and only one is supported. Provide it as a publicly accessible URL; the image file must be no larger than 20 MB.
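
As a rough illustration, these constraints could be checked on the client before sending a request. The sketch below is not part of the API: the function name `validate_reference_images` is hypothetical, 20 MB is read as 20 × 1024 × 1024 bytes (an assumption, since the answer does not say which definition applies), and the check only confirms that the URL looks like an http(s) URL, since whether it is actually publicly reachable cannot be verified locally.

```python
from urllib.parse import urlparse

# Assumption: "20 MB" is interpreted as 20 MiB; the API may use 20 * 10**6 instead.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def validate_reference_images(image_urls, size_bytes=None):
    """Return a list of problems; an empty list means the input looks acceptable.

    image_urls: list of URL strings the caller intends to send.
    size_bytes: optional known size of the image file, if the caller has it.
    """
    problems = []

    # Exactly one reference image is required and supported.
    if len(image_urls) != 1:
        problems.append(
            f"expected exactly 1 reference image, got {len(image_urls)}"
        )

    # Syntactic check only: this cannot prove the URL is publicly reachable.
    for url in image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an http(s) URL: {url!r}")

    # Size check only runs when the caller supplies a size.
    if size_bytes is not None and size_bytes > MAX_IMAGE_BYTES:
        problems.append(
            f"image is {size_bytes} bytes, above the {MAX_IMAGE_BYTES}-byte limit"
        )

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.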

Question 2

Where do the audio_ids come from?

Accepted Answer

Voice profile IDs are created using the Gemini Omni Audio endpoint. Call that endpoint first to register a voice, then pass the returned `audio_id` here.
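
The two-step order (register a voice first, then reuse its ID) might be sketched as follows. Everything here apart from the `audio_id` and `audio_ids` names is an assumption: `register_voice` is a hypothetical callable standing in for whatever client code calls the Gemini Omni Audio endpoint, the response is assumed to be a dict containing `audio_id`, and `audio_ids` is assumed to be a list field on the request.

```python
def build_request_with_voice(register_voice, base_payload):
    """Register a voice, then attach the returned ID to a copy of the payload.

    register_voice: hypothetical zero-argument callable that calls the
        Gemini Omni Audio endpoint and returns a dict with an "audio_id" key.
    base_payload: dict holding the other request fields.
    """
    # Step 1: register the voice and read back its ID.
    response = register_voice()
    audio_id = response["audio_id"]

    # Step 2: pass the ID along. A list-valued "audio_ids" field is an assumption.
    payload = dict(base_payload)  # shallow copy so the caller's dict is untouched
    payload["audio_ids"] = [audio_id]
    return payload


# Usage with a stand-in for the real endpoint call:
def fake_register_voice():
    return {"audio_id": "voice-123"}  # placeholder value, not a real ID


request_body = build_request_with_voice(
    fake_register_voice, {"description": "a cheerful pirate"}
)
```

Passing the registration step in as a callable keeps the sketch independent of any particular HTTP client or authentication scheme.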

Question 3

What does the output image show?

Accepted Answer

The image URL returned in `outputs` points to the generated character image: a visual representation of the character derived from your reference photo and description.