Question 1

Why do I have to record twice?

Accepted Answer

The second recording is a liveness check. Because you have to read a fresh random phrase that you could not have prepared in advance, the provider can verify that you are the actual speaker, not someone uploading a YouTube clip of a celebrity or a recording of a coworker. The check is enforced upstream and cannot be skipped.

Question 2

What audio format and length should the sample be?

Accepted Answer

Ten seconds of clean speech or singing is ideal. Mono and stereo are both accepted, as are mp3 and wav files. If the vocals are part of a longer file, use `vocal_start_s` and `vocal_end_s` to mark where the vocal segment begins and ends.
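To make the segment markers concrete, here is a minimal sketch of a local pre-check you could run before uploading. Only `vocal_start_s` and `vocal_end_s` come from the answer above. The function name, the `filename` field, the rule that both markers must be set together, and the reading of the values as seconds are all assumptions for illustration, not the provider's documented behavior.

```python
from typing import Optional

ALLOWED_EXTENSIONS = (".mp3", ".wav")  # the two formats named in the answer


def build_sample_params(filename: str,
                        vocal_start_s: Optional[float] = None,
                        vocal_end_s: Optional[float] = None) -> dict:
    """Validate a sample locally and return the fields to send with it.

    Hypothetical helper: the real upload request may look different.
    """
    if not filename.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError("sample must be an mp3 or wav file")

    params = {"filename": filename}  # hypothetical field name

    # No markers: treat the whole file as the vocal sample.
    if vocal_start_s is None and vocal_end_s is None:
        return params

    # Assumption: the markers only make sense as a pair.
    if vocal_start_s is None or vocal_end_s is None:
        raise ValueError("set both vocal_start_s and vocal_end_s, or neither")
    if vocal_start_s < 0 or vocal_end_s <= vocal_start_s:
        raise ValueError("need 0 <= vocal_start_s < vocal_end_s")

    params["vocal_start_s"] = vocal_start_s
    params["vocal_end_s"] = vocal_end_s
    return params
```

For example, `build_sample_params("full_song.wav", 12.0, 22.0)` would mark a ten-second vocal segment starting twelve seconds into the file.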

Question 3

How long does a voice stay usable?

Accepted Answer

Voices created by the provider have a limited validity window. Call the `check` endpoint before each music generation. If the voice has expired, call the `refresh` endpoint to get a new verification phrase.
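The check-then-refresh flow can be sketched as follows. This is an illustration under stated assumptions, not the provider's client: `check` and `refresh` are passed in as plain callables standing in for the two endpoints, and the `expired` and `phrase` response fields are hypothetical names chosen for the example.

```python
from typing import Callable


def ensure_voice_ready(voice_id: str,
                       check: Callable[[str], dict],
                       refresh: Callable[[str], dict]) -> dict:
    """Check a voice before generation and refresh it if it has expired.

    Returns {"ready": True} when the voice can be used as is, or
    {"ready": False, "phrase": ...} when a new verification phrase
    has to be recorded first.
    """
    status = check(voice_id)
    # "expired" is an assumed response field, not a documented one.
    if not status.get("expired", False):
        return {"ready": True}

    # Expired: ask for a fresh phrase ("phrase" is also an assumed field).
    renewed = refresh(voice_id)
    return {"ready": False, "phrase": renewed.get("phrase")}
```

Passing the endpoints in as functions keeps the sketch independent of any particular HTTP library, so you can wrap whatever client you already use.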

Question 4

Can I use the same voice_id across multiple songs?

Accepted Answer

Yes, that is the whole point. Once cloning succeeds, you can reuse the `voice_id` in any music generation request until it expires.
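As a rough illustration of reuse, the sketch below pairs one `voice_id` with several songs. Only `voice_id` comes from the answer above; the function name and the `prompt` field are hypothetical, and the real generation request may need other fields.

```python
from typing import Dict, List


def build_generation_requests(voice_id: str, prompts: List[str]) -> List[Dict[str, str]]:
    """Build one request body per song, all sharing the same voice_id."""
    # "prompt" is a hypothetical field name used only for this example.
    return [{"voice_id": voice_id, "prompt": prompt} for prompt in prompts]
```

Each returned dictionary would be sent as its own music generation request, so cloning happens once and the resulting voice is used many times.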

Question 5

What does it cost?

Accepted Answer

Free during preview. Pricing may change once the upstream provider begins charging.