Question 1

How is pricing calculated?

Accepted Answer

Pricing is token-based: $1.25 per million input tokens and $9.00 per million output tokens, regardless of which Codex variant you use. Each call has a minimum charge of $0.00023.
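
As a rough illustration of the arithmetic, here is a minimal sketch in Python. The function name estimate_call_cost is hypothetical, and the sketch assumes the minimum charge acts as a floor on the per-call total, which the answer does not spell out.

```python
from decimal import Decimal

# Rates quoted in the answer above, in USD.
INPUT_USD_PER_MILLION = Decimal("1.25")
OUTPUT_USD_PER_MILLION = Decimal("9.00")
MIN_CHARGE_USD = Decimal("0.00023")
ONE_MILLION = Decimal(1_000_000)


def estimate_call_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one call (hypothetical helper, not an official API)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    usage_cost = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    ) / ONE_MILLION
    # Assumption: the minimum charge is a floor on the per-call total.
    return max(usage_cost, MIN_CHARGE_USD)


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens.
    print(estimate_call_cost(10_000, 2_000))
```

Decimal is used instead of float so that very small per-call amounts are not distorted by binary rounding.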

Question 2

Which Codex variant should I use?

Accepted Answer

gpt-5.4-codex is the default and most capable variant. Use a lower variant (gpt-5.3-codex, gpt-5.2-codex, etc.) when you need faster responses on simpler tasks. All variants share the same pricing.

Question 3

What is the difference between /gpt-codex and /gpt-codex/stream?

Accepted Answer

/gpt-codex is asynchronous: you receive a request_id and poll for the result. /gpt-codex/stream returns a live Server-Sent Events (SSE) stream. Use streaming for interactive coding UIs; use the async endpoint for batch processing.
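
The two consumption patterns can be sketched in Python without tying them to any particular client library. Both helper names below are hypothetical, and the HTTP calls are deliberately left out: fetch_result stands in for whatever request you make to look up a request_id, and lines stands in for the decoded text lines of the streaming response. The exact status-check URL and payload shape are not given in the answer, so they are not guessed at here.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional


def poll_for_result(
    fetch_result: Callable[[], Optional[str]],
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Async pattern: call fetch_result until it returns a non-None value.

    fetch_result is a caller-supplied function (hypothetical) that looks up
    the request_id and returns None while the job is still running.
    """
    for _ in range(max_attempts):
        result = fetch_result()
        if result is not None:
            return result
        sleep(interval_s)
    raise TimeoutError("result not ready after polling")


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Streaming pattern: yield the data payload of each SSE event."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comments are ignored here.
    # An event not terminated by a blank line is discarded, as in the SSE format.
```

In a batch job you would typically call poll_for_result once per submitted request; in an interactive UI you would feed each payload from iter_sse_data to the display as it arrives.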

Question 4

Does this model support Prompt Caching?

Accepted Answer

Yes. Prompt Caching lets you reuse frequently repeated prompt text at a reduced rate. Cache hits (reusing previously cached tokens) are charged at 0.1x the standard input cost. Cache creation (writing new tokens to the cache for future reuse) is charged at 1.25x the standard input cost.
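
To see how the multipliers combine, here is a minimal Python sketch. It assumes the standard input rate is the $1.25 per million tokens quoted under Question 1; the function name estimate_input_cost and the three-way split of input tokens are illustrative, not part of any official API.

```python
from decimal import Decimal

# Assumption: the standard input rate is the one quoted under Question 1.
INPUT_USD_PER_MILLION = Decimal("1.25")
CACHE_HIT_MULTIPLIER = Decimal("0.1")     # reusing previously cached tokens
CACHE_WRITE_MULTIPLIER = Decimal("1.25")  # writing new tokens to the cache
ONE_MILLION = Decimal(1_000_000)


def estimate_input_cost(
    uncached_tokens: int, cache_write_tokens: int, cache_hit_tokens: int
) -> Decimal:
    """Estimate the USD input cost of one call (hypothetical helper)."""
    if min(uncached_tokens, cache_write_tokens, cache_hit_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    # Weight each token class by its multiplier, then apply the base rate.
    weighted_tokens = (
        Decimal(uncached_tokens)
        + Decimal(cache_write_tokens) * CACHE_WRITE_MULTIPLIER
        + Decimal(cache_hit_tokens) * CACHE_HIT_MULTIPLIER
    )
    return weighted_tokens * INPUT_USD_PER_MILLION / ONE_MILLION


if __name__ == "__main__":
    first_call = estimate_input_cost(0, 100_000, 0)   # writes the prompt to cache
    second_call = estimate_input_cost(0, 0, 100_000)  # reads it back from cache
    print(first_call, second_call)
```

Under these assumptions, writing a prompt to the cache and reusing it once costs 1.35x the standard input rate in total, compared with 2x for sending the same prompt uncached twice, so caching pays off from the first reuse.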