OpenAI cuts latency of GPT-5.2 API models by 40%
OpenAI has rolled out a significant performance upgrade for its latest AI models, achieving a 40% reduction in latency for both GPT-5.2 and GPT-5.2-Codex. Latency refers to the time it takes for the model to begin generating a response after receiving a request. This update makes API responses noticeably faster for developers and users without altering the models’ core intelligence or capabilities.
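For readers who want to check the effect on their own workloads, latency as defined above can be measured as the time until the first streamed chunk arrives. The following is a minimal sketch, not OpenAI's method: `time_to_first_chunk`, `fake_stream`, and `reduced_latency` are hypothetical helpers, and the fake stream stands in for a real streaming API call.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk(start_stream: Callable[[], Iterable[str]]) -> float:
    """Return seconds between issuing a request and receiving its first chunk."""
    started = time.perf_counter()
    for _ in start_stream():
        # Stop timing as soon as the first piece of output arrives.
        return time.perf_counter() - started
    raise RuntimeError("stream ended without producing any output")


def fake_stream(delay_seconds: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(delay_seconds)  # simulated wait before the first token
    yield "Hello"
    yield " world"


def reduced_latency(baseline_seconds: float, reduction: float = 0.40) -> float:
    """Apply a fractional latency reduction (0.40 means 40% faster start)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_seconds * (1.0 - reduction)


if __name__ == "__main__":
    measured = time_to_first_chunk(fake_stream)
    print(f"measured time to first chunk: {measured:.3f}s")
    print(f"same request after a 40% reduction: {reduced_latency(measured):.3f}s")
```

In real use, `fake_stream` would be replaced by a function that opens a streaming request to the API. Under this arithmetic, a request that previously took 1.0 second to start responding would start in about 0.6 seconds.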
The improvement was accomplished through backend inference optimizations and refinements in how the models process requests on OpenAI’s infrastructure. No changes were made to the model weights or training data, meaning the quality, reasoning power, and accuracy remain unchanged from their initial launch.
When GPT-5.2 was first released, OpenAI emphasized its advances in reasoning, coding, and multimodal understanding. It highlighted strong benchmark results, improved safety features, and broader capabilities compared to earlier models.

API pricing is unchanged by this update: $20 per million input tokens and $60 per million output tokens for GPT-5.2, with similar rates for the Codex variant.
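To show how per-million-token pricing translates into the cost of a single request, here is a small sketch that uses the GPT-5.2 rates quoted in this article. The function name `estimate_cost_usd` and the token counts are hypothetical, chosen only for illustration.

```python
# Rates as quoted in this article (USD per one million tokens).
INPUT_USD_PER_MILLION = 20.0
OUTPUT_USD_PER_MILLION = 60.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.2f}")
```

For that hypothetical request, the input side costs $0.20 and the output side $0.12, for a total of about $0.32. Because the update changes only latency, the same request costs the same before and after it.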
This speed boost arrives at a time of high demand. Recent decisions to offer free Codex access to all users, combined with a surge of new users, have put pressure on computing resources. The optimization helps maintain smooth performance across all tiers while delivering quicker results.
By partnering with hardware providers like Cerebras and leveraging advanced chips such as the WSE-3, OpenAI was able to achieve these gains efficiently. The result is a more responsive experience that keeps the models competitive against fast rivals like Claude 3.5 Sonnet.
Overall, the 40% latency reduction for GPT-5.2 and GPT-5.2-Codex demonstrates OpenAI’s ongoing effort to make its tools faster and more reliable for developers.
