News
xAI releases Grok 4 Fast, its cost-efficient models with 2 million token context window
xAI has launched Grok 4 Fast, its new cost-efficient reasoning models. It offers a massive token context window to provide more freedom while interacting with AI.
This new model outperforms Grok 3 Mini on all reasoning benchmarks with the highest token cost efficiency. This is the result of large-scale reinforcement learning, which optimizes Grok 4 Fast’s intelligence density. According to xAI’s evaluation, Grok 4 has used 40% fewer thinking tokens on average compared to Grok 4, while providing the same performance as Grok 4, as tested on benchmarks.
This token efficiency and lower price per token allow the AI maker to reduce the price by up to 98% to achieve the same performance as Grok 4.
This new model can seamlessly browse the internet and X (formerly Twitter) to augment queries with real-time data. It searches through links, online media, including the ones that are available on X, and combines the results as fast as possible. This capability comes in handy when asked for coding or web-related queries.
Unified Model
Unlike previous models, Grok 4 Fast comes with a unified architecture where reasoning and non-reasoning are handled by the same model weights. xAI explains that this unified approach reduces end-to-end latency and cuts token costs.
Users can now access the Grok 4 Fast via the Grok web, Android, and iOS applications. The Auto mode will now choose this new model for difficult queries for a fast experience without degrading response quality. Interestingly, this model is now available for all Grok users, including the free ones. And for a limited time, the new model will also be available for free on OpenRouter and Vercel AI Gateway.
API
xAI has released two models – grok-4-fast-reasoning and grok-4-fast-non-reasoning. These two have a 2-million token context window. These two models are now available on the xAI API console.
(source)
