OpenAI Releases New Codex Model Capable of Working for Over 24 Hours

Written by: Mane Sachin


OpenAI Introduces GPT-5.1-Codex-Max for Extended Coding Work

OpenAI has unveiled GPT-5.1-Codex-Max, a new advanced coding model built for long-duration software development tasks, now available across all Codex platforms.

The model is built on an upgraded reasoning architecture and trained on agent-style tasks spanning software engineering, mathematics, research, and more. It is the company's first system designed to operate across multiple context windows using a method known as compaction, which lets it stay coherent across millions of tokens within a single session.
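To make the idea concrete, here is a minimal, purely illustrative sketch of how a compaction loop can work in principle. It is not OpenAI's implementation; the token limit, threshold, and summarize helper are invented for the example.

```python
# Conceptual illustration of compaction (not OpenAI's implementation):
# when the session history nears the context limit, older turns are
# condensed into a summary so the agent can keep working coherently
# across far more tokens than fit in a single window.

CONTEXT_LIMIT = 8_000        # assumed window size in tokens (illustrative only)
COMPACTION_THRESHOLD = 0.8   # compact once 80% of the window is used
KEEP_RECENT = 4              # most recent turns are always kept verbatim


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())


def summarize(turns: list[str]) -> str:
    # Placeholder for a model-generated summary of older history.
    return f"[summary of {len(turns)} earlier turns]"


def compact(history: list[str]) -> list[str]:
    # Replace older turns with one compact summary, keeping recent turns intact.
    older, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    return [summarize(older)] + recent if older else history


def add_turn(history: list[str], turn: str) -> list[str]:
    history = history + [turn]
    used = sum(count_tokens(t) for t in history)
    if used > CONTEXT_LIMIT * COMPACTION_THRESHOLD:
        history = compact(history)
    return history


if __name__ == "__main__":
    history: list[str] = []
    for step in range(10_000):  # simulate a very long agent session
        history = add_turn(history, f"turn {step}: " + "token " * 100)
    print(f"{len(history)} entries retained after compaction")
```

The design choice worth noting is that recent turns stay verbatim while older ones are summarized, which is what allows a session to remain coherent long after its raw transcript has outgrown any single context window.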

According to OpenAI, the model can function independently for hours. Internal evaluations showed Codex-Max “continuously refined its work, fixed failing tests, and ultimately delivered a working solution” for tasks running over 24 hours.

The model is now accessible to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise tiers. Developers using the Codex CLI with an API key will gain access once API support goes live. GPT-5.1-Codex-Max now replaces GPT-5.1-Codex as the standard across all Codex tools.

OpenAI reported that 95% of its engineering staff use Codex weekly and that the team “ships around 70% more pull requests since adopting Codex.”

Improved Accuracy and Greater Token Efficiency

GPT-5.1-Codex-Max surpasses earlier versions on multiple real-world and benchmark coding tests. On SWE-Lancer, it achieved 79.9% accuracy, compared to 66.3% for GPT-5.1-Codex. On SWE-bench Verified, it delivered higher accuracy at the same reasoning level while using 30% fewer thinking tokens.

OpenAI noted that these gains translate into lower costs for developers. For instance, the model produced a complete browser-based CartPole reinforcement learning sandbox using 27,000 thinking tokens, down from 37,000 in the previous Codex version.

The company is also introducing a new extra-high reasoning mode for tasks that are not time-sensitive, allowing the model more time to think before generating results.
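If and when the model becomes available over the API, selecting that mode might look roughly like the sketch below. This is an assumption-laden example: API support is not yet live, the model identifier and the "xhigh" effort value are unconfirmed, and it simply reuses the existing Responses API reasoning-effort parameter from the OpenAI Python SDK.

```python
# Hypothetical sketch only: API access is not yet live, and both the model
# identifier and the "xhigh" effort value are assumptions. It reuses the
# existing OpenAI Python SDK Responses API and its reasoning-effort parameter.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.1-codex-max",      # assumed model identifier
    reasoning={"effort": "xhigh"},  # assumed name of the extra-high mode
    input="Refactor the payment module and add regression tests for edge cases.",
)

print(response.output_text)
```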

Extended Workflows and Windows Integration

Thanks to compaction, GPT-5.1-Codex-Max can manage deep refactoring, hours-long debugging, and lengthy agent loops that previously ran into context-size limits. It is also the first Codex model trained to work in Windows environments, and it was trained on new tasks designed to make it a better collaborator inside the Codex CLI.

Security Controls and Cyber Defense Measures

OpenAI stated that GPT-5.1-Codex-Max “does not reach High capability in Cybersecurity” per its Preparedness Framework, but it is still the strongest cybersecurity-oriented model the company has launched so far.

The company added that it is developing further protections as autonomous capabilities advance, emphasizing that it has already intervened in attempts to misuse its models in digital operations.

Codex operates inside a restricted sandbox by default, with limited file access and no network connectivity unless manually enabled. OpenAI advises maintaining these restrictions to reduce the risk of prompt-injection exploits.

The company also emphasized that “Codex should be used as an additional reviewer, not a substitute for human oversight,” urging developers to inspect all generated code before deploying it.

Also Read:

Luma Raises $900 Million Series C to Power 2GW Supercluster in Saudi Arabia

Musk’s xAI Partners with HUMAIN to Establish Saudi Arabia’s AI Supercomputing Network

Mane Sachin

My name is Sachin Mane, and I’m the founder and writer of AI Hub Blog. I’m passionate about exploring the latest AI news, trends, and innovations in Artificial Intelligence, Machine Learning, Robotics, and digital technology. Through AI Hub Blog, I aim to provide readers with valuable insights on the most recent AI tools, advancements, and developments.

For Feedback - aihubblog@gmail.com