Your πAI

OpenAI has rolled out GPT-5.3-Codex, a new flagship coding model designed to combine its strongest programming and reasoning capabilities into a faster, more capable system.

The company said early versions of GPT-5.3-Codex were already used internally to detect bugs in training runs, manage model rollouts, and analyze evaluation results — making the model a core part of OpenAI’s own development and deployment loop.

On agentic coding benchmarks, Codex topped SWE-Bench Pro and Terminal-Bench 2.0, outperforming Anthropic’s Opus 4.6 by roughly 12% on the latter just minutes after release. On OSWorld, a benchmark testing AI control of desktop environments, the model scored 64.7%, nearly double the 38.2% achieved by the prior Codex generation.

OpenAI also flagged GPT-5.3-Codex as its first model to receive a “High” cybersecurity risk rating, committing $10 million in API credits to fund defensive security research in response to the model’s increased autonomy and power.

Why it matters: Frontier labs are now openly using AI models to help build, test, and improve their own successors. As Anthropic has similarly noted Claude’s role in designing future models, the competitive frontier is shifting from surface-level features to recursive self-improvement — a far more consequential battleground for the next phase of AI development.

We Value Your Feedback

OpenAI Rolls Out GPT-5.3-Codex, Unifying Frontier Reasoning and Coding