GPT-5.3 Codex vs GPT-5.2 Instant

We are excited to introduce GPT-5.3 Codex, our most advanced agentic coding model yet. It combines the top coding performance of GPT-5.2 Codex with the reasoning and professional knowledge of GPT-5.2, all in a model that runs 25% faster. This means it can handle long tasks like research and tool use, as well as complex projects. You can guide and interact with GPT-5.3 Codex as it works, just like you would with a colleague, and it keeps track of the context throughout.

GPT-5.3 Codex is our first model instrumental in its own creation. The Codex team used early versions to debug its own training, manage its own deployment, and analyze test results and evaluations. Our team was blown away by how much Codex accelerated its own development.

With GPT-5.3 Codex, Codex moves beyond just writing and reviewing code. Now it can handle almost anything developers and professionals do on a computer.

Frontier Agentic Capabilities

GPT-5.3 Codex sets new records on SWE Bench Pro and Terminal Bench, and also performs well on OSWorld and GDPval. We use these four benchmarks to measure coding, agentic, and real-world skills.

Coding

GPT-5.3 Codex delivers top results on SWE Bench Pro, a tough test for actual software engineering. Unlike SWE Bench, which covers only Python, SWE Bench Pro includes four languages and is more challenging and industry relevant. GPT-5.3 Codex also beats the previous best on Terminal Bench 2.0, which tests terminal skills. It achieves all this using fewer tokens than any earlier model, so that users can do even more.

Web Development

With its improved coding skills, better design, and efficient resource use, GPT-5.3 Codex can build complex games and apps from scratch over the course of a few days. To test its web development and long-running abilities, we asked it to create two games:

An updated racing game from the Codex app launch

A new diving game

Using its web game development skills and straightforward follow-up prompts like “fix the bug” or “improve the game,” GPT-5.3 Codex worked on the games by itself over millions of tokens. You can watch the trailers and try the games to see what Codex can do.

GPT-5.3 Codex is also better at understanding what you want when you ask it to create everyday websites compared to GPT-5.2 Codex. Even with simple or vague prompts, it now builds sites with more features and smart defaults, giving you a better starting point for your ideas.

For example, we asked GPT-5.3 Codex and GPT-5.2 Codex to build two landing pages. GPT-5.3 Codex automatically displayed the yearly plan as a discounted monthly price rather than multiplying out the yearly total, making the discount feel clear and intentional. It also created an automatically transitioning testimonial carousel with three distinct user quotes, resulting in a page that feels more complete and production-ready by default.
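The pricing choice described above comes down to simple arithmetic. The following is a minimal sketch of it, not the code either model generated: the function names and the prices are hypothetical and chosen only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def monthly_equivalent(yearly_total: Decimal) -> Decimal:
    """Return the per-month price of a yearly plan, rounded to cents."""
    return (yearly_total / 12).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def discount_percent(monthly_price: Decimal, yearly_total: Decimal) -> int:
    """Return the whole-number percent saved versus paying monthly for 12 months."""
    full_year = monthly_price * 12
    saved = (full_year - yearly_total) / full_year * 100
    return int(saved.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical prices, not taken from the generated landing pages.
MONTHLY_PRICE = Decimal("20.00")
YEARLY_TOTAL = Decimal("192.00")

# Show the yearly plan as a per-month figure next to the regular monthly price.
label = (
    f"${monthly_equivalent(YEARLY_TOTAL)}/mo billed yearly "
    f"(save {discount_percent(MONTHLY_PRICE, YEARLY_TOTAL)}%)"
)
print(label)
```

Presenting the yearly plan per month puts it in the same unit as the monthly plan, so a visitor can compare the two prices directly instead of dividing the yearly total themselves.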

Beyond Coding

Software engineers, designers, product managers, and data scientists do much more than write code. GPT-5.3 Codex is designed to support every part of the software lifecycle, including debugging, deploying, monitoring, and writing PRDs. Its capabilities also go beyond software, so you can use it to create slide decks or analyze data within spreadsheets.

Using custom skills similar to those behind our earlier GDPval results, GPT-5.3 Codex also performs well on professional knowledge work, matching GPT-5.2. GDPval is an evaluation released by OpenAI in 2025 that measures how well a model handles specific knowledge work tasks across 44 jobs. These tasks include making presentations, creating spreadsheets, and producing other work products.

These results across coding, front-end, computer use, and real-life tasks show that GPT-5.3 Codex is not just better at individual tasks. It represents a major move toward a single, general-purpose agent that can reason, build, and execute across a wide range of technical work.

An Interactive Collaborator

As models become more powerful, the main challenge is making it easy for people to interact with many agents simultaneously. The Codex app helps you manage and guide agents more easily, and with GPT-5.3 Codex, it is now more interactive. The new model gives frequent updates so you can keep informed about key decisions and progress. Instead of waiting for a final result, you can interact in real time by asking questions, discussing approaches, and guiding the solution. Codex explains what it is doing, responds to feedback, and keeps you updated from start to finish.

How We Used Codex to Train and Deploy GPT-5.3 Codex

Recent improvements to Codex build on research projects across OpenAI that took months or even years to develop. Codex is now speeding up these projects, and many researchers and engineers at OpenAI say their work feels very different from just two months ago. Even the early versions of GPT-5.3 Codex showed strong abilities, which helped our team improve training and support the launch of later versions.

Codex is useful for such a wide range of tasks that it is difficult to enumerate all the ways it helps our teams. For example, the research team used Codex to monitor and debug the training run for this release. It also accelerated research beyond debugging infrastructure problems:

It helped track patterns throughout training.

It provided a deep analysis of interaction quality.

It proposed fixes and built rich applications that let human researchers understand precisely how the model’s behavior differed from prior models.

The engineering team used Codex to optimize and adapt the harness for GPT-5.3 Codex. When we started seeing strange edge cases affecting users, team members used Codex to identify context-rendering bugs and the root cause of low cache hit rates. GPT-5.3 Codex is continuing to help the team throughout the launch by dynamically scaling GPU clusters to adjust to traffic surges and keep latency stable.

During alpha testing, a researcher wanted to see how much extra work GPT-5.3 Codex completed per turn and how it affected productivity. GPT-5.3 Codex came up with simple regex classifiers to estimate how often clarifications were needed, track user responses, and monitor task progress. It then ran these checks across session logs and produced a report with its findings. People using Codex were happier because the agent better understood their intent and made more progress each turn with fewer clarifying questions.
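As an illustration of the kind of analysis described above, here is a minimal sketch of regex classifiers run over session logs. The actual classifiers and log format are not described in this post, so the patterns, labels, and the `(role, text)` log structure below are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical classifiers: label -> (role the pattern applies to, pattern).
# These patterns are illustrative assumptions, not the ones Codex produced.
CLASSIFIERS = {
    "clarification": ("agent", re.compile(
        r"\b(which (file|one|option)|could you clarify|did you mean)\b", re.IGNORECASE)),
    "progress": ("agent", re.compile(
        r"\b(tests pass|done|fixed|implemented)\b", re.IGNORECASE)),
    "positive_response": ("user", re.compile(
        r"\b(thanks|looks good|great|perfect)\b", re.IGNORECASE)),
    "negative_response": ("user", re.compile(
        r"\b(still broken|wrong|not what i (asked|wanted))\b", re.IGNORECASE)),
}


def classify_turn(role, text):
    """Return the set of labels whose pattern matches this turn."""
    return {
        label
        for label, (expected_role, pattern) in CLASSIFIERS.items()
        if role == expected_role and pattern.search(text)
    }


def summarize(sessions):
    """Count label hits over all sessions and compute clarifications per agent turn."""
    counts = Counter()
    agent_turns = 0
    for session in sessions:
        for role, text in session:
            if role == "agent":
                agent_turns += 1
            counts.update(classify_turn(role, text))
    rate = counts["clarification"] / agent_turns if agent_turns else 0.0
    return {"counts": dict(counts), "clarification_rate": rate}


# Hypothetical session logs: each session is a list of (role, text) turns.
sessions = [
    [
        ("user", "Add retry logic to the upload client"),
        ("agent", "Which file should I change?"),
        ("user", "client.py"),
        ("agent", "Done. All tests pass."),
        ("user", "Thanks, looks good"),
    ],
    [
        ("user", "Fix the flaky test"),
        ("agent", "Fixed the race condition; all tests pass."),
        ("user", "Still broken on CI"),
    ],
]

report = summarize(sessions)
print(report)
```

Keyword patterns like these are crude, but they are cheap to run over many logs and easy to inspect, which fits a quick first estimate rather than a precise measurement.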

Because GPT-5.3 Codex is so different from earlier versions, the alpha testing data showed many unusual and unexpected results. A data scientist on the team worked with GPT-5.3 Codex to build new data pipelines and create better visualizations than our usual dashboard tools. Together, they quickly analyzed the results, and Codex summarized key insights from thousands of data points in less than three minutes. These tasks are interesting examples of how Codex can help researchers and product builders. Taken together, we found that these new capabilities accelerated our research, engineering, and product teams.

Securing the Cyber Frontier

In recent months, we have seen real improvements in model effectiveness on cybersecurity tasks, which helps both developers and security professionals. At the same time, we have been working on stronger cyber safeguards to support defensive use and make the ecosystem more resilient.

GPT-5.3 Codex is the first model we classify as having High capability for cybersecurity-related tasks under our Preparedness Framework, and the first we’ve directly trained to identify software vulnerabilities. While we don’t have conclusive evidence that it can automate end-to-end cyberattacks, we are taking a precautionary approach and deploying our most extensive cybersecurity safety stack to date. Our mitigations include:

Safety training

Automated monitoring

Trusted access for sophisticated capabilities

Enforcement pipelines, including threat intelligence

Since cybersecurity can be used for both good and bad purposes, we are using an evidence-based, step-by-step approach. This helps defenders find and fix vulnerabilities faster while making misuse harder. As part of this, we are launching Trusted Access for Cyber, a pilot program to speed up cyber defense research.

To help prevent misuse, some requests that our systems see as higher cyber risk may be automatically rerouted from GPT-5.3 Codex to GPT-5.2. We are still improving these safeguards. Developers who are doing security research, or who think their requests are being misclassified, can apply for full access through our trusted access program or report the issue using the feedback command.

We are investing in ecosystem safeguards by expanding the private beta of Aardvark, our security research agent. This is the first product in our Codex Security Suite. We are also working with open-source maintainers to offer free codebase scanning for popular projects like Next.js, where a security researcher recently used Codex to find and disclose vulnerabilities.

Building on our $1 million Cybersecurity Grant Program launched in 2023, we are also committing $10 million in API credits to accelerate cyber defense with our most capable models, especially for open-source software and critical infrastructure systems. Organizations engaged in good-faith security research can apply for API credits and support through our Cybersecurity Grant Program.

Availability & Details

GPT-5.3 Codex is available with paid ChatGPT plans wherever you use Codex: the app, CLI, IDE extension, and web. We are working to enable API access safely in the near future.

With this update, we are also running GPT-5.3 Codex 25% faster for Codex users, thanks to improvements in our infrastructure and inference stack, resulting in faster exchanges and faster results.

GPT-5.3 Codex was designed and trained on NVIDIA GB200 NVL72 systems. We thank NVIDIA for its partnership.

Source: Introducing GPT‑5.3‑Codex