AI updates continue at a rapid pace. Just two days after OpenAI released GPT-5.3 Instant for ChatGPT, the company announced an even bigger upgrade, GPT-5.4.
GPT-5.4 comes in two versions: Thinking and Pro. Both handle complex tasks, but Pro has extra capacity for the most demanding processes, delivering even higher performance than Thinking.
Both versions are accessible through OpenAI’s paid API and Codex development tools. GPT-5.4 Thinking is included for all paid ChatGPT subscribers, including those on the $20-per-month plan and higher, and supports advanced user tasks. By contrast, GPT-5.4 Pro, which offers expanded computational allowances and support for more complex tasks, is available only to ChatGPT Pro users ($200 per month) and enterprise customers.
ChatGPT free-tier users may temporarily interact with GPT 5.4; however, this occurs only when user queries are programmatically routed to the upgraded model, according to an OpenAI spokesperson.
The main highlights of this release are efficiency and a new feature. OpenAI says GPT-5.4 uses up to 47% fewer tokens on some tasks than earlier models. More notably, a new native computer-use mode lets GPT-5.4 control a computer and operate multiple applications via the API and Codex.
OpenAI is also launching new ChatGPT integrations that let GPT-5.4 connect directly to Microsoft Excel and Google Sheets. This enables in-depth analysis and automated tasks, which could speed up business operations but may also heighten concerns about white-collar job losses, especially following similar tools from Anthropic’s Claude, such as its Cowork app.
According to OpenAI, GPT-5.4 can handle up to one million tokens of context in the API and Codex. This lets agents plan, execute, and verify long-running tasks, and lets them control computers and complete multi-step tasks across different applications.
According to OpenAI, GPT-5.4 can autonomously generate control scripts leveraging libraries such as Playwright, and can accurately issue mouse and keyboard input events based on screen-capture data. The company also cites significant improvements in benchmark performance for automated web navigation.
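OpenAI has not published the exact action format the model emits. As a rough illustration of the general pattern, the sketch below uses a hypothetical action schema in which the model returns structured mouse and keyboard actions derived from a screenshot, and a harness replays them; a real harness would drive an automation library such as Playwright rather than appending to a log.

```python
# Sketch of a computer-use harness: the model emits structured actions
# (derived from a screenshot), and the harness replays them as input events.
# The Action schema and the example plan are hypothetical illustrations; a
# real harness would call an automation library instead of logging strings.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "key"
    x: int = 0         # screen coordinates for clicks
    y: int = 0
    text: str = ""     # payload for "type", or a key name for "key"

class DesktopHarness:
    def __init__(self):
        self.log = []  # stands in for real mouse/keyboard events

    def execute(self, action: Action):
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "key":
            self.log.append(f"key({action.text})")
        else:
            raise ValueError(f"unknown action kind: {action.kind}")

# Hypothetical model output for a task like "search the web for GPT-5.4":
plan = [
    Action("click", x=640, y=32),           # focus the address bar
    Action("type", text="gpt-5.4 review"),  # enter a query
    Action("key", text="Enter"),            # submit
]

harness = DesktopHarness()
for step in plan:
    harness.execute(step)
print(harness.log)
```

The point of the structured schema is that each step is verifiable: the harness can screenshot the result after every action and feed it back to the model, which is what makes multi-step desktop automation tractable.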
OpenAI shares benchmark results to argue that native computer use is more than a thin user-interface layer.
On the BrowseComp evaluation suite, which assesses autonomous web navigation and retrieval of hard-to-find information, OpenAI reports that GPT-5.4 delivers a 17-point absolute gain over GPT-5.2, achieving an 89.3% success rate, which the company cites as a new state of the art.
On OSWorld-Verified, a benchmark that assesses desktop-environment navigation through screenshots and mouse-and-keyboard events, OpenAI states GPT-5.4 attained a 75% completion rate, up from 47.3% for GPT-5.2 and above the referenced human average of 72.4%.
On the WebArena verified suite, GPT-5.4 reports 67.3% aggregate success on DOM- and screenshot-driven commands, compared to 65.4% for its predecessor. On Online-Mind2Web, OpenAI reports a 92.8% success rate for interactions in screenshot-mediated environments.
OpenAI links the computer-use gains to improved visual recognition and document parsing. On the MMMU-Pro benchmark, GPT-5.4 achieved an 81.2% completion rate without auxiliary tools, compared to 79.5% for GPT-5.2, and reportedly reaches that score using substantially fewer tokens per inference.
On the OmniDocBench test, GPT-5.4’s average error dropped to 0.109 from 0.140 for GPT-5.2. OpenAI also cites better support for high-resolution image inputs, with detail levels up to 10.24 million pixels.
OpenAI positions GPT-5.4 as optimized for persistent multi-phase task automation. According to the company, the model functions as an autonomous agent managing workflow state across consecutive operational steps rather than responding to isolated prompts.
Tool Search and Improved Tool Orchestration
OpenAI notes that adding every tool definition to prompts increases costs, slows performance, and clutters requests.
GPT-5.4 introduces tool search in the API. Instead of receiving every tool definition at once, the model gets a list of tool names plus a search function that fetches full definitions as needed.
In a 250-task evaluation on Scale’s MCP Atlas with 36 MCP servers, tool search cut token use by 47% while maintaining the same accuracy as exposing all MCP functions up front.
That 47% reduction applies only to the tool-search setup in that test; it does not mean GPT-5.4 always uses 47% fewer tokens on all tasks.
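OpenAI has not published the mechanism in detail, but the idea behind deferred tool loading can be sketched in a few lines. In this illustration the registry, tool names, and the word-count token proxy are all hypothetical stand-ins: the model first sees only a cheap index of names, and full schemas are pulled into context only when a search matches.

```python
# Sketch of "tool search": expose a cheap index of tool names up front, and
# fetch full definitions lazily. Tool names/definitions are hypothetical, and
# whitespace word counts stand in for real tokenizer counts.

import json

TOOL_REGISTRY = {
    "get_weather": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}}},
    },
    "send_email": {
        "name": "send_email",
        "description": "Send an email to a recipient.",
        "parameters": {"type": "object",
                       "properties": {"to": {"type": "string"},
                                      "body": {"type": "string"}}},
    },
}

def list_tool_names() -> list[str]:
    """The cheap index the model sees up front, instead of full schemas."""
    return sorted(TOOL_REGISTRY)

def search_tools(query: str) -> list[dict]:
    """Fetch full definitions only for tools matching the query."""
    q = query.lower()
    return [d for d in TOOL_REGISTRY.values()
            if q in d["name"].lower() or q in d["description"].lower()]

def approx_tokens(obj) -> int:
    """Crude token proxy: whitespace-separated words in the JSON form."""
    return len(json.dumps(obj).split())

# Cost of shipping every definition vs. names plus one on-demand lookup:
full_prompt_cost = approx_tokens(list(TOOL_REGISTRY.values()))
lazy_prompt_cost = (approx_tokens(list_tool_names())
                    + approx_tokens(search_tools("email")))
print(full_prompt_cost, lazy_prompt_cost)  # lazy is smaller
```

With only two tools the savings are modest; the reported 47% figure comes from a setting with 36 servers’ worth of definitions, where the up-front index is tiny relative to the full schema dump.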
Improvements For Developers And Coding Workflows
GPT-5.4 combines the coding strengths of GPT-5.3 Codex with better tools and computer use capabilities for multi-step tasks.
GPT-5.4 matches or beats GPT-5.3 Codex on SWE-Bench Pro and responds faster during thinking tasks.
Codex now offers more workflow controls, including a fast mode that provides up to 1.5x faster performance on supported models like GPT-5.4: the same model and intelligence, just faster.
OpenAI is also releasing an experimental Codex skill that uses Playwright for visual debugging and testing of web and Electron apps.
With GPT-5.4, OpenAI is launching a set of secure AI products for enterprises and financial institutions in ChatGPT. These tools use GPT-5.4 for advanced financial reasoning and Excel-based modeling.
The main feature is ChatGPT for Excel and Google Sheets, which brings ChatGPT into spreadsheets, allowing teams to build, analyze, and update complex financial models using familiar formulas.
The suite adds integrations for merging data from sources such as FactSet, MSCI, Third Bridge, and Moody’s, and introduces reusable skills for earnings reviews, comparable analysis, DCF analysis, and drafting investment memos.
OpenAI supports its finance focus with an internal investment-banking benchmark, on which model effectiveness rose from 43.7% with GPT-5 to 88% with GPT-5.4.
Measuring AI Performance Against Professional Work
On GDPval, OpenAI’s benchmark of real office work covering 44 occupations, GPT-5.4 matches or beats professionals in 83% of cases, versus 71% for GPT-5.2. The company also points to improvements in areas where models frequently struggle, such as structured tables, formulas, clear writing, and design quality.
In an internal test of spreadsheet modeling tasks, similar to those done by junior investment banking analysts, GPT-5.4 scored an average of 87.5%, while GPT-5.2 scored 68.4%.
In presentation tests, OpenAI’s human raters chose GPT-5.4’s presentation 68% of the time over GPT-5.2’s, noting better design, more visual variety, and improved image generation.
Improving Reliability and Reducing Hallucinations
OpenAI describes GPT-5.4 as its most factual model yet and links that claim to a practical dataset: de-identified prompts that users previously flagged as containing factual errors. On that set, OpenAI reports GPT-5.4’s individual claims are 33% less likely to be false, and its full responses are 18% less likely to contain any mistakes compared to GPT-5.2.
In comments to VentureBeat, early GPT-5.4 tester Daniel Swiecki from Walleye Capital said that on internal finance and Excel tests, GPT-5.4 boosted accuracy by 30 percentage points.
Brendan Foody, CEO of Mercor, calls GPT-5.4 the best model the company has tested and says it’s now the top performer in Mercor’s Apex Agents benchmark for professional services, with a focus on long-horizon deliverables such as slide decks, financial models, and legal analysis.
Pricing and Availability
OpenAI states that GPT-5.4 Thinking and GPT-5.4 Pro are available via the API as GPT-5.4 and GPT-5.4 Pro, respectively. Pricing is usage-based: GPT-5.4 is billed at $30 per 1 million input tokens and $180 per 1 million output tokens. Pricing for GPT-5.4 Pro is not listed.
It’s also important to know that with GPT-5.4, requests exceeding 272,000 input tokens are charged at double the usual rate, because the model now supports much larger prompts than before.
In Codex, the default compaction limit is 272,000 tokens, and the higher long-context pricing applies only if your input exceeds that amount. Developers can keep prompts at or below 272,000 tokens to avoid the higher rate, or raise the compaction limit for larger prompts, which are then billed at the higher rate.
An OpenAI spokesperson said the API’s maximum output is 128,000 tokens, which is unchanged from earlier models.
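Putting the article’s figures together ($30 per 1M input tokens, $180 per 1M output tokens, and a doubled rate once input exceeds the 272,000-token limit), the billing rule can be sketched as below. Whether the doubled rate applies to the whole request or only to the excess tokens is not spelled out; this sketch assumes the whole request.

```python
# Sketch of the long-context billing rule described above, using the
# article's figures. Assumption: the doubled rate applies to the entire
# request once input exceeds the limit, not just the excess tokens.

INPUT_RATE = 30.0 / 1_000_000    # dollars per input token
OUTPUT_RATE = 180.0 / 1_000_000  # dollars per output token
LONG_CONTEXT_LIMIT = 272_000     # input tokens; above this the rate doubles

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the sketched billing rule."""
    multiplier = 2 if input_tokens > LONG_CONTEXT_LIMIT else 1
    return multiplier * (input_tokens * INPUT_RATE
                         + output_tokens * OUTPUT_RATE)

# A prompt just at the limit vs. just over it:
print(round(request_cost(272_000, 10_000), 2))  # 9.96
print(round(request_cost(273_000, 10_000), 2))  # 19.98
```

The discontinuity is why the compaction limit matters in practice: adding a thousand input tokens past 272,000 roughly doubles the bill for that request under this reading of the rule.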
The spokesperson said that GPT-5.4’s higher base rate stems from three main factors:
- It handles more complex tasks such as coding, computer use, deep research, advanced document generation, and tool use.
- It benefits from major research advances.
- It reasons more efficiently, using fewer tokens for similar tasks.
They also said that even with the price increase, GPT-5.4 still costs less than other leading models.
The Wider Shift
With this release and the follow-up clarifications, GPT-5.4 is described as a model designed for more than just generating answers: it aims to support ongoing professional workflows that require tool coordination, computer interaction, long-running context, and outputs that match what people use in real-world settings.
OpenAI’s focus on token efficiency, tool search, native computer use, and fewer user-reported factual errors aims to make agentic systems more practical for real-world use by lowering the cost of retries. Whether it’s a person re-prompting an agent with another tool or a workflow re-running after a failed attempt, these improvements help make production systems more reliable.