OpenAI GPT-5.4 API Adds Tool Search, Cuts Token Latency

The GPT-5.4 API introduces tool_search to reduce token usage and speed up agent-based workflows.

Key Benefits

Fewer tokens: Instead of loading every tool definition in the starting prompt, which can require thousands of tokens, the model now searches for and loads only what it needs at runtime. In some tests, this reduced total token usage by 47%.

Lower latency: With fewer input tokens, the API processes requests faster, allowing agents to respond more quickly.

Improved efficiency: tool_search manages large tool sets without overloading the model’s context window.

These enhancements are part of a broader set of updates in GPT-5.4. Next, let’s look at recent product expansions and the pace of new releases.

AI updates are arriving rapidly. Two days after OpenAI launched GPT-5.3 Instant, it announced an even larger upgrade: GPT-5.4.

GPT-5.4 comes in two versions:

GPT-5.4 Thinking, intended for a wide range of tasks

GPT-5.4 Pro, intended for the most complex and demanding tasks, with expanded features and capacity for users with greater requirements

Both versions are available via OpenAI’s paid API and Codex development tools. GPT-5.4 Thinking is accessible to all paid ChatGPT subscribers, including those on the $20 per month Plus plan and above. GPT-5.4 Pro is exclusive to ChatGPT Pro users ($200 per month) and Enterprise customers, supporting especially demanding or large-scale applications.

Free ChatGPT users will sometimes get GPT-5.4, but only when their queries are automatically routed to it, according to an OpenAI spokesperson.

The main highlights of this release are efficiency and a new feature: OpenAI’s GPT-5.4 uses up to 47% fewer tokens on some tasks relative to earlier models. Even more notable: the new native computer use mode lets GPT-5.4 control a user’s computer and run multiple applications via the API and Codex.

OpenAI is also launching new ChatGPT integrations that let GPT-5.4 connect directly to Microsoft Excel and, soon, Google Sheets. This will enable in-depth analysis and automated tasks, potentially speeding up business operations. However, it may also heighten concerns about job losses, especially after similar tools from Anthropic’s Claude and its Cowork app.

According to OpenAI, GPT-5.4 can handle up to 1 million tokens of context in the API and Codex. This allows agents to plan, carry out, and check tasks over long periods. However, once the input exceeds 272,000 tokens, the cost per 1 million tokens doubles.
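To make the pricing threshold concrete, here is a minimal Python sketch of the arithmetic. The 272,000-token threshold and the doubling come from the article; the base price of 1.00 per million tokens is a placeholder, and the sketch assumes the doubled rate applies to the whole request rather than only to the tokens above the threshold, which the article does not specify.

```python
LONG_CONTEXT_THRESHOLD = 272_000   # input tokens, as stated in the article
LONG_CONTEXT_MULTIPLIER = 2.0      # "doubles", as stated in the article


def input_cost(input_tokens: int, base_price_per_million: float) -> float:
    """Estimate input cost for one request.

    Assumption (not confirmed by the article): once the input exceeds the
    threshold, the doubled rate applies to every input token in the request.
    """
    rate = base_price_per_million
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rate *= LONG_CONTEXT_MULTIPLIER
    return input_tokens / 1_000_000 * rate


# Placeholder base price; substitute the real per-million-token rate.
BASE_PRICE = 1.00
short_request = input_cost(200_000, BASE_PRICE)   # below the threshold
long_request = input_cost(300_000, BASE_PRICE)    # above the threshold
```

Under these assumptions, a request that grows by 50% in size (200,000 to 300,000 tokens) triples in input cost, so it can be worth trimming context that sits just above the threshold.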

Native Computer Use: A Step Toward Autonomous Workflows

The most consequential capability is that GPT-5.4 is OpenAI’s first general-purpose model with built-in advanced computer-use abilities in Codex and the API. This lets agents carry out multi-step tasks across different applications, either by writing code that drives them through libraries like Playwright or by issuing mouse and keyboard commands in response to screenshots. OpenAI also claims a jump in agentic web browsing.

OpenAI provides benchmark results meant to show that this feature is more than just an interface layer.
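The screenshot-driven side of computer use boils down to an observe-act loop: look at the screen, pick one mouse or keyboard action, apply it, and repeat. The Python sketch below simulates that loop with no real model and no real screen. FakeDesktop, Action, choose_action, and run_agent are hypothetical names invented for illustration, and the hard-coded policy stands in for decisions the model would make.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen; records what the agent did."""
    focused: bool = False
    typed: str = ""
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; a dict describes the screen here.
        return {"focused": self.focused, "typed": self.typed}

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.typed += action.text


def choose_action(observation: dict, goal_text: str) -> Action:
    """Placeholder policy; in a real system the model makes this choice."""
    if not observation["focused"]:
        return Action("click", x=100, y=200)
    if observation["typed"] != goal_text:
        return Action("type", text=goal_text)
    return Action("done")


def run_agent(desktop: FakeDesktop, goal_text: str, max_steps: int = 10) -> bool:
    """Observe, act, repeat until the policy reports completion."""
    for _ in range(max_steps):
        action = choose_action(desktop.screenshot(), goal_text)
        if action.kind == "done":
            return True
        desktop.apply(action)
    return False


desktop = FakeDesktop()
finished = run_agent(desktop, "hello")
```

The step cap matters in practice: an agent that misreads the screen can loop indefinitely, so bounding the number of actions is a common safeguard.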

On BrowseComp, which checks how well AI agents can keep searching the web for hard-to-find information, OpenAI says GPT-5.4 improved by 17 percentage points over GPT-5.2. GPT-5.4 Pro reaches 89.3%, which OpenAI calls a new state of the art.

On OSWorld-Verified, which measures desktop navigation using screenshots and keyboard or mouse actions, OpenAI reports GPT-5.4 achieved a 75.0% success rate. This is up from 47.3% for GPT-5.2 and exceeds the reported human performance of 72.4%. On WebArena-Verified, GPT-5.4 achieves 67.3% success with both DOM- and screenshot-driven interaction, compared to 65.4% for GPT-5.2. On Online-Mind2Web, OpenAI reports 92.8% success using screenshot-based observations alone.
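Benchmark gains like these can be read two ways: as a percentage-point difference or as a relative improvement. The short Python sketch below works through the arithmetic using only the figures quoted above; the helper name compare is ours, not OpenAI’s.

```python
def compare(new: float, old: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return points, relative


# Desktop navigation: GPT-5.4 (75.0%) vs GPT-5.2 (47.3%)
desktop_vs_prior = compare(75.0, 47.3)   # (27.7, 58.6)

# Desktop navigation: GPT-5.4 (75.0%) vs reported human performance (72.4%)
desktop_vs_human = compare(75.0, 72.4)   # (2.6, 3.6)

# DOM- and screenshot-driven web tasks: GPT-5.4 (67.3%) vs GPT-5.2 (65.4%)
web_vs_prior = compare(67.3, 65.4)       # (1.9, 2.9)
```

The distinction is worth keeping in mind when reading launch figures: a 27.7-point jump on the desktop test is a much larger step than the 1.9-point gain on the web test, even though both are reported as improvements.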

OpenAI also links computer use to better vision and document handling. On the MMMU-Pro test, GPT-5.4 reached 81.2% success without using extra tools, compared to 79.5% for GPT-5.2. OpenAI says it did this using far fewer thinking tokens. On OmniDocBench, the reported average error is 0.109, down from 0.140 for GPT-5.2. The post also describes expanded support for high-quality image inputs, including an “original” detail level of up to 10.24M pixels.

OpenAI describes GPT-5.4 as designed for longer multi-step workflows. This means it acts more like an agent that tracks progress across multiple actions rather than just answering one question at a time, as a typical chatbot does.

Tool Search and Improved Tool Orchestration

OpenAI notes that adding every tool definition to the prompt increases cost, slows responses, and clutters context.

GPT-5.4 introduces tool search in the API as a structural fix. Instead of receiving every tool definition upfront, the model gets a short list of tools and a search feature. It loads full tool details only when needed.
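The idea can be sketched in a few lines of Python. This is a conceptual illustration of lazy tool loading, not OpenAI’s API: Tool, ToolRegistry, and rough_tokens are hypothetical names, the keyword match is deliberately naive, the placeholder schemas are filler text, and word counts stand in for real token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    summary: str      # short line the model always sees
    definition: str   # full schema, loaded only on demand


class ToolRegistry:
    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {t.name: t for t in tools}
        self.loaded: list[str] = []

    def catalog(self) -> list[str]:
        """The short list placed in the prompt instead of full definitions."""
        return [f"{t.name}: {t.summary}" for t in self._tools.values()]

    def search(self, query: str) -> list[str]:
        """Naive keyword match over names and summaries."""
        words = query.lower().split()
        return [
            t.name
            for t in self._tools.values()
            if any(w in t.name.lower() or w in t.summary.lower() for w in words)
        ]

    def load(self, name: str) -> str:
        """Fetch a full definition only when the model asks for it."""
        if name not in self.loaded:
            self.loaded.append(name)
        return self._tools[name].definition


def rough_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word.
    return len(text.split())


# Hypothetical tools with placeholder 200-word schemas.
tools = [
    Tool("get_weather", "Look up the weather forecast for a city", "schema " * 200),
    Tool("send_email", "Send an email message to a recipient", "schema " * 200),
    Tool("query_sales", "Run a query against the sales database", "schema " * 200),
]
registry = ToolRegistry(tools)

matches = registry.search("weather forecast")
loaded_definitions = [registry.load(name) for name in matches]

upfront = sum(rough_tokens(t.definition) for t in tools)
lazy = sum(rough_tokens(line) for line in registry.catalog()) + sum(
    rough_tokens(d) for d in loaded_definitions
)
```

In this toy setup the lazy path carries the short catalog plus one full definition, while the upfront path carries all three. The size of the saving depends entirely on how many tools exist and how many a task actually needs.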

On Scale’s MCP Atlas benchmark (36 MCP servers), tool search reduced token usage by 47% while maintaining the same accuracy as exposing all functions directly in context.

The 47% reduction applies only to the tool search setup in that test. It does not mean that GPT-5.4 always uses 47% fewer tokens per task.

Improvements For Developers And Coding Workflows

OpenAI says GPT-5.4 builds on GPT-5.3 Codex, enabling more efficient code and better multi-step task handling for developers.

GPT-5.4 matches or outperforms GPT-5.3 Codex on SWE-Bench Pro, delivering faster and more reliable performance on complex coding tasks.

Codex also gains more workflow control. Fast mode can increase GPT-5.4 speeds by up to 1.5x, accelerating tasks without losing capability.

OpenAI is introducing an experimental Codex skill called Playwright (interactive). This tool demonstrates the integration of coding with computer use, allowing users to visually debug web and Electron applications and test apps at the command line.

OpenAI for Microsoft Excel and Google Sheets

With GPT-5.4, OpenAI launches secure AI tools in ChatGPT for businesses, enabling advanced, accurate financial modeling and reasoning within familiar platforms.

ChatGPT for Excel and Google Sheets (coming soon) lets users build, analyze, and update complex financial models directly within spreadsheets.

The suite also introduces new ChatGPT app integrations, consolidating market, company, and internal data into a single workflow. OpenAI cites FactSet, MSCI, Third Bridge, and Moody’s as examples.

OpenAI is also adding reusable skills for common finance tasks, such as:

Earnings previews

Comparable analysis

DCF analysis

Drafting investment memos

OpenAI supports its finance focus with an internal benchmark showing model results improved from 43.7% with GPT-5 to 88.0% with GPT-5.4 on its investment banking test.

Measuring AI Performance Against Professional Work

OpenAI uses benchmarks designed to resemble real office work rather than puzzles. On GDPval, which assesses knowledge work across 44 occupations, OpenAI reports that GPT-5.4 matches or outperforms industry professionals in 83% of cases, compared to 71% for GPT-5.2.

OpenAI underscores improvements in structured tables, formulas, clear writing, and design quality, helping users overcome AI workflow challenges.

In an internal test of spreadsheet modeling tasks similar to those performed by junior investment banking analysts, GPT-5.4 achieved an average score of 87.5%, while GPT-5.2 scored 68.4%.

On a set of presentation evaluation prompts, OpenAI reports that human raters favored GPT-5.4’s presentations 68.0% of the time over those from GPT-5.2, attributing this to a preference for stronger aesthetics, greater visual variety, and more effective image generation.

Improved Reliability and Reduced Hallucinations

OpenAI describes GPT-5.4 as its most factual model yet and links that claim to a practical data set: de-identified prompts that users previously flagged as containing factual errors. OpenAI reports GPT-5.4’s individual claims are 33% less likely to be false, and its full responses are 18% less likely to contain any errors. In a comment to VentureBeat, early GPT-5.4 tester Daniel Swiecki of Walleye Capital said that GPT-5.4 boosted accuracy by 30 percentage points on internal finance and Excel evaluations. He credits this to better automation for model updates and scenario analysis.

Brendan Foody, CEO of Mercor, says GPT-5.4 is the best model his company has used. He adds that it now leads Mercor’s APEX-Agents benchmark for professional services, especially for assignments such as slide decks, financial models, and legal analysis.

The Wider Shift

With its release and follow-up clarifications, GPT-5.4 is presented as a model designed to do more than just generate answers. It aims to assist ongoing professional tasks that need tool coordination, computer use, a longer context, and output that matches what people use in their jobs.

OpenAI’s focus on token efficiency, tool search, native computer use, and fewer user-reported errors all points to one goal: making agent-based systems more practical for everyday use by lowering the cost of retries. Whether the retry is a person re-prompting, an agent calling another tool, or a workflow running again after a failed attempt, these improvements help make the technology more reliable.

Source: OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets