OpenAI is moving from conversational AI to agentic AI with Operator, an agent powered by its Computer-Using Agent (CUA) model. Operator can independently control a computer to complete multi-step tasks, enabling AI to interact with websites and apps on the user's behalf.  

With this new paradigm in mind, consider the following outline of OpenAI’s vision for autonomous computer tasks.  

To provide context, let's start with what Operator is and how it works:  

  • Operator is an AI agent designed to take control of a user's web browser (and eventually their computer) to handle repetitive or complex tasks.  
  • Operator is powered by the Computer-Using Agent (CUA) model, which combines GPT-4o's visual reasoning with reinforcement learning. Unlike older automation tools that require API interfaces, the CUA views the screen via screenshots and interacts with graphical interfaces, as a person does.  
  • Operator can fill out forms, order groceries, do research, create memes, and schedule appointments.  

The Shift to Agentic AI 

  • Operator marks a shift from chatbots that only talk to agents that can take action. It is built to manage long, multi-step tasks with little need for people to step in.  
  • A new ChatGPT Agent feature lets AI use a virtual computer to check calendars, book restaurants, and make slide decks.  
  • Once you set a goal, the agent works on its own; for example, it could plan a weekend trip.  

Present Constraints & Safety 

  • Operator is still in the research stage and is primarily available to Pro users in the US.  
  • The AI pauses for human approval before any action that cannot easily be undone, like sending emails or deleting calendar events.  

The agent can sometimes get stuck on tricky interfaces, captchas, or password fields, so it may need help from a person.  

Future Outlook 

  • OpenAI plans to expand Operator to Plus, Team, and Enterprise users.  
  • OpenAI positions computer-using agents as a foundation for progress toward artificial general intelligence (AGI).  
  • The aim is to move from a single tool to an ecosystem in which agents work independently across multiple systems.  

The move to agentic AI is part of a broader trend in 2025, with companies like Anthropic and Google building similar capabilities.  

At the start of this year, OpenAI CEO Sam Altman predicted 2025 would be pivotal for AI agents—tools that automate tasks and act on users’ behalf.  

Building on this vision, OpenAI is now making its first real move in this area.  

OpenAI has announced a research preview of Operator, an AI agent that controls a web browser and autonomously performs tasks. It will initially be available to US users with ChatGPT’s Pro subscription and will expand to Plus, Team, and Enterprise plans, with dates to be announced.  

Operator will be available in other countries soon, though a specific launch date has not been announced. OpenAI CEO Sam Altman said during a live stream on Thursday that Europe will, unfortunately, take a while.  

Currently, the research preview is available at operator.chatgpt.com. OpenAI plans to add Operator to all ChatGPT clients soon. Operator promises to automate tasks such as booking travel, making reservations, and shopping. The interface offers categories such as shopping, delivery, dining, and travel for different automations.  

When users activate Operator in ChatGPT, a dedicated web browser opens, allowing the agent to complete tasks and explain its actions. Users still control their own screen, as Operator operates in its own browser.   

OpenAI explains that Operator's browser is driven by the Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with advanced reasoning. The CUA interacts directly with website interfaces rather than going through developer APIs.  

This allows the CUA to click, navigate menus, and fill forms on web pages much like a person.  
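Conceptually, a computer-using agent runs a perceive-act loop: a screenshot goes in, a GUI action comes out, and the cycle repeats until the task is done. The sketch below illustrates that loop with a scripted stub in place of the real CUA model; the function and field names are illustrative, not OpenAI's API.

```python
# Illustrative screenshot -> action loop, NOT OpenAI's actual API.
# A scripted stub stands in for the CUA model.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # element the action applies to
    text: str = ""     # text to type, if any

def stub_model(screenshot: bytes, goal: str, step: int) -> Action:
    """Stand-in for the CUA model: maps the current step to an action."""
    script = [
        Action("click", target="search box"),
        Action("type", target="search box", text=goal),
        Action("click", target="first result"),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]

def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    """Perceive-act loop: capture the screen, ask the model, act, repeat."""
    history = []
    for step in range(max_steps):
        screenshot = b"\x89PNG..."  # placeholder for a real screen capture
        action = stub_model(screenshot, goal, step)
        history.append(action)
        if action.kind == "done":
            break
        # A real agent would now dispatch the click/keystroke to the browser.
    return history

actions = run_agent("wireless keyboard")
```

The key point is that perception is pixel-based (screenshots) and output is GUI-level (clicks and keystrokes), which is why this approach works on sites that expose no API.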

OpenAI says it’s collaborating with companies like DoorDash, eBay, Instacart, Priceline, StubHub, and Uber to ensure Operator complies with their terms of service.  

The CUA model is trained to ask for user confirmation before finalizing tasks with external side effects, for example, before submitting an order or sending an email, so that the user can recheck the model’s work before it becomes permanent. “It has already proven useful in a variety of cases, and we aim to extend that dependability across a wider range of tasks,” OpenAI writes in materials provided ahead of the launch.  

But OpenAI warns the CUA isn’t perfect. The company says it doesn’t expect the CUA to perform reliably in all scenarios just yet.  

Currently, Operator cannot consistently handle many complex or specialized tasks. OpenAI cites examples such as creating detailed slide shows, overseeing intricate calendar systems, and interacting with highly customized or non-standard web interfaces.  

To be extra careful, OpenAI requires users to supervise certain tasks, such as banking transactions, even though the CUA and Operator could handle them on their own. For example, users must enter credit card information themselves. OpenAI also says Operator does not collect or screenshot any such data.  

On particularly sensitive websites, such as email, the operator requires active user supervision, guaranteeing users can directly catch and handle any potential mistakes the model might make, OpenAI says in its support materials.  

This does limit what Operator can do, but it also helps prevent mistakes like the agent accidentally spending your mortgage payment on edgy accent chairs. Google has taken a similar approach with its Project Mariner AI agent, which also avoids entering sensitive information such as credit card numbers.  

Limitations 

The operator does have some important limitations.  

There are both daily and task-based rate limits. OpenAI says Operator can handle multiple tasks at once, but there are dynamic limits on how many. There is also a total-usage limit that resets each day.  

For security reasons, the operator will not perform certain tasks at this stage, such as sending emails or deleting calendar events, even though the CUA can. OpenAI says this may change in the future, but there is no timeline yet.  

Operator can also get stuck if it encounters a complex interface, a password field, or a captcha. When this happens, it will prompt the user to take over.  

An Agentic Future 

Compared with competitors like Rabbit, Google, and Anthropic, OpenAI has taken longer to develop an AI agent. This may be due to the technology’s safety risks.  

When an AI system can take actions on the web, it opens the door to much more dangerous use cases from nefarious actors. You could automate AI agents that orchestrate phishing scams or DDoS attacks, or have them snatch up tickets to a concert before anyone else can. Especially for a tool as widely used as ChatGPT, it’s important that OpenAI takes steps to prepare for such exploits.  

OpenAI believes Operator is safe enough to release now, at least as a research preview.  

Operator employs safeguards that seek to limit the model’s susceptibility to malicious sites, hidden instructions, and prompt injections. OpenAI explains on its website that a monitoring system steps in if suspicious activity is detected, while automated and human-reviewed pipelines continuously update its safeguards.  

Operator is OpenAI’s most ambitious effort so far to create an AI agent. Recently, OpenAI launched Tasks, which gave ChatGPT basic automation features such as creating reminders and scheduling prompts to run at specific times each day. Tasks added familiar but important features to ChatGPT, making it roughly as practical as Siri or Alexa. Operator, however, introduces capabilities those earlier virtual assistants could not offer: agentic AI is a new kind of technology that could change how people use the internet and their PCs. Instead of simply delivering and processing information, agents can, in theory, take actions and actually do things.  

Now that OpenAI has released its first real AI agent, we will soon see how realistic this vision actually is.

Source: OpenAI launches Operator, an AI agent that performs tasks autonomously 

GPT 5.4 is our most advanced model so far. It enables faster, more accurate results in the API and Codex, helping people and teams make better decisions, increase productivity, and streamline processes.  

In most cases, GPT-5.4 is the default choice for general tasks and coding: one model that simplifies complex workflows, saves time on software engineering, enhances reasoning, improves writing quality, and uses tools.  

This article presents the standard features of the GPT-5 models and shows practical ways to make the most of GPT-5.4.  

Key Improvements 

GPT 5.4 offers several improvements over the previous GPT 5.2 model:  

  • Experience sharper coding, better document understanding, smarter audio, and more reliable instruction following.  
  • Enhanced image perception lets users analyze visuals more accurately. It also helps manage multimodal workflows more easily.  
  • Users can complete long-running tasks faster than before. They can also execute multi-step agent workflows more reliably.  
  • More efficient token use reduces costs and improves end-to-end performance for heavy tool-based workloads.  
  • Faster, smarter web search uncovers hard-to-find information, saving time and simplifying research.  
  • Streamlining the handling of many documents or spreadsheets boosts productivity across customer service, analytics, and finance workflows.  

Developers produce production-ready code and polished interfaces faster and more consistently, with fewer prompts for refinement.  

For agent-based tasks, GPT 5.4 completes multi-step processes faster. It often uses fewer tokens and tool calls. This makes agent-based approaches more responsive and reduces the cost of operating complex workflows at scale in API and Codex.  

New Features in GPT 5.4 

Like its predecessors, GPT 5.4 offers flexible tool options, control over explanation detail, and curated tool lists. Now enjoy new features that make building agent systems easier, help manage more information, and ensure reliable automation.  

  • Tool search in the API lets the model browse tools across vast ecosystems and load only what it needs, working smarter with fewer tokens and on-point choices. Discover more in the tool search guide.  
  • 1M token context window: GPT‑5.4 can handle up to 1M tokens. This makes it easier to analyze entire codebases and large sets of documents, or to run agent processes in a single request. You can read more in the “1M context window” section.  
  • Direct software interaction: agents can now complete, check, and fix tasks faster in a full build, run, and verify loop. Check out the computer use guide for more.  
  • Power through longer processes while keeping vital context, thanks to GPT-5.4’s native compaction support.  
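The 1M-token window invites "whole codebase in one request" workflows, but it still pays to sanity-check the budget first. Here is a rough sketch using the common ~4-characters-per-token heuristic; this is an approximation (a real tokenizer such as tiktoken would be more accurate), and the 1M figure follows this article's stated context window.

```python
# Rough context-budget check before packing files into one request.
# Uses the ~4 chars/token rule of thumb, which is only an estimate.

CONTEXT_WINDOW = 1_000_000  # token limit cited in the article

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English/code."""
    return max(1, len(text) // 4)

def fits_in_context(files: dict[str, str], reserve_for_output: int = 16_000) -> bool:
    """True if every file plus an output reserve fits in a single request."""
    used = sum(estimate_tokens(src) for src in files.values())
    return used + reserve_for_output <= CONTEXT_WINDOW

small_repo = {"main.py": "print('hello')\n" * 200}
ok = fits_in_context(small_repo)
```

If the check fails, fall back to chunking the files across requests or relying on compaction rather than silently truncating input.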

Meet the Models 

For most tasks and coding, GPT-5.4 is your new go-to model; it replaces GPT-5.2, and Codex and ChatGPT users get the latest GPT-5.4 by default. Need deeper answers? GPT-5.4 Pro offers extra compute for the hardest challenges.  

Prefer a compact model? Try GPT-5 Mini for streamlined performance.  

Weigh these trade-offs to find the right fit:  

Variant  Best for  
GPT-5.4  General-purpose work, including complex reasoning, broad world knowledge, and code-heavy, multi-step agentic tasks  
GPT-5.4 Pro  Tough problems that may take longer to solve and need deeper reasoning  
GPT-5 Mini  Cost-optimized reasoning and chat; balances speed, cost, and capability  
GPT-5 Nano  High-throughput tasks, especially straightforward instruction-following or classification  

Lower Reasoning Effort 

The reasoning effort setting determines how many reasoning tokens the model uses before responding. Older models like o3 only offered low, medium, and high options: low meant faster, less thorough responses, while high meant longer, more reasoned answers.  

From GPT-5.2 on, the lowest setting is called none, which enables the fastest responses. This is now the default in GPT-5.2 and later; to increase model reasoning, raise the setting to medium and observe the changes.  

When reasoning effort is set to none, prompts matter more. To get better reasoning even at this default setting, ask the model to think through or list its steps before answering.  
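In the API, reasoning effort is a request-level parameter. The sketch below assembles such a request as a plain payload; the shape follows the `reasoning.effort` parameter of OpenAI's Responses API, but the model name and the "none" value are taken from this article and should be checked against current documentation.

```python
# Sketch of a Responses-API-style request payload with reasoning effort.
# Model name and the "none" effort value follow this article; verify
# them against the current API reference before relying on this shape.

def build_request(prompt: str, effort: str = "none") -> dict:
    """Assemble request parameters; 'none' favors speed over deliberation."""
    allowed = {"none", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "gpt-5.4",
        "input": prompt,
        "reasoning": {"effort": effort},
    }

fast = build_request("Summarize this changelog.")
careful = build_request("Plan the refactor step by step.", effort="medium")
```

Keeping the default at "none" for routine calls and raising it only for hard problems matches the cost/latency trade-off the article describes.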

Verbosity 

Verbosity controls how many output tokens the model produces. Fewer tokens make responses quicker; reasoning style remains mostly unchanged, but responses will be briefer, which can help or hurt depending on your needs. Use high verbosity for detailed explanations or major code changes, and low for brief answers or simple code.  

  • High verbosity is useful for detailed document explanations or major code refactoring.  
  • Low verbosity is best for short answers or simple code, such as SQL queries.  

GPT-5 supports high, medium, and low settings. In GPT-5.4, you can still adjust verbosity, with medium as the default.  

With GPT-5.4, medium and high verbosity produce longer, more organized code with explanations, while low generates shorter code with little extra commentary.  
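Verbosity is likewise a request-level knob, so it can be chosen per task. A small sketch of that idea follows; the payload layout (`{"text": {"verbosity": ...}}`) mirrors the verbosity control OpenAI introduced with GPT-5, but treat the exact field names and the task mapping here as assumptions to verify.

```python
# Sketch: choose a verbosity level per task. The payload layout
# ({"text": {"verbosity": ...}}) mirrors the GPT-5 verbosity control;
# confirm field names against current docs.

def verbosity_for(task_kind: str) -> str:
    """Map a task type to a verbosity setting; medium is the default."""
    return {
        "explain_document": "high",  # detailed explanations
        "refactor": "high",          # major code changes
        "sql_query": "low",          # short, simple code
        "quick_answer": "low",
    }.get(task_kind, "medium")

def build_request(prompt: str, task_kind: str) -> dict:
    """Assemble request parameters with verbosity set for the task."""
    return {
        "model": "gpt-5.4",
        "input": prompt,
        "text": {"verbosity": verbosity_for(task_kind)},
    }

req = build_request("Top 10 customers by revenue, as SQL.", "sql_query")
```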

GPT 5.4 is designed to solve problems by reasoning through them.  

Models like GPT-5.4 solve problems step by step, creating an internal chain of thought as they reason. For best results, send these reasoning steps back to the model: this prevents the same reasoning from being repeated and keeps the conversation aligned with the model’s training. In conversations with multiple turns, using previous_response_id will automatically include earlier reasoning steps. This is especially useful when using tools; for example, if a function call needs another round trip, you can use previous_response_id, or alternatively add the reasoning steps directly to the input.
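The chaining described above can be sketched as a loop that carries each response's id into the next request. The `previous_response_id` field matches OpenAI's Responses API; the stub client below stands in for the real one and is purely illustrative.

```python
# Sketch of multi-turn chaining via previous_response_id. The field name
# matches OpenAI's Responses API; the stub client is illustrative.

import itertools

class StubClient:
    """Fake client that records requests and mints sequential response ids."""
    def __init__(self):
        self._ids = (f"resp_{n}" for n in itertools.count(1))
        self.requests = []

    def create(self, **params) -> dict:
        self.requests.append(params)
        return {"id": next(self._ids), "output_text": "..."}

def run_conversation(client, turns):
    """Send each turn, carrying the prior response id so the model's
    earlier reasoning is reused instead of re-derived."""
    prev_id = None
    for turn in turns:
        params = {"model": "gpt-5.4", "input": turn}
        if prev_id is not None:
            params["previous_response_id"] = prev_id
        prev_id = client.create(**params)["id"]
    return prev_id

client = StubClient()
last_id = run_conversation(client, ["Find the bug.", "Now write the fix."])
```

Note that the first turn sends no id, and every later turn links back to the one before it, which is exactly the behavior the article describes for tool-call round trips.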

Source: Using GPT-5.4