OpenAI Introduces Operator, a Browser-Based AI Agent

Operator is an AI agent powered by a model called Computer-Using Agent (CUA). It completes tasks by controlling a computer through its screen, mouse, and keyboard, automating browser tasks for users.

Below are some important details about the Operator release:

Availability: Operator is currently offered to ChatGPT Pro subscribers, a plan that costs $200 a month.

Functionality: Operator uses GPT-4o's vision capabilities to interact with computer interfaces.

Future Scope: OpenAI plans to expand Operator to Plus, Team, and Enterprise users and to integrate it into ChatGPT.

The current research preview focuses on browser-based actions, aiming to let AI use computers as a human would.

Operator is a web-based agent that navigates the internet and completes tasks for users. It operates within its own browser environment, where it can view web pages and interact with them by typing, clicking, and scrolling. Operator is currently in a research preview, so it has limitations that are expected to be addressed as user feedback comes in. As one of OpenAI’s first agents, Operator lets users delegate tasks, which it then executes autonomously.

Operator can manage repetitive browser tasks on behalf of users, such as filling out forms, ordering groceries, or creating memes. Because it interacts with websites and tools the same way people do, it makes AI more practical for everyday work. It streamlines routine activities and creates new opportunities for businesses to engage customers.

We are starting with a small roll-out for safety and manageability. Pro users in the US can access Operator at operator.chatgpt.com. This limited release helps us learn from users and improve Operator over time.

How Operator Works

Operator runs on a new model called Computer-Using Agent (CUA). CUA combines GPT-4o's vision capabilities with advanced reasoning developed through reinforcement learning. It’s trained to work with graphical user interfaces, such as the buttons, menus, and text fields you see on your screen.

Operator sees what is on the screen by taking screenshots. It interacts with the browser using the same mouse and keyboard actions a person would use. This means it works on the web without needing custom API integrations.

If Operator runs into problems or makes a mistake, it uses its reasoning skills to try to correct itself. If it can’t resolve the issue, it hands control back to you, keeping the experience smooth and collaborative.
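The screenshot-and-act cycle described above can be sketched as a simple loop. This is a toy illustration under stated assumptions, not OpenAI's implementation: `FakeBrowser`, `toy_policy`, `run_task`, and the retry limit are all hypothetical, and the "screenshot" here is a plain dictionary rather than pixels.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (hypothetical action set)
    target: str = ""   # name of the UI element the action applies to
    text: str = ""     # text to enter for "type" actions


class FakeBrowser:
    """Hypothetical stand-in for the agent's browser environment."""

    def __init__(self):
        self.fields = {}
        self.submitted = False

    def screenshot(self) -> dict:
        # A real agent would receive an image; a dict keeps the sketch runnable.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            if "name" not in self.fields:
                raise RuntimeError("form incomplete")
            self.submitted = True


def toy_policy(screen: dict) -> Action:
    """Hypothetical stand-in for the model: choose the next action from the screen."""
    if screen["submitted"]:
        return Action("done")
    if "name" not in screen["fields"]:
        return Action("type", target="name", text="Ada")
    return Action("click", target="submit")


def run_task(browser, policy, max_steps: int = 10, max_retries: int = 2) -> str:
    """Observe, act, and retry on errors; hand back control if stuck."""
    failures = 0
    for _ in range(max_steps):
        action = policy(browser.screenshot())   # look at the screen, pick an action
        if action.kind == "done":
            return "completed"
        try:
            browser.execute(action)             # mouse/keyboard-style interaction
            failures = 0
        except RuntimeError:
            failures += 1                       # let the policy try again
            if failures > max_retries:
                return "handed_back_to_user"
    return "handed_back_to_user"


print(run_task(FakeBrowser(), toy_policy))
```

The point of the sketch is the control flow: the agent only ever sees the screen and only ever acts through user-style inputs, and a bounded retry count is one simple way to decide when to return control to the user.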

CUA is still new and has some limitations, but it has already set new records on important browser benchmarks. More details about our evaluations and the research behind Operator are in our blog post.

How to Use

To start, tell Operator what to do, and it will handle the rest. You can take control of the browser at any time. Operator asks you to step in when a task needs a login or payment, or when a CAPTCHA appears.

You can personalize Operator with custom instructions for all sites or for specific ones. For example, you might set airline preferences on Booking.com. Operator also lets you save prompts for quick access, which is useful for frequent tasks like restocking groceries on Instacart. Much like using multiple browser tabs, Operator can handle several tasks at once when you start new conversations, such as ordering a mug from Etsy while booking a campsite on Hipcamp.
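One way to picture "custom instructions for all or specific sites" is a global instruction combined with a per-site entry looked up by hostname. The sketch below is only an assumption about how such a lookup could work; `instructions_for` and the example instruction strings are hypothetical and do not reflect Operator's actual settings format.

```python
from urllib.parse import urlparse


def instructions_for(url: str, global_instructions: str, site_instructions: dict) -> str:
    """Combine global instructions with any site-specific ones for this URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]                      # treat www.example.com like example.com
    parts = [global_instructions] if global_instructions else []
    if host in site_instructions:
        parts.append(site_instructions[host])
    return "\n".join(parts)


# Hypothetical preferences, for illustration only.
site_prefs = {"booking.com": "Prefer aisle seats."}
print(instructions_for("https://www.booking.com/flights",
                       "Prefer refundable options.", site_prefs))
```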

Ecosystem & Users

Operator changes AI from a passive tool into an active helper in the digital world. It makes tasks easier for users and helps companies offer better experiences and improve conversion rates. We’re working with companies such as DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to ensure Operator meets real needs and follows industry standards. We also see many ways Operator can make certain workflows more accessible and efficient, especially in the public sector. For example, we are partnering with the City of Stockton to help people enroll in city services and programs more easily.

By initially introducing Operator to a select audience, OpenAI aims to learn and refine its capabilities through real-world feedback, while maintaining a focus on innovation, trust, and safety. This approach supports meaningful value delivery to users, creators, businesses, and public sector organizations.

Safety and Privacy

Ensuring Operator is safe to use remains our top priority. We have added three layers of safeguards to prevent abuse and keep users in control.

Operator keeps users in control by prompting for input at key moments.

Takeover Mode: When sensitive information such as passwords or payment details must be entered, Operator prompts you to take over. In this mode, Operator does not collect or record anything you enter.

User Confirmations: Before completing significant actions, such as placing an order or sending an email, Operator requests your approval.

Task Limitations: Operator declines certain sensitive tasks, such as banking transactions or job application decisions.

Watch Mode: On sensitive sites, such as email and financial services, Operator requires you to closely supervise its actions. This lets you promptly identify and correct any issues.
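The four user-control mechanisms above can be read as a decision about how much user involvement a given step needs. The following is a minimal sketch of that idea, not Operator's actual logic: the `Gate` enum, the category sets, and the order in which the checks run are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Gate(Enum):
    PROCEED = "proceed"      # agent continues on its own
    TAKEOVER = "takeover"    # user enters sensitive input themselves
    CONFIRM = "confirm"      # user approves before the action completes
    DECLINE = "decline"      # agent refuses the task
    WATCH = "watch"          # user must supervise the agent's actions


# Illustrative categories drawn from the examples in the text; hypothetical names.
DECLINED_TASKS = {"banking_transaction", "job_application_decision"}
SENSITIVE_INPUTS = {"password", "payment_details"}
CONFIRM_ACTIONS = {"place_order", "send_email"}
WATCHED_SITE_TYPES = {"email", "financial_services"}


def gate_for(task: str, action: str,
             input_kind: Optional[str] = None,
             site_type: Optional[str] = None) -> Gate:
    """Return the level of user involvement required for one step."""
    if task in DECLINED_TASKS:
        return Gate.DECLINE
    if input_kind in SENSITIVE_INPUTS:
        return Gate.TAKEOVER
    if action in CONFIRM_ACTIONS:
        return Gate.CONFIRM
    if site_type in WATCHED_SITE_TYPES:
        return Gate.WATCH
    return Gate.PROCEED


print(gate_for("shopping", "place_order"))
```

Checking the most restrictive outcome first means a declined task is never downgraded to a mere confirmation; that ordering is a design choice of this sketch rather than something the source specifies.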

Data privacy and management within Operator is designed to be straightforward.

Training Opt-Out: If you turn off “Improve the model for everyone” in your ChatGPT settings, your data in Operator will not be used to train our models.

Transparent Data Management: You can delete all browsing data and log out of every site with one click. In Operator’s settings, you can also easily delete past conversations.

We have added protections to stop websites from manipulating Operator with hidden prompts, malicious code, or phishing attempts.

Cautious Navigation: Operator is designed to detect and ignore prompt injections.

Monitoring: A dedicated monitor detects suspicious behavior and can pause tasks if necessary.

The detection pipeline uses both automated systems and human reviewers to spot new threats. We update safeguards quickly. Operator is built to refuse harmful requests and block disallowed content. Our moderation can warn users or revoke access if rules are broken. Extra review steps help catch misuse. We provide guidance on using Operator in line with policies.

Even with safeguards, no system is perfect, and Operator is still in research preview. We will improve it with feedback and testing. To learn more, see the safety section of the Operator research blog post.

Limitations

Operator is currently in an early research phase. It can already handle a wide range of tasks, but it is still learning and evolving and may make mistakes. For instance, it currently struggles with complex interfaces, such as creating slideshows or managing calendars. Early user feedback will play a vital role in improving its accuracy, reliability, and safety, helping us make Operator better for everyone.

What’s Next?

CUA in the API: The model behind Operator, CUA, will soon become available via the API, enabling developers to build their own computer-using agents.

Enhanced Capabilities: We will keep working to help Operator handle longer, more detailed workflows.

Access: We plan to expand Operator to Plus, Team, and Enterprise users and to integrate its capabilities directly into ChatGPT once we are confident in its safety and usability at scale, unlocking seamless real-time and asynchronous task execution.

Source: Introducing Operator