- OpenAI has officially launched its first AI agent: Operator
- It works within a web browser to complete tasks for you and is now available as a limited research preview.
- Operator can make a dinner reservation, fill out a form, and complete other web tasks
OpenAI is always looking for the next big thing to add to ChatGPT, and after months of rumors, including a report earlier this week teasing a launch, the tech giant’s first AI agent is here. Operator is designed to complete web tasks for you, all with the touch of a button.
Essentially, Operator is a Computer-Using Agent (CUA) that uses GPT-4o’s vision capabilities to navigate and search the web. This means it can understand the context of what you ask for and, thanks to its multimodality, understand what is on screen as it browses. It is available now as a research preview for ChatGPT Pro subscribers in the United States.
Operator is described as “an agent that can use your own browser to perform tasks for you.” OpenAI released a demo showing Operator browsing the web the way we humans do. You can ask Operator to book a dinner reservation, fill out an arduously long form, order food from a delivery service, or even book a flight. In the demo, it uses OpenTable to find and book a restaurant reservation. Operator will even walk you through the steps it takes.
Operator is a ‘research preview’, so bear in mind that it is in its infancy, and OpenAI imposes some limitations. We haven’t had a chance to try it out yet, but it certainly looks impressive. This is OpenAI’s first entry into the world of AI agents, which will likely be the topic of the year in artificial intelligence.
OpenAI writes in a blog post announcing Operator that “it is one of our first agents, which are AIs capable of doing work for you independently: you assign it a task and it will execute it.” This hints not only that other agents are in the works (Altman confirmed this during the live demo) but also that they are all based on the notion of doing things for you, a big step in the quest to make AI more useful and give us some time back.
Operator is powered by the new Computer-Using Agent (CUA) model, which combines the vision capabilities of GPT-4o with advanced reasoning. Together, these allow Operator to understand and use elements within a browser: the search bar, various buttons and on-screen content.
OpenAI explains that “Operator can ‘see’ (via screenshots) and ‘interact’ (using all actions that a mouse and keyboard allow) with a browser,” allowing it to use a browser to complete a task. That’s pretty impressive, especially if it works with a high success rate and, as the blog post claims, can self-correct.
However, as with most new AI tools and skills, it will likely be some time before this is genuinely useful in the real world. That will also require OpenAI to open it up to more people, although as an early research preview it’s still certainly an impressive demonstration.
For now, if you’re in the United States and subscribed to ChatGPT Pro, you can try it out on the OpenAI website. OpenAI CEO Sam Altman teased that it would eventually reach other countries and be added to the ChatGPT Plus subscription. As we saw with some of the 12 Days of OpenAI announcements, Europe is likely to take a little longer.
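To make the “see, then interact” idea a little more concrete, here is a minimal Python sketch of that kind of loop. It is an illustration under heavy assumptions, not OpenAI’s implementation: `FakeBrowser`, `Action`, `choose_action` and `run_agent` are hypothetical names invented for this example, the “screenshot” is a plain dictionary rather than pixels, and a few hard-coded rules stand in for the model that decides what to do next.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # which on-screen element the action applies to
    text: str = ""     # text to enter, for "type" actions


@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser (hypothetical, for illustration only)."""
    fields: dict = field(default_factory=lambda: {"search": ""})
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the "screenshot" is a state snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Simulate keyboard and mouse input.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def choose_action(observation: dict, goal: str) -> Action:
    """Hypothetical policy: hard-coded rules standing in for the model."""
    if observation["submitted"]:
        return Action("done")
    if observation["fields"]["search"] != goal:
        # Also covers the case where the field holds the wrong text,
        # a very crude form of self-correction.
        return Action("type", target="search", text=goal)
    return Action("click", target="submit")


def run_agent(browser: FakeBrowser, goal: str, max_steps: int = 10) -> list:
    """Observe, decide, act; repeat until done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()
        action = choose_action(observation, goal)
        history.append(action.kind)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


if __name__ == "__main__":
    print(run_agent(FakeBrowser(), "table for two"))
```

The point of the sketch is the shape of the loop: the agent never gets direct access to the page’s internals, it only looks at what is on screen and responds with the same kinds of input a person would use. The `max_steps` cap is an assumption added here so the toy loop always terminates.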