OpenAI showed off its first AI agent, Operator, last week, but it already has an upstart competitor offering a tool called Browser Use that can complete online tasks for you. This computer use agent (CUA) can type, search, click, and copy information on websites without you needing to touch the mouse or keyboard, and without the $200-a-month ChatGPT Pro subscription.
Browser Use is actually free, at least if you are willing and able to spend time tinkering with the API code. I am not very code-literate, but I naively thought I knew enough about how GitHub works to use the API version. After hours of poring over the documentation, adjusting the configuration, and studying examples, I decided that this would need a deeper level of coding knowledge than I have, let alone the average person browsing the web.
Happily for me, Browser Use just debuted a cloud version powered by OpenAI's GPT-4o model. It cuts out many of the big technical hurdles and gets things going in a more familiar chat format without any additional work. It has its limitations and costs $30, but after my inept fumbling with the API, it felt like a bargain. Even in this (obviously unfinished) form, it still takes effort at prompt engineering and at working out how the AI operates. The most limiting aspect is that you can only send one prompt before having to start a new session. Despite the text box, you cannot respond to what the AI does and refine your request.
Buy AI
With everything set up, I put Browser Use through some real-world tests. First was a price comparison task. I entered the prompt: “Navigate to Amazon, Best Buy, and Walmart and search for ‘MacBook Air M2’. Extract the product name, price, and stock availability of the first five results on each site. Compare the prices and identify the lowest one.”
It did the job well, although it didn't find any discounts or hidden coupons. Even so, the fact that I could automate monitoring multiple sites was quite exciting. That said, a recurring problem for any agent like this comes up when a website wants to verify that you are human. Browser Use has a button that lets you take over whenever you want, and it will also alert you when necessary. You can prove your humanity and then hit resume to let the AI take over again.
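Once an agent hands back the extracted listings, the "compare prices and identify the lowest one" step is simple enough to do yourself. Here is a minimal Python sketch of that step. The `Listing` shape, the helper names, and the sample prices are all assumptions for illustration, not the format Browser Use actually returns or the prices it found.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """One extracted search result (hypothetical shape)."""
    site: str
    name: str
    price: float
    in_stock: bool


def parse_price(text: str) -> float:
    """Convert a scraped price string such as '$1,099.00' to a float."""
    return float(text.replace("$", "").replace(",", "").strip())


def cheapest(listings: list[Listing], require_stock: bool = True) -> Optional[Listing]:
    """Return the lowest-priced listing, optionally ignoring out-of-stock items."""
    candidates = [l for l in listings if l.in_stock or not require_stock]
    if not candidates:
        return None
    return min(candidates, key=lambda l: l.price)


# Placeholder data, invented purely to exercise the functions.
sample = [
    Listing("Amazon", "MacBook Air M2", parse_price("$999.00"), True),
    Listing("Best Buy", "MacBook Air M2", parse_price("$949.99"), False),
    Listing("Walmart", "MacBook Air M2", parse_price("$1,049.00"), True),
]

best = cheapest(sample)
if best is not None:
    print(f"Lowest in-stock price: {best.site} at ${best.price:.2f}")
```

Filtering on stock before taking the minimum matters here because the prompt asks for availability as well as price; a cheaper listing you cannot buy is not much of a deal.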
Fly AI
Next came a travel planning task with the prompt: “Look for a round-trip flight from New York to London on December 15, 2025 on British Airways. Select the cheapest option and extract the details, including the price, the airline, and the departure time.”
Browser Use delivered, pulling up a British Airways flight for $750, complete with the departure time and other relevant details. This could be incredibly useful for people who book a lot of trips, especially if you automate it to check for price drops regularly.
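The price-drop idea could be sketched roughly like this: record the price from each run and flag any run where the fare falls by at least some threshold compared with the previous one. This is only an illustration in Python. The function names and the threshold are hypothetical, and how each price is fetched (for example, by re-running the agent on a schedule) is deliberately left out.

```python
from typing import Optional


def price_drop(previous: Optional[float], current: float, min_drop: float = 0.0) -> Optional[float]:
    """Return the size of the drop if current is below previous by at least min_drop."""
    if previous is None:
        return None  # first observation, nothing to compare against
    drop = previous - current
    if drop > 0 and drop >= min_drop:
        return drop
    return None


def track(prices: list[float], min_drop: float = 0.0) -> list[tuple[int, float, float]]:
    """Walk a sequence of observed prices and collect (run index, price, drop) alerts."""
    alerts = []
    previous = None
    for run, price in enumerate(prices):
        drop = price_drop(previous, price, min_drop)
        if drop is not None:
            alerts.append((run, price, drop))
        previous = price
    return alerts


# Placeholder fare history; only the 750 starting point echoes the test above.
history = [750, 760, 720, 720, 700]
for run, price, drop in track(history, min_drop=25):
    print(f"Run {run}: fare fell by ${drop} to ${price}")
```

Comparing against the previous run rather than the all-time low keeps the sketch simple; a real tracker might store both.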
Fair-weather AI friend
Finally, I tried weather forecasting and planning with the prompt: “Check the 7-day weather forecast for New York City on Weather.com and summarize temperature trends, rainfall, and any severe weather warnings, and then suggest how to dress for it.”
Weather is one of the most popular uses for voice assistants, so I wanted to see how the AI handled a more complex request along those lines. It did very well, not only extracting the forecast information but also suggesting which days to wear a light layer and which days I should “insulate with a warm coat and scarf, since it will be cold with low rainfall.”
Journey
The key difference between Browser Use and Operator is accessibility. Browser Use is like a Swiss Army knife for developers. It has the flexibility to do almost anything inside a browser, but you need to know how to use the tools. You can dig into the code, tweak it, and mold it to your exact needs. If a feature is missing, nothing prevents you from adding it. Being open source, Browser Use also has an active community of developers that constantly refines it. That means that if you run into problems, there are forums and GitHub discussions where you can probably find answers.
OpenAI's Operator, on the other hand, is like hiring a butler. It does a lot for you, but within certain limits. Operator's strength is its integration with OpenAI's broader AI ecosystem, which gives it access to proprietary models that can make more nuanced decisions. However, it is locked into OpenAI's pricing structure and limited customization options.
Browser Use is not perfect. Even its cloud version demands some patience. You must craft your prompts carefully, be ready for troubleshooting, and occasionally start over. The cloud version may make up for some of this later, but for now, not being able to edit or respond within the conversation puts hard limits on its otherwise flexible nature.
The speed can also be frustrating. Watch the video of my second test; it is playing at four times the speed of the actual process.
Right now, Browser Use is best suited to people who enjoy tinkering, such as developers, researchers, and automation geeks who don't mind getting their hands dirty. If you are willing to put in the effort, you will get a powerful and flexible tool that costs much less than its competition.
But if you would rather not spend your weekend wrestling with configuration files, Operator may be the more forgiving option. Either way, web automation is ready for a boom.