VentureBeat

Google's AI can now surf the web for you, click on buttons, and fill out forms with Gemini 2.5 Computer Use

Google DeepMind has released Gemini 2.5 Computer Use, an AI model designed to act as a virtual agent on the web. The model can navigate websites, fill out forms, and perform actions on behalf of users, similar to offerings from OpenAI and Anthropic. Google CEO Sundar Pichai highlighted its importance in developing general-purpose AI agents. It is not directly available to consumers; developers can access it through the Browserbase platform and the Gemini API.

The model builds on Gemini 2.5 Pro's capabilities, with a focus on interacting with user interfaces. Rather than depending on structured APIs, it lets AI systems operate an interface visually and functionally. Early tests show success in navigating websites and completing tasks, though, unlike competing offerings, it lacks direct file system access. Google claims Gemini 2.5 Computer Use leads interface control benchmarks and offers lower latency. The model operates in an interaction loop: at each step it analyzes a screenshot together with the user's prompt and recommends the next action. Safety measures include per-step inspection of proposed actions and developer-defined instructions. It supports various UI actions, such as clicking and typing, and expresses screen positions in normalized coordinates.

Pricing is similar to Gemini 2.5 Pro, but Computer Use is offered only on the paid tier. Data from paid-tier usage is not used to improve Google products, unlike data from the free tier of Gemini 2.5 Pro.
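The interaction loop and the normalized coordinates can be illustrated with a small Python sketch. This is not the Gemini API: `Action`, `to_pixels`, and `run_loop` are hypothetical names, and the 0 to 1000 coordinate grid is an assumption, since the summary only says coordinates are normalized. The model, the browser, and the per-step safety check are passed in as plain callables so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

NORM_MAX = 1000  # assumption: coordinates are reported on a 0..1000 grid


@dataclass
class Action:
    """Hypothetical UI action proposed by the model."""
    name: str       # e.g. "click" or "type"
    x: int = 0      # normalized x
    y: int = 0      # normalized y
    text: str = ""  # text to type, if any


def to_pixels(x_norm: int, y_norm: int, width: int, height: int) -> Tuple[int, int]:
    """Scale normalized coordinates to pixels, clamped to the viewport."""
    px = min(width - 1, x_norm * width // NORM_MAX)
    py = min(height - 1, y_norm * height // NORM_MAX)
    return px, py


def run_loop(
    goal: str,
    propose: Callable[[str, bytes, List[Action]], Optional[Action]],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    approve: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot -> proposed action -> inspection -> execution, repeated."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = propose(goal, take_screenshot(), history)
        if action is None:       # the model signals the task is finished
            break
        if not approve(action):  # per-step inspection rejects the action
            break
        execute(action)
        history.append(action)
    return history
```

In a real integration, `propose` would call the model with the latest screenshot, and `execute` would drive a browser automation tool using the pixel position from `to_pixels`. Keeping `approve` as a separate callable mirrors the per-step inspection described above: every proposed action can be vetoed before it touches the page.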