Google DeepMind Introduces Gemini 2.5 Computer Use Model

Written by: Mane Sachin

Google DeepMind has introduced Gemini 2.5 Computer Use, a specialized model built on Gemini 2.5 Pro. The model is designed to interact directly with user interfaces, completing tasks on websites and apps by clicking, scrolling, and filling out forms. It can also operate behind login screens, a significant step toward more capable digital assistants.

The model is currently available in preview through the Gemini API via Google AI Studio and Vertex AI. The release lets developers build agents that can understand and operate within complex digital environments. The goal is to simplify task automation by letting these agents handle routine interactions on a user's behalf.

One of the key features of Gemini 2.5 Computer Use is its ability to manipulate dynamic interface elements such as dropdown menus, filters, and interactive content. DeepMind highlighted this as a necessary evolution toward building more flexible and useful digital agents that can work across a wide range of web-based tools.

Developers can use a “computer-use tool” to drive interactions in a loop. Each cycle begins with the user's request, a screenshot of the current digital environment, and a history of recent actions. The model then generates an action, such as a click or text entry, which is executed by client-side code. The loop repeats until the task is completed or intentionally stopped.
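The loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not the Gemini API: the `Action` class, `run_agent_loop`, and `make_scripted_model` are hypothetical names, and the model is replaced by a stub that replays a fixed script so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    kind: str                                  # e.g. "click", "type", or "done"
    args: Dict[str, object] = field(default_factory=dict)


def run_agent_loop(
    goal: str,
    propose_action: Callable[[str, bytes, List[Action]], Action],
    take_screenshot: Callable[[], bytes],
    execute_action: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> propose -> execute until 'done' or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()          # current state of the UI
        action = propose_action(goal, screenshot, history)
        if action.kind == "done":               # model signals completion
            break
        execute_action(action)                  # client-side code performs it
        history.append(action)
    return history


def make_scripted_model(script: List[Action]):
    """Stand-in for the model: replays a fixed script, then returns 'done'."""
    def propose(goal: str, screenshot: bytes, history: List[Action]) -> Action:
        step = len(history)
        return script[step] if step < len(script) else Action("done")
    return propose


# Usage: drive the loop with the stub model and a fake browser.
executed: List[Action] = []
history = run_agent_loop(
    goal="Fill in the signup form",
    propose_action=make_scripted_model([
        Action("click", {"x": 120, "y": 300}),
        Action("type", {"text": "Ada"}),
    ]),
    take_screenshot=lambda: b"",                # placeholder screenshot bytes
    execute_action=executed.append,             # record instead of clicking
)
```

In a real integration, `propose_action` would call the model and `execute_action` would drive a browser; the `max_steps` cap is one simple way to stop the loop intentionally.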

Although primarily optimized for web browsers, the model also shows promise on mobile platforms. However, it is not yet optimized for controlling desktop operating systems. In live demos, the system transferred pet care information into a customer management platform and organized digital notes into sorted categories.

Gemini 2.5 Computer Use has performed well on several benchmarks focused on web and mobile control, including Online-Mind2Web, WebVoyager, and AndroidWorld. DeepMind reports accuracy above 70% with low latency, with tasks completing in around 225 seconds on average.

DeepMind also emphasized the importance of safety when deploying systems that control software environments. Risks include unintended actions, misuse, or falling prey to deceptive web elements. Because of this, safety has been built into the core functionality, with tools in place to monitor and restrict what the system can do.

To give developers more control, DeepMind has made it possible to set rules requiring the system to either decline certain tasks or ask for user approval before proceeding. These safeguards are designed to reduce the risk of harm while maintaining the power and flexibility of the tool.
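As a rough illustration of such rules, the sketch below gates each proposed action through a small policy check before it runs. Everything here is an assumption made for the example: the action names, the `Decision` enum, and the `guarded_execute` helper are hypothetical and are not part of the Gemini API.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    DECLINE = "decline"


# Hypothetical policy: these action names are invented for illustration.
DECLINED = frozenset({"delete_account"})
NEEDS_CONFIRMATION = frozenset({"purchase", "send_email"})


def review_action(kind: str) -> Decision:
    """Classify an action as allowed, needing approval, or declined."""
    if kind in DECLINED:
        return Decision.DECLINE
    if kind in NEEDS_CONFIRMATION:
        return Decision.CONFIRM
    return Decision.ALLOW


def guarded_execute(
    kind: str,
    execute: Callable[[str], None],
    ask_user: Callable[[str], bool],
) -> bool:
    """Run the action only if policy allows it; return True if it ran."""
    decision = review_action(kind)
    if decision is Decision.DECLINE:
        return False                       # never run declined actions
    if decision is Decision.CONFIRM and not ask_user(kind):
        return False                       # user withheld approval
    execute(kind)
    return True
```

Keeping the check in client-side code, next to the step that executes actions, means a declined or unapproved action is never performed regardless of what the model proposes.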
