Alongside the launch of the updated Claude 3.5 Sonnet model and the new Claude 3.5 Haiku model, Anthropic today announced "computer use," an experimental API in public beta that lets Claude operate a computer and take actions on a user's behalf. Through this API, developers can direct Claude to look at a screen, move the cursor, click buttons, and type text. Anthropic is releasing the API now to gather feedback and improve it rapidly over time.
The Anthropic team wrote the following about this new capability in their announcement blog post:
With computer use, we're trying something fundamentally new. Instead of making specific tools to help Claude complete individual tasks, we're teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people.
Developers can use the computer use API to automate repetitive processes, test applications, and even carry out open-ended tasks such as research. Here's how it works:
Developers integrate the API so that Claude can translate an instruction (e.g., “use data from my computer and online to fill out this form”) into a sequence of computer commands (e.g., check a spreadsheet, move the cursor to open a web browser, navigate to the relevant web pages, fill out a form with the data from those pages, and so on).
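To make the instruction-to-commands idea concrete, here is a minimal Python sketch of the kind of dispatch loop a developer might write on their side. It is an illustration under stated assumptions, not Anthropic's implementation: the `FakeDesktop` class, the `execute` and `run` helpers, and the action dictionary layout are all hypothetical, and the hard-coded `plan` list stands in for actions that the model would actually choose one at a time after seeing each result.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical in-memory stand-in for a real screen, mouse, and keyboard."""
    cursor: tuple = (0, 0)
    typed: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def screenshot(self):
        self.log.append("screenshot")
        return f"<image cursor={self.cursor}>"

    def mouse_move(self, x, y):
        self.cursor = (x, y)
        self.log.append(f"mouse_move({x}, {y})")

    def left_click(self):
        self.log.append(f"left_click at {self.cursor}")

    def type_text(self, text):
        self.typed.append(text)
        self.log.append(f"type({text!r})")


def execute(desktop, action):
    """Map one action dict (assumed format) onto a desktop operation."""
    kind = action["action"]
    if kind == "screenshot":
        return desktop.screenshot()
    if kind == "mouse_move":
        x, y = action["coordinate"]
        desktop.mouse_move(x, y)
    elif kind == "left_click":
        desktop.left_click()
    elif kind == "type":
        desktop.type_text(action["text"])
    else:
        raise ValueError(f"unsupported action: {kind}")
    return "ok"


def run(desktop, planned_actions):
    """Execute actions in order and collect each result.

    In a real integration each result would be sent back to the model,
    which would then decide on the next action.
    """
    return [execute(desktop, a) for a in planned_actions]


# Hard-coded stand-in for what the model might request while filling out a form.
plan = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [320, 240]},
    {"action": "left_click"},
    {"action": "type", "text": "Jane Doe"},
]

desktop = FakeDesktop()
results = run(desktop, plan)
for entry in desktop.log:
    print(entry)
```

Keeping the executor separate from whatever produces the actions means the same loop could be pointed at a sandboxed virtual machine rather than a real desktop, which is a sensible precaution for an experimental capability.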
On the OSWorld benchmark, which evaluates how well AI systems use computers, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, almost double the next-best AI system's score of 7.8%. When allowed more steps to complete each task, Claude scored 22.0%.
The Anthropic team acknowledged that Claude's ability to use computers is still limited: it struggles with some common user actions such as scrolling, dragging, and zooming. Because the new API could be misused for spam, misinformation, or fraud, Anthropic has developed new classifiers to identify when computer use is taking place and whether harm is occurring.
Anthropic highlighted that several companies, including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company, are already exploring the computer use API for tasks that can take hundreds of steps to complete.