Simon Willison’s Weblog: Initial explorations of Anthropic’s new Computer Use capability - Cloud Security Alliance News Clipping Site

Source URL: https://simonwillison.net/2024/Oct/22/computer-use/#atom-everything
Source: Simon Willison’s Weblog
Title: Initial explorations of Anthropic’s new Computer Use capability

Feedly Summary: Two big announcements from Anthropic today: a new Claude 3.5 Sonnet model and a new API mode that they are calling computer use.
(They also pre-announced Haiku 3.5, but that’s not available yet so I’m ignoring it until I can try it out myself.)
Computer use is really interesting. Here’s what I’ve figured out about it so far.

You provide the computer
Coordinate support is a new capability
Things to try
Prompt injection and other potential misuse

You provide the computer
Unlike OpenAI’s Code Interpreter mode, Anthropic are not providing hosted virtual machine computers for the model to interact with. You call the Claude models as usual, sending it both text and screenshots of the current state of the computer you have tasked it with controlling. It sends back commands about what you should do next.
The quickest way to get started is to use the new anthropic-quickstarts/computer-use-demo repository. Anthropic released that this morning and it provides a one-liner Docker command which spins up an Ubuntu 22.04 container preconfigured with a bunch of software and a VNC server.
export ANTHROPIC_API_KEY=%your_api_key%
docker run \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v $HOME/.anthropic:/home/computeruse/.anthropic \
-p 5900:5900 \
-p 8501:8501 \
-p 6080:6080 \
-p 8080:8080 \
-it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
I’ve tried this and it works exactly as advertised. It starts the container with a web server listening on http://localhost:8080/ – visiting that in a browser provides a web UI for chatting with the model and a large WebVNC panel showing you exactly what is going on.
I tried this prompt and it worked first time:

Navigate to http://simonwillison.net and search for pelicans

This has very obvious safety and security concerns, which Anthropic warn about with a big red “Caution" box in both new API documentation and the computer-use-demo README, which includes a specific callout about the threat of prompt injection:

In some circumstances, Claude will follow commands found in content even if it conflicts with the user’s instructions. For example, Claude instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We suggest taking precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection.

Coordinate support is a new capability
The most important of these relates to screenshots and coordinates. Previous Anthropic (and OpenAI) models have been unable to provide coordinates on a screenshot – which means they can’t reliably tell you to "mouse click at point xx,yy".
The new Claude 3.5 Sonnet model can now do this: you can pass it a screenshot and get back specific coordinates of points within that screenshot.
I previously wrote about Google Gemini’s support for returning bounding boxes – it looks like the new Anthropic model may hae caught up to that capability.
The Anthropic-defined tools documentation helps show how that new coordinate capability is being used. They include a new pre-defined computer_20241022 tool which acts on the following instructions (I love that Anthropic are sharing these):
Use a mouse and keyboard to interact with a computer, and take screenshots.
* This is an interface to a desktop GUI. You do not have access to a terminal or applications menu. You must click on desktop icons to start applications.
* Some applications may take time to start or process actions, so you may need to wait and take successive screenshots to see the results of your actions. E.g. if you click on Firefox and a window doesn’t open, try taking another screenshot.
* The screen’s resolution is x.
* The display number is
* Whenever you intend to move the cursor to click on an element like an icon, you should consult a screenshot to determine the coordinates of the element before moving the cursor.
* If you tried clicking on a program or link but it failed to load, even after waiting, try adjusting your cursor position so that the tip of the cursor visually falls on the element that you want to click.
* Make sure to click any buttons, links, icons, etc with the cursor tip in the center of the element. Don’t click boxes on their edges unless asked.

Anthropic also note that:

We do not recommend sending screenshots in resolutions above XGA/WXGA to avoid issues related to image resizing.

I looked those up in the code:XGA is 1024×768, WXGA is 1280×800.
Things to try
I’ve only just scratched the surface of what the new computer use demo can do. So far I’ve had it:

Compile and run hello world in C (it has gcc already so this just worked)
Then compile and run a Mandelbrot C program
Install ffmpeg – it can use apt-get install to add Ubuntu packages it is missing
Use my https://datasette.simonwillison.net/ interface to run count queries against my blog’s database
Attempt and fail to solve this Sudoku puzzle – Claude is terrible at Sudoku!

Prompt injection and other potential misuse
Anthropic have further details in their post on Developing a computer use model, including this note about the importance of coordinate support:

When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place. Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands—similar to how models often struggle with simple-seeming questions like “how many A’s in the word ‘banana’?”.

And another note about prompt injection:

In this spirit, our Trust & Safety teams have conducted extensive analysis of our new computer-use models to identify potential vulnerabilities. One concern they’ve identified is “prompt injection”—a type of cyberattack where malicious instructions are fed to an AI model, causing it to either override its prior directions or perform unintended actions that deviate from the user’s original intent. Since Claude can interpret screenshots from computers connected to the internet, it’s possible that it may be exposed to content that includes prompt injection attacks.

Plus a note that they’re particularly concerned about potential misuse regarding the upcoming US election:

Given the upcoming U.S. elections, we’re on high alert for attempted misuses that could be perceived as undermining public trust in electoral processes. While computer use is not sufficiently advanced or capable of operating at a scale that would present heightened risks relative to existing capabilities, we’ve put in place measures to monitor when Claude is asked to engage in election-related activity, as well as systems for nudging Claude away from activities like generating and posting content on social media, registering web domains, or interacting with government websites.

Tags: ai, prompt-engineering, prompt-injection, generative-ai, llms, anthropic, claude, claude-3-5-sonnet, ai-agents

AI Summary and Description: Yes

Summary: Anthropic has introduced significant updates with the Claude 3.5 Sonnet model, including a new API mode termed “computer use.” This development allows Claude to interact with a user’s computer by analyzing screenshots and executing commands, raising notable safety and security concerns, particularly surrounding prompt injection vulnerabilities.

Detailed Description:
Anthropic’s release of the Claude 3.5 Sonnet model and its associated capabilities presents both innovative features and potential security risks. Here are the main highlights:

* **New Model Introduction**:
– **Claude 3.5 Sonnet**: A significant upgrade in capabilities allowing enhanced interaction with the user’s computer.
– **API Mode – Computer Use**: This enables Claude to control the user’s machine through commands based on screenshot inputs.

* **User-Provided Environment**:
– Users are responsible for supplying the computer environment; unlike OpenAI’s Code Interpreter, there is no hosted VM for Claude to engage with.
– Commands are issued based on real-time screenshots provided by the user.

* **Coordinate Support**:
– The model’s ability to analyze screenshots and extract coordinates marks a substantial advancement, enabling more precise control over GUI interactions.

* **Deployment**:
– Simple Docker command provided for spinning up a preconfigured Ubuntu container with necessary software and a VNC server for easy access.

* **Functionality**:
– Users can test various commands, including compiling basic programs, installing software, and interacting with web services.

* **Security Concerns**:
– A highlighted risk includes **prompt injection**, where external malicious instructions could manipulate Claude’s actions, potentially leading to harmful outcomes.
– Anthropic’s Trust & Safety teams are actively involved in analyzing model vulnerabilities related to prompt injection and developing strategies to mitigate such risks.

* **Precautionary Measures**:
– Users are advised to limit the resolutions of screenshots sent to the model to prevent issues stemming from image resizing.
– Anticipating potential misuse, particularly regarding the upcoming U.S. elections, Anthropic has implemented monitoring and mitigation strategies for any activity related to electoral processes.

In conclusion, while the advancements of the Claude 3.5 Sonnet indicate significant innovations in AI capabilities, the associated risks, particularly around prompt injection, require a cautious approach, especially for applications in sensitive contexts. This insight is crucial for security professionals in understanding the implications of employing generative AI models in operational environments.