Source URL: https://simonwillison.net/2024/Aug/26/anthropic-system-prompts/#atom-everything
Source: Simon Willison’s Weblog
Title: Anthropic Release Notes: System Prompts
Anthropic now publish the system prompts for their user-facing chat-based LLM systems – Claude 3 Haiku, Claude 3 Opus and Claude 3.5 Sonnet – as part of their documentation, with a promise to update this to reflect future changes.
The page currently covers just the initial release of the prompts, each dated July 12th, 2024.
Anthropic researcher Amanda Askell broke down their system prompt in detail back in March 2024. These new releases are a much appreciated extension of that transparency.
These prompts are always fascinating to read, because they can act a little bit like documentation that the provider never thought to publish elsewhere.
There are lots of interesting details in the Claude 3.5 Sonnet system prompt. Here’s how they handle controversial topics:
If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information. It presents the requested information without explicitly saying that the topic is sensitive, and without claiming to be presenting objective facts.
Here’s chain-of-thought “think step by step” processing baked into the system prompt itself:
When presented with a math problem, logic problem, or other problem benefiting from systematic thinking, Claude thinks through it step by step before giving its final answer.
Claude’s face blindness is also part of the prompt, which makes me wonder if the API-accessed models might be more capable of working with faces than I had previously thought:
Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. […] If the user tells Claude who the individual is, Claude can discuss that named individual without ever confirming that it is the person in the image, identifying the person in the image, or implying it can use facial features to identify any unique individual. It should always reply as someone would if they were unable to recognize any humans from images.
It’s always fun to see parts of these prompts that clearly hint at annoying behavior in the base model that they’ve tried to correct!
Claude responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Claude avoids starting responses with the word “Certainly” in any way.
Anthropic note that these prompts are for their user-facing products only – they aren’t used by the Claude models when accessed via their API.
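Since the published prompts only apply to the claude.ai products, developers who want similar behavior over the API need to supply a system prompt themselves, via the top-level system parameter of the Messages API. A minimal sketch of building such a request body (the model name and prompt text here are illustrative, not Anthropic’s actual published wording):

```python
# Sketch: assembling a Messages API request body with a custom system
# prompt. The "system" top-level field is how the Anthropic Messages API
# accepts system prompts; the prompt text below is illustrative only.
import json
from typing import Optional


def build_messages_request(user_message: str,
                           system_prompt: Optional[str] = None) -> dict:
    """Build a Messages API request body. API callers get no system
    prompt by default, so one must be passed explicitly if desired."""
    body = {
        "model": "claude-3-5-sonnet-20240620",  # illustrative model name
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_message}],
    }
    if system_prompt is not None:
        body["system"] = system_prompt
    return body


request = build_messages_request(
    "What is 17 * 24?",
    system_prompt=(
        "When presented with a math problem, think through it "
        "step by step before giving your final answer."
    ),
)
print(json.dumps(request, indent=2))
```

The same body would be POSTed to the Messages endpoint (or passed as keyword arguments to an SDK client); omitting `system_prompt` simply yields a request with no system prompt at all, which is the API default the post describes.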
Via @alexalbert__
Tags: prompt-engineering, anthropic, claude, generative-ai, ai, llms
AI Summary and Description: Yes
Summary: The text discusses the release notes for Anthropic’s chat-based language models (LLMs) and details their system prompts. This transparency is significant for developers and security professionals, as it outlines how the models handle controversial topics, process information, and engage with users. Understanding these aspects can enhance the oversight of AI and generative AI security.
Detailed Description:
The release notes from Anthropic highlight several critical points about their language models, particularly Claude 3 and its variants. The notable aspects include:
– **Enhanced Transparency**: Anthropic has started to publish the system prompts used in their user-facing chat-based LLMs. This effort aims to provide insight into the operational parameters of their models, which can improve transparency in AI deployment.
– **Handling Controversial Topics**:
  – The models are programmed to assist with tasks reflecting various viewpoints without pushing their own opinions.
  – When confronting sensitive or controversial subjects, Claude presents the information clearly while refraining from labeling the topic as sensitive or making claims of objectivity.
– **Step-by-Step Processing**:
  – The models implement a “chain of thought” approach, allowing Claude to methodically work through problems, such as math or logic challenges, before offering solutions. This is a desirable feature for users needing reliability and accuracy.
– **Mitigating Bias in Recognition**:
  – Claude is designed to operate under “face blindness,” meaning it does not identify individuals from images. This approach minimizes potential privacy issues, as the model avoids recognizing or implying knowledge about human faces, protecting users’ confidentiality.
– **Direct Communication Style**:
  – Claude eschews unnecessary affirmations or filler phrases, leading to more straightforward and efficient interactions, enhancing the user experience.
– **Separation of User-Facing Products and APIs**:
  – The prompts described are specific to user-facing products and do not apply to API-accessed models, which may open the door to different functionalities when accessing the LLMs through APIs.
In summary, this detailed breakdown by Anthropic offers crucial insights regarding the operational protocols of their LLMs, which are fundamental for AI security, ethics, and compliance professionals. By understanding these elements, stakeholders can better assess risks, ensure compliance with ethical standards, and enhance the security measures associated with deploying generative AI technologies.