Simon Willison’s Weblog: Anthropic’s Prompt Engineering Interactive Tutorial

Source URL: https://simonwillison.net/2024/Aug/30/anthropic-prompt-engineering-interactive-tutorial/#atom-everything
Source: Simon Willison’s Weblog
Title: Anthropic’s Prompt Engineering Interactive Tutorial

Feedly Summary: Anthropic’s Prompt Engineering Interactive Tutorial
Anthropic continue their trend of offering the best documentation of any of the leading LLM vendors. This tutorial is delivered as a set of Jupyter notebooks – I used it as an excuse to try uvx like this:
git clone https://github.com/anthropics/courses
uvx --from jupyter-core jupyter notebook courses
This installed a working Jupyter system, started the server and launched my browser within a few seconds.
The first few chapters are pretty basic, demonstrating simple prompts run through the Anthropic API. I used %pip install anthropic instead of !pip install anthropic to make sure the package was installed in the correct virtual environment, then filed an issue and a PR.
One new-to-me trick: in the first chapter the tutorial suggests running this:
API_KEY = "your_api_key_here"
%store API_KEY
This stashes your Anthropic API key in the IPython store. In subsequent notebooks you can restore the API_KEY variable like this:
%store -r API_KEY
I poked around and on macOS those variables are stored in files of the same name in ~/.ipython/profile_default/db/autorestore.
Chapter 4: Separating Data and Instructions included some interesting notes on Claude’s support for content wrapped in XML-tag-style delimiters:

Note: While Claude can recognize and work with a wide range of separators and delimiters, we recommend that you use specifically XML tags as separators for Claude, as Claude was trained specifically to recognize XML tags as a prompt organizing mechanism. Outside of function calling, there are no special sauce XML tags that Claude has been trained on that you should use to maximally boost your performance. We have purposefully made Claude very malleable and customizable this way.

Plus this note on the importance of avoiding typos, with a nod back to the problem of sandbagging where models match their intelligence and tone to that of their prompts:

This is an important lesson about prompting: small details matter! It’s always worth it to scrub your prompts for typos and grammatical errors. Claude is sensitive to patterns (in its early years, before finetuning, it was a raw text-prediction tool), and it’s more likely to make mistakes when you make mistakes, smarter when you sound smart, sillier when you sound silly, and so on.

Chapter 5: Formatting Output and Speaking for Claude includes notes on one of Claude’s most interesting features: prefill, where you can tell it how to start its response:
client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=100,
    messages=[
        {"role": "user", "content": "JSON facts about cats"},
        {"role": "assistant", "content": "{"},
    ],
)
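One practical consequence of prefill, sketched here with a made-up completion: the model's reply continues from the prefilled text, so you prepend the prefill before parsing the result.

```python
import json

# The assistant turn was prefilled with "{", so Claude's completion continues
# from there; the prefix must be re-attached before the result parses as JSON.
prefill = "{"
completion = '"name": "Tama", "legs": 4}'  # illustrative continuation, not real model output
facts = json.loads(prefill + completion)
print(facts["legs"])  # 4
```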
Things start to get really interesting in Chapter 6: Precognition (Thinking Step by Step), which suggests using XML tags to help the model consider different arguments prior to generating a final answer:

Is this review sentiment positive or negative? First, write the best arguments for each side in <positive-argument> and <negative-argument> XML tags, then answer.

The tags make it easy to strip out the "thinking out loud" portions of the response.
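A sketch of that stripping step (the tag names match the chapter's example; the helper itself is mine, not from the tutorial):

```python
import re

def strip_thinking(text: str) -> str:
    """Remove <positive-argument> and <negative-argument> blocks,
    leaving only the final answer text."""
    cleaned = re.sub(
        r"<(positive-argument|negative-argument)>.*?</\1>",
        "",
        text,
        flags=re.DOTALL,
    )
    return cleaned.strip()

response = (
    "<positive-argument>Great battery life.</positive-argument>\n"
    "<negative-argument>The screen is dim.</negative-argument>\n"
    "Positive"
)
print(strip_thinking(response))  # Positive
```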
It also warns about Claude’s sensitivity to ordering. If you give Claude two options (e.g. for sentiment analysis):

In most situations (but not all, confusingly enough), Claude is more likely to choose the second of two options, possibly because in its training data from the web, second options were more likely to be correct.

This effect can be reduced using the thinking out loud / brainstorming prompting techniques.
A related tip is proposed in Chapter 8: Avoiding Hallucinations:

How do we fix this? Well, a great way to reduce hallucinations on long documents is to make Claude gather evidence first.
In this case, we tell Claude to first extract relevant quotes, then base its answer on those quotes. Telling Claude to do so here makes it correctly notice that the quote does not answer the question.

I really like the example prompt they provide here, for answering complex questions against a long document:

<question>What was Matterport’s subscriber base on the precise date of May 31, 2020?</question>
Please read the below document. Then, in <scratchpad> tags, pull the most relevant quote from the document and consider whether it answers the user’s question or whether it lacks sufficient detail. Then write a brief numerical answer in <answer> tags.
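Here is one way that prompt might be assembled in Python (wrapping the document in <document> tags is my assumption; the tutorial excerpt shows only the instruction text):

```python
question = (
    "What was Matterport's subscriber base on the precise date of May 31, 2020?"
)
document = "(long document text goes here)"

# Question first, then instructions, then the document wrapped in XML-style
# tags so the scratchpad step has a clearly delimited source to quote from.
prompt = (
    f"<question>{question}</question>\n"
    "Please read the below document. Then, in <scratchpad> tags, pull the "
    "most relevant quote from the document and consider whether it answers "
    "the user's question or whether it lacks sufficient detail. Then write "
    "a brief numerical answer in <answer> tags.\n"
    f"<document>{document}</document>"
)
print(prompt)
```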

Via Hacker News
Tags: anthropic, claude, uv, ai, llms, prompt-engineering, python, generative-ai, jupyter

AI Summary and Description: Yes

Summary: The text provides a detailed exploration of Anthropic’s prompt engineering tutorial for LLMs (Large Language Models), specifically focusing on Claude. It highlights practical insights related to prompt crafting and management of API keys useful for AI and cloud security professionals working with generative AI technologies.

Detailed Description: The provided text discusses Anthropic’s interactive tutorial aimed at prompt engineering within their AI platform. This tutorial serves as both a practical guide and a resource for developers, enhancing their understanding of how to interact with Claude, an advanced LLM.

Key Points:

– **Documentation Quality**: Anthropic is noted for its exemplary documentation, which simplifies learning and encourages exploration through interactive examples using Jupyter notebooks.

– **API Key Management**:
– The tutorial teaches users how to store and retrieve their API keys within the IPython environment, keeping credentials out of notebook source (though the IPython store saves them unencrypted on disk).
– Using `%store API_KEY` lets developers reuse a credential across notebooks without pasting it into each one.

– **Effective Prompt Structuring**:
– The tutorial emphasizes the importance of organizing prompts using XML-style delimiters, which Claude recognizes effectively, thereby enhancing the model’s performance.
– Small mistakes in prompts, such as typos, can lead to significant differences in model outputs. This highlights the need for precision in prompt formulation.

– **Innovative Features**:
– The functionality of prefill options is introduced, allowing developers to guide the model’s response style. This showcases Claude’s versatility in adapting to user-defined tasks.

– **Advanced Techniques**:
– Techniques like “thinking out loud” are presented to enhance the model’s decision-making process, encouraging it to explore multiple arguments before arriving at a conclusion.
– Strategies to minimize “hallucinations” — inaccuracies in AI outputs — are outlined, particularly in handling extensive documents. For instance, extracting quotes as evidence before responding is proposed as a viable approach to increase accuracy.

Insights for AI and Cloud Security Professionals:
– Understanding how to interact effectively with LLMs like Claude not only refines AI outputs but also reduces risks associated with erroneous information dissemination.
– Integrating security best practices in coding environments (e.g., API key management) is essential for preventing unauthorized access to sensitive tools within cloud infrastructures.
– Developing a mindset focused on meticulous prompt crafting can significantly impact the reliability and performance of generative AI applications, an essential consideration in security implementations.

This tutorial equips professionals with both practical knowledge and strategic insights necessary to navigate the complexities of AI security within the cloud computing landscape.