Simon Willison’s Weblog: SQL injection-like attack on LLMs with special tokens - Cloud Security Alliance News Clipping Site

Source URL: https://simonwillison.net/2024/Aug/20/sql-injection-like-attack-on-llms-with-special-tokens/#atom-everything
Source: Simon Willison’s Weblog
Title: SQL injection-like attack on LLMs with special tokens

Feedly Summary: SQL injection-like attack on LLMs with special tokens
Andrej Karpathy explains something that’s been confusing me for the best part of a year:

The decision by LLM tokenizers to parse special tokens in the input string (~~, <|endoftext|>, etc.), while convenient looking, leads to footguns at best and LLM security vulnerabilities at worst, equivalent to SQL injection attacks.~~

LLMs frequently expect you to feed them text that is templated like this:
<|user|>\nCan you introduce yourself<|end|>\n<|assistant|>

But what happens if the text you are processing includes one of those weird sequences of characters, like <|assistant|>? Stuff can definitely break in very unexpected ways.
LLMs generally reserve special token integer identifiers for these, which means that it should be possible to avoid this scenario by encoding the special token as that ID (for example 32001 for <|assistant|> in the Phi-3-mini-4k-instruct vocabulary) while that same sequence of characters in untrusted text is encoded as a longer sequence of smaller tokens.
Many implementations fail to do this! Thanks to Andrej I’ve learned that modern releases of Hugging Face transformers have a split_special_tokens=True parameter (added in 4.32.0 in August 2023) that can handle it. Here’s an example:
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained(“microsoft/Phi-3-mini-4k-instruct")
>>> tokenizer.encode("<|assistant|>")
[32001]
>>> tokenizer.encode("<|assistant|>", split_special_tokens=True)
[529, 29989, 465, 22137, 29989, 29958]
A better option is to use the apply_chat_template() method, which should correctly handle this for you (though I’d like to see confirmation of that).
Tags: andrej-karpathy, prompt-injection, security, generative-ai, transformers, ai, llms

AI Summary and Description: Yes

Summary: The text discusses vulnerabilities in Large Language Models (LLMs) associated with handling special tokens, likening them to SQL injection vulnerabilities. It highlights the risks when parsing unexpected sequences of characters and points out solutions available in libraries like Hugging Face’s transformers.

Detailed Description:

– The text addresses a critical security issue related to LLMs where special tokens (e.g., ,