Simon Willison’s Weblog: Introducing Contextual Retrieval

Source URL: https://simonwillison.net/2024/Sep/20/introducing-contextual-retrieval/#atom-everything
Source: Simon Willison’s Weblog
Title: Introducing Contextual Retrieval

Feedly Summary: Introducing Contextual Retrieval
Here’s an interesting new embedding/RAG technique, described by Anthropic but it should work for any embedding model against any other LLM.
One of the big challenges in implementing semantic search against vector embeddings – often used as part of a RAG system – is creating “chunks” of documents that are most likely to semantically match queries from users.
Anthropic provide this solid example where semantic chunks might let you down:

Imagine you had a collection of financial information (say, U.S. SEC filings) embedded in your knowledge base, and you received the following question: "What was the revenue growth for ACME Corp in Q2 2023?"
A relevant chunk might contain the text: "The company’s revenue grew by 3% over the previous quarter." However, this chunk on its own doesn’t specify which company it’s referring to or the relevant time period, making it difficult to retrieve the right information or use the information effectively.

Their proposed solution is to take each chunk at indexing time and expand it using an LLM – so the above sentence would become this instead:

This chunk is from an SEC filing on ACME corp’s performance in Q2 2023; the previous quarter’s revenue was $314 million. The company’s revenue grew by 3% over the previous quarter.

This chunk was created by Claude 3 Haiku (their least expensive model) using the following prompt template:

<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
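
To make that concrete, here is a minimal sketch of the contextualization step in Python, assuming the official anthropic SDK and an ANTHROPIC_API_KEY in the environment; the function name and the choice to prepend the generated context to the chunk before indexing are illustrative rather than lifted from Anthropic’s notebook:

```python
# A minimal sketch of the contextualization step, assuming the official
# Anthropic Python SDK (pip install anthropic). Names are illustrative,
# not taken from Anthropic's notebook.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT_TEMPLATE = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the overall \
document for the purposes of improving search retrieval of the chunk. \
Answer only with the succinct context and nothing else."""


def contextualize_chunk(document: str, chunk: str) -> str:
    """Ask Claude 3 Haiku for a situating context and prepend it to the chunk."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # Anthropic's least expensive model
        max_tokens=150,                   # the generated context runs ~100 tokens
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(document=document, chunk=chunk),
        }],
    )
    context = response.content[0].text.strip()
    # The contextualized text is what gets embedded and BM25-indexed,
    # in place of the raw chunk.
    return f"{context} {chunk}"
```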

Here’s the really clever bit: running the above prompt for every chunk in a document could get really expensive thanks to the inclusion of the entire document in each prompt. Claude added prompt caching last month, which allows you to pay around 1/10th of the cost for tokens cached up to your specified breakpoint.
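
Here is roughly what that looks like with the same SDK, assuming prompt caching is available on your account (when the post was published it sat behind a beta header; it has since become generally available). The document goes in its own content block marked as a cache breakpoint so the per-chunk calls can reuse it:

```python
def contextualize_chunk_cached(document: str, chunk: str) -> str:
    """Same call as above, but with the document marked for prompt caching."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"<document>\n{document}\n</document>",
                    # Cache breakpoint: everything up to here is written to the
                    # cache on the first call for a document, then read back at
                    # a fraction of the normal input price for that document's
                    # remaining chunks.
                    "cache_control": {"type": "ephemeral"},
                },
                {
                    "type": "text",
                    "text": (
                        "Here is the chunk we want to situate within the whole document\n"
                        f"<chunk>\n{chunk}\n</chunk>\n"
                        "Please give a short succinct context to situate this chunk "
                        "within the overall document for the purposes of improving "
                        "search retrieval of the chunk. Answer only with the succinct "
                        "context and nothing else."
                    ),
                },
            ],
        }],
    )
    return f"{response.content[0].text.strip()} {chunk}"
```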
By Anthropic’s calculations:

Assuming 800 token chunks, 8k token documents, 50 token context instructions, and 100 tokens of context per chunk, the one-time cost to generate contextualized chunks is $1.02 per million document tokens.
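
That figure roughly checks out if you assume Claude 3 Haiku’s published pricing of $0.25 per million input tokens and $1.25 per million output tokens, with cache writes at $0.30 and cache reads at $0.03 per million (those prices are my assumption, not stated in the post), counting one cache write per document and a cache read per chunk:

```python
# Back-of-the-envelope check of Anthropic's $1.02 figure, assuming Claude 3
# Haiku pricing of $0.25/M input, $1.25/M output, $0.30/M cache writes and
# $0.03/M cache reads (not stated in the post).
DOC_TOKENS, CHUNK_TOKENS, INSTR_TOKENS, CONTEXT_TOKENS = 8_000, 800, 50, 100
chunks_per_doc = DOC_TOKENS // CHUNK_TOKENS      # 10 chunks per document
docs_per_million = 1_000_000 / DOC_TOKENS        # 125 documents per million tokens

per_doc = (
    DOC_TOKENS * 0.30 / 1e6                                        # write doc to cache once
    + chunks_per_doc * DOC_TOKENS * 0.03 / 1e6                     # read it back per chunk
    + chunks_per_doc * (CHUNK_TOKENS + INSTR_TOKENS) * 0.25 / 1e6  # uncached input tokens
    + chunks_per_doc * CONTEXT_TOKENS * 1.25 / 1e6                 # generated context tokens
)
print(f"${per_doc * docs_per_million:.2f} per million document tokens")  # ≈ $1.02
```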

Anthropic provide a detailed notebook demonstrating an implementation of this pattern. Their eventual solution combines cosine similarity and BM25 indexing, uses embeddings from Voyage AI and adds a reranking step powered by Cohere.
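As a rough sketch of the shape of that pipeline (not the notebook’s actual code): embed the contextualized chunks with Voyage AI, index the same text with BM25, merge the two rankings (reciprocal rank fusion here, as one reasonable choice), and hand a shortlist to Cohere’s reranker. The package names, models and parameters below are illustrative assumptions:

```python
# A sketch of the hybrid retrieval step, assuming the voyageai, rank_bm25,
# cohere and numpy packages. Model names, fusion method and top-k values
# are illustrative choices, not taken from Anthropic's notebook.
import numpy as np
import voyageai
import cohere
from rank_bm25 import BM25Okapi

vo = voyageai.Client()   # reads VOYAGE_API_KEY from the environment
co = cohere.Client()     # reads CO_API_KEY from the environment


def build_indexes(contextualized_chunks: list[str]):
    """Embed the contextualized chunks and build a BM25 index over them."""
    embeddings = np.array(
        vo.embed(contextualized_chunks, model="voyage-2", input_type="document").embeddings
    )
    bm25 = BM25Okapi([c.lower().split() for c in contextualized_chunks])
    return embeddings, bm25


def search(query: str, chunks: list[str], embeddings: np.ndarray, bm25: BM25Okapi,
           top_k: int = 5, candidates: int = 20) -> list[str]:
    # Cosine similarity between the query embedding and every chunk embedding.
    q = np.array(vo.embed([query], model="voyage-2", input_type="query").embeddings[0])
    cos = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    # BM25 scores over the same contextualized text.
    bm = bm25.get_scores(query.lower().split())
    # Merge the two rankings with reciprocal rank fusion (k=60).
    fused = np.zeros(len(chunks))
    for scores in (cos, bm):
        for rank, idx in enumerate(np.argsort(scores)[::-1]):
            fused[idx] += 1.0 / (60 + rank + 1)
    shortlist = [chunks[i] for i in np.argsort(fused)[::-1][:candidates]]
    # Rerank the shortlist with Cohere and keep the top results.
    reranked = co.rerank(model="rerank-english-v3.0", query=query,
                         documents=shortlist, top_n=top_k)
    return [shortlist[r.index] for r in reranked.results]
```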
Via Alex Albert
Tags: anthropic, claude, generative-ai, ai, embeddings, llms, search

AI Summary and Description: Yes

Summary: The text introduces a contextual retrieval method developed by Anthropic that enhances semantic search over vector embeddings in retrieval-augmented generation (RAG) systems built with large language models (LLMs). The approach addresses challenges in retrieving and interpreting relevant information from chunks of text, particularly in enterprise settings like financial data retrieval.

Detailed Description:

The text discusses a new embedding and retrieval-augmented generation (RAG) technique that aims to improve semantic search by providing contextually relevant information. Here are the key points about this technique and its implications:

– **Challenge of Semantic Search**: Traditional semantic search using vector embeddings can struggle to match user queries with the relevant text chunks. The example presented highlights that critical contextual details are lost when a chunk is embedded and retrieved in isolation.

– **Example Use Case**: The scenario outlines how a financial query about ACME Corp’s revenue growth might return a vague chunk of information. Without proper context, the retrieved data cannot be effectively utilized.

– **Proposed Solution**:
  – **Context Expansion**: Each text chunk is expanded at indexing time using an LLM to include vital contextual information. For instance, transforming the bare revenue statement into a contextualized version makes the source and timeframe of the information clear.
  – **Cost Efficiency through Prompt Caching**: Prompt caching significantly reduces the cost of running the contextualization prompt for every chunk, making the technique viable for processing large document collections.

– **Technical Strategy**:
  – **Cost Estimate**: Anthropic puts the one-time cost of generating contextualized chunks at $1.02 per million document tokens, which indicates a scalable solution for companies managing extensive datasets.
  – **Combining Techniques**: The implementation combines cosine similarity over Voyage AI embeddings with BM25 indexing, then adds a reranking step powered by Cohere.

– **Practical Implications**: This strategy could revolutionize how organizations, particularly in finance and regulatory environments, manage and retrieve data, leading to improved operational efficiency and accuracy in data utilization.

In summary, Anthropic’s contextual retrieval technique highlights advancements in natural language processing and retrieval methods that cater specifically to the needs of information-heavy industries. It is significant for professionals in AI and cloud security, as improved data retrieval techniques can mitigate risks associated with misinformation and compliance issues.