Simon Willison’s Weblog: OpenAI: Improve file search result relevance with chunk ranking

Source URL: https://simonwillison.net/2024/Aug/30/openai-file-search/#atom-everything
Source: Simon Willison’s Weblog
Title: OpenAI: Improve file search result relevance with chunk ranking

Feedly Summary: OpenAI: Improve file search result relevance with chunk ranking
I’ve mostly been ignoring OpenAI’s Assistants API. It provides an alternative to their standard messages API where you construct “assistants”: chatbots with optional access to additional tools that store full conversation threads on the server, so you don’t need to pass the previous conversation with every call to their API.
I’m pretty comfortable with their existing API and I found the assistants API to be quite a bit more complicated. So far the only thing I’ve used it for is a script to scrape OpenAI Code Interpreter to keep track of updates to their environment’s Python packages.
Code Interpreter aside, the other interesting assistants feature is File Search. You can upload files in a wide variety of formats and OpenAI will chunk them, store the chunks in a vector store and make them available to help answer questions posed to your assistant – it’s their version of hosted RAG.
Prior to today OpenAI had kept the details of how this worked undocumented. I found this infuriating, because when I’m building a RAG system the details of how files are chunked and scored for relevance are the whole game – without understanding that I can’t make effective decisions about what kind of documents to use and how to build on top of the tool.
This has finally changed! You can now run a “step” (a round of conversation in the chat) and then retrieve details of exactly which chunks of the file were used in the response and how they were scored, using the following incantation:
run_step = client.beta.threads.runs.steps.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
    step_id="step_abc123",
    include=[
        "step_details.tool_calls[*].file_search.results[*].content"
    ]
)
(See what I mean about the API being a little obtuse?)
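To make sense of what comes back: the run step’s details list the file search tool calls, and each result carries a relevance score plus (because of the include parameter above) the chunk text itself. Here’s a minimal sketch of dumping those out; the attribute names follow OpenAI’s documented beta Python SDK for this endpoint, but are worth checking against your installed version:

if run_step.step_details.type == "tool_calls":
    for tool_call in run_step.step_details.tool_calls:
        if tool_call.type != "file_search":
            continue
        for result in tool_call.file_search.results:
            # Each result names its source file, carries a relevance
            # score, and (thanks to include=) holds the chunk text
            text = "".join(part.text for part in (result.content or []))
            print(f"{result.file_name} score={result.score:.3f}")
            print(text[:100])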
I tried this out today and the results were very promising. Here’s a chat transcript with an assistant I created against an old PDF copy of the Datasette documentation – I used the above new API to dump out the full list of snippets used to answer the question "tell me about ways to use spatialite".
It pulled in a lot of content! 57,017 characters by my count, spread across 20 search results, for a total of 15,021 tokens as measured by ttok.
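ttok is a thin command-line wrapper around tiktoken, so the same measurement can be reproduced in Python. A rough sketch, assuming snippets holds the list of chunk texts collected in the loop above (cl100k_base is the encoding used by GPT-4-era models, and what ttok defaults to):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
combined = "".join(snippets)
print(len(combined))                   # character count
print(len(encoding.encode(combined)))  # token count, roughly what ttok reports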
OpenAI provide up to 1GB of vector storage for free, then charge $0.10/GB/day for vector storage beyond that. My 173 page PDF seems to have taken up 728KB after being chunked and stored, so that GB should stretch a pretty long way.
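A quick back-of-envelope check on how far that free tier stretches, assuming 1GB here means 2^30 bytes:

# How many similarly sized chunked PDFs fit in the free 1GB tier?
free_tier_kb = 1024 * 1024     # 1GB expressed in KB
pdf_kb = 728                   # observed size of the chunked 173-page PDF
print(free_tier_kb // pdf_kb)  # roughly 1,440 similar documents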
Via @OpenAIDevs
Tags: embeddings, vector-search, generative-ai, openai, ai, rag, llms

AI Summary and Description: Yes

Summary: The text discusses OpenAI’s Assistants API, highlighting improvements to file search relevance through chunk ranking. This development is significant for professionals in AI and cloud computing, because visibility into how uploaded documents are chunked and scored makes it possible to build more effective retrieval-augmented applications.

Detailed Description: The provided text focuses on the recent improvements to OpenAI’s Assistants API, particularly regarding the handling of files and search relevance through chunk ranking. Here are the major points covered:

– **Assistants API Overview**:
– The Assistants API allows the creation of chatbots that maintain conversation threads on the server, without needing to pass the entire conversation history each time (see the sketch after this list).
– This API has been perceived as more complex compared to the standard messages API.
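As a concrete illustration of that server-side thread model: you create a thread once, append messages to it, and start runs against it, rather than resending history with every call. A minimal sketch using the beta namespace of OpenAI’s Python SDK (the assistant ID is a placeholder):

from openai import OpenAI

client = OpenAI()

# The thread lives on OpenAI's servers; prior messages are not resent
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="tell me about ways to use spatialite",
)
# Start a run of an existing assistant against the accumulated thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_abc123",  # placeholder assistant ID
)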

– **File Search Feature**:
– One of the notable advancements is the File Search feature: users upload files in a wide variety of formats, OpenAI chunks them, and the chunks are stored in a vector store that the assistant queries when answering questions (see the sketch after this list).
– This feature represents OpenAI’s version of a hosted Retrieval-Augmented Generation (RAG) system, enhancing how users can interact with large datasets.
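The upload-and-index flow looks roughly like the following sketch. The vector store endpoints sit under the beta namespace in SDK versions from this period, and the file name and assistant ID are placeholders:

from openai import OpenAI

client = OpenAI()

# Upload a document; OpenAI chunks and embeds it server-side
uploaded = client.files.create(
    file=open("datasette-docs.pdf", "rb"),  # placeholder file name
    purpose="assistants",
)
vector_store = client.beta.vector_stores.create(name="datasette-docs")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded.id,
)
# Point an existing assistant's File Search tool at the new store
client.beta.assistants.update(
    "asst_abc123",  # placeholder assistant ID
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)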

– **Documentation Improvements**:
– Prior to this update, the details of how files were chunked and scored for relevance were undocumented, which made it difficult to construct effective RAG systems on top of the feature. The change gives users the clarity needed to decide what kinds of documents to use and how to build on the tool.

– **API Functionality**:
– The ability to run a “step” in a chat session and retrieve specific chunk details used in responses adds a level of transparency and control for developers. This can significantly impact how users build their applications and systems, allowing for more tailored content retrieval.

– **Performance Metrics**:
– The author shares first-hand results: a chat transcript in which an assistant built on a Datasette documentation PDF answered a question by pulling in 20 search results totalling 57,017 characters (15,021 tokens), showcasing how much relevant content the API can retrieve for a single response.

– **Cost and Storage**:
– OpenAI provides up to 1GB of vector storage for free, with additional storage charged at $0.10/GB/day. The author’s 173-page PDF consumed only 728KB once chunked and stored, suggesting the free allocation stretches a long way.

This advancement reflects a trend in improving AI tools to enhance usability, particularly in file handling and information retrieval. Professionals in AI, cloud computing security, and infrastructure should closely monitor these developments as they may influence how information systems are designed and interact with AI technologies.