Simon Willison’s Weblog: Claude API: PDF support (beta)

Source URL: https://simonwillison.net/2024/Nov/1/claude-api-pdf-support-beta/#atom-everything
Source: Simon Willison’s Weblog
Title: Claude API: PDF support (beta)

Feedly Summary:
Claude 3.5 Sonnet now accepts PDFs as attachments:

The new Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) model now supports PDF input and understands both text and visual content within documents.

I just released llm-claude-3 0.7 with support for the new attachment type, so now you can do this:
llm install llm-claude-3 --upgrade
llm -m claude-3.5-sonnet 'extract text' -a mydoc.pdf
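Under the hood, the Messages API accepts a PDF as a base64-encoded `document` content block sitting alongside the text prompt. A minimal sketch in Python of building such a request body (the HTTP call and API key handling are omitted; field names follow Anthropic's documented content-block format):

```python
import base64


def build_pdf_message(pdf_bytes: bytes, prompt: str) -> dict:
    """Build a Messages API request body that attaches a PDF as a
    base64-encoded document content block, followed by a text prompt."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            # PDFs must be base64-encoded for JSON transport
                            "data": base64.b64encode(pdf_bytes).decode("ascii"),
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


payload = build_pdf_message(b"%PDF-1.4 ...", "extract text")
```

The `llm` plugin handles this encoding for you; the sketch just shows what the `-a mydoc.pdf` attachment turns into on the wire.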

Also new today: Claude now offers a free (albeit rate-limited) token counting API. This addresses a complaint I’ve had for a while: previously it wasn’t possible to accurately estimate the cost of a prompt before sending it to be executed.
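With a token count in hand, estimating a prompt's cost before running it is simple arithmetic. A sketch, assuming the counting endpoint returns an `input_tokens` figure and using Claude 3.5 Sonnet's list prices at the time of writing ($3 per million input tokens, $15 per million output tokens; the expected output length is necessarily a guess):

```python
def estimate_cost_usd(input_tokens: int, expected_output_tokens: int,
                      input_per_mtok: float = 3.00,
                      output_per_mtok: float = 15.00) -> float:
    """Estimate the dollar cost of a prompt from the input token count
    (as returned by the counting API) and a guessed response length."""
    return (input_tokens / 1_000_000 * input_per_mtok
            + expected_output_tokens / 1_000_000 * output_per_mtok)


# e.g. a 20,000-token PDF prompt with a roughly 1,000-token reply:
cost = estimate_cost_usd(20_000, 1_000)
```

This is exactly the estimate that was impossible to make accurately before the counting API existed: you had to tokenize locally with an approximation, or pay to find out.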
Via @alexalbert__
Tags: vision-llms, claude-3-5-sonnet, llm, anthropic, claude, ai, llms, pdf, generative-ai, projects

AI Summary and Description: Yes

Summary: The introduction of PDF support in Claude 3.5 Sonnet enhances the model’s functionality, enabling it to process both text and visual content from documents. The update is accompanied by a new token counting API, which lets users estimate the cost of a prompt before executing it, improving cost management in AI operations.

Detailed Description: Anthropic’s recent update to Claude 3.5 Sonnet represents a significant advancement in the capabilities of AI language models. Here are the major points of this update:

– **PDF Support**: The Claude 3.5 Sonnet model now accepts PDFs as attachments, which allows users to process complex documents that contain not only text but also visual elements.

– **Enhanced Functionality**: By understanding both text and visual content, the model can assist users in a more integrated manner, opening up additional use cases such as document analysis, data extraction, and visual interpretation.

– **Token Counting API**: Anthropic now also offers a free, rate-limited token counting API, addressing previous limitations around cost estimation. This feature lets users calculate the expected cost of a prompt before executing it, aiding budget management and operational efficiency.

– **Practical Implications**:
  – Professionals working in AI, particularly those focusing on document processing or leveraging language models for business applications, will find this enhancement valuable.
  – Increased functionality may lead to broader adoption in sectors that require document analysis, such as legal, healthcare, and education.
  – Users can leverage these new features to improve cost efficiency and predictability in their AI model usage.

Overall, these updates contribute to the evolution of AI language models, indicating a trend towards more versatile and economically manageable AI solutions.