Simon Willison’s Weblog: Gemini 1.5 Flash-8B is now production ready

Source URL: https://simonwillison.net/2024/Oct/3/gemini-15-flash-8b/#atom-everything
Source: Simon Willison’s Weblog
Title: Gemini 1.5 Flash-8B is now production ready

Feedly Summary: Gemini 1.5 Flash-8B is now production ready
Gemini 1.5 Flash-8B is “a smaller and faster variant of 1.5 Flash" – and is now released to production, at half the price of the 1.5 Flash model.
It’s really, really cheap:

$0.0375 per 1 million tokens on prompts <128K $0.15 per 1 million tokens on prompts >128K
$0.01 per 1 million tokens on cached prompts <128K

I believe images are still charged at a flat rate of 258 tokens, which I think means a single non-cached image with Flash should cost 0.00097 cents – a number so tiny I’m doubting if I got the calculation right.
OpenAI’s cheapest model remains GPT-4o mini, at $0.150/1M input – though that drops to half of that for reused prompt prefixes thanks to their new prompt caching feature (and by half again if you use batches – Gemini also offer half-off for batched requests).
Anthropic’s cheapest model is still Claude 3 Haiku at $0.25/M, though that drops to $0.03/M for cached tokens (if you configure them correctly).
Via @OfficialLoganK
Tags: vision-llms, gemini, anthropic, openai, ai, llms, google, generative-ai

AI Summary and Description: Yes

Summary: The release of Gemini 1.5 Flash-8B marks a significant development in the generative AI landscape, offering a cost-effective model with lower operational costs compared to existing options like OpenAI’s GPT-4o mini and Anthropic’s Claude 3 Haiku. This model could potentially democratize access to advanced AI capabilities for various users and applications.

Detailed Description: The announcement highlights key aspects of a new AI model that can substantially impact the generative AI space, particularly in pricing and performance efficiency. Here are the main points of relevance:

– **Cost-Effectiveness**: Gemini 1.5 Flash-8B is priced competitively, setting new benchmarks for operational costs. This presents a significant opportunity for developers and businesses looking to integrate AI capabilities without incurring high expenditures.
– Pricing structure:
– $0.0375 per 1 million tokens for prompts under 128K
– $0.15 per 1 million tokens for prompts over 128K
– $0.01 per 1 million tokens for cached prompts under 128K
– **Performance**: Advertised as a “smaller and faster” alternative, the implications for application responsiveness and resource requirements make it an attractive choice for real-time applications, especially in cloud settings.

– **Comparison with Competitors**:
– OpenAI’s model (GPT-4o mini) charges $0.150 per 1 million input tokens, halving for reused prefixes and further reducing costs for batch processing.
– Anthropic’s Claude 3 Haiku offers $0.25/M, dropping to $0.03/M for cached tokens, which still places it higher than Gemini’s offerings.

– **Market Position**: By providing a more affordable option, Gemini 1.5 Flash-8B not only creates new budgetary options for developers but also intensifies competition in the AI sector, potentially driving innovation.

– **Strategic Implications**: The price and performance efficiency may encourage broader adoption of generative AI technologies across various industries, enhancing the landscape for AI in cloud solutions and enterprise applications.

The significance of this announcement cannot be overstated, as it indicates a shift towards more accessible AI solutions that can cater to a broader audience without sacrificing performance.