Simon Willison’s Weblog: Updated production-ready Gemini models

Source URL: https://simonwillison.net/2024/Sep/24/gemini-models/#atom-everything
Source: Simon Willison’s Weblog
Title: Updated production-ready Gemini models

Feedly Summary: Updated production-ready Gemini models
Two new models from Google Gemini today: gemini-1.5-pro-002 and gemini-1.5-flash-002. Their -latest aliases will update to these new models in “the next few days”, and the -001 suffixes can be used to stick with the older models. The new models benchmark slightly better in various ways and should respond faster.
Flash continues to have a 1,048,576-token input limit and an 8,192-token output limit; Pro accepts up to 2,097,152 input tokens.
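Those limits are worth checking before dispatching a request. A minimal sketch of a client-side guard, using only the figures quoted above (the lookup table and function name are mine, not part of any SDK; Pro's output limit isn't stated here, so only Flash's is recorded):

```python
# Input-token limits for the new 1.5-002 models, as quoted in this post.
INPUT_LIMITS = {
    "gemini-1.5-flash-002": 1_048_576,
    "gemini-1.5-pro-002": 2_097_152,
}
# Flash's output cap; Pro's output limit isn't given in the post.
FLASH_OUTPUT_LIMIT = 8_192

def fits_context(model: str, input_tokens: int) -> bool:
    """Return True if a prompt of input_tokens fits the model's input window."""
    return input_tokens <= INPUT_LIMITS[model]
```

A two-million-token prompt fits Pro's window but would be rejected by Flash.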
Google also announced a significant price reduction for Pro, effective on the 1st of October. Inputs less than 128,000 tokens drop from $3.50/million to $1.25/million (above 128,000 tokens it’s dropping from $7 to $5) and output costs drop from $10.50/million to $2.50/million ($21 down to $10 for the >128,000 case).
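The new Pro pricing is tiered on input size, so the cost arithmetic is worth spelling out. A quick sketch using the October prices above (USD per million tokens; the function name is mine, and I'm assuming the tier boundary follows the post's "less than 128,000 tokens" wording):

```python
def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost in USD under the October 2024
    gemini-1.5-pro prices quoted in this post.

    Inputs under 128,000 tokens: $1.25/M input, $2.50/M output.
    Larger inputs: $5/M input, $10/M output.
    """
    if input_tokens < 128_000:
        in_rate, out_rate = 1.25, 2.50
    else:
        in_rate, out_rate = 5.00, 10.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a 100,000-token prompt with a 1,000-token response now costs about $0.13, versus roughly $0.36 at the old $3.50/$10.50 rates.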
Gemini has always offered fine-grained safety filters – it sounds like those are now turned down to minimum by default, which is a welcome change:

For the models released today, the filters will not be applied by default so that developers can determine the configuration best suited for their use case.
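If you do want the filters applied, the REST API accepts explicit safety settings per request. Here's a hedged sketch of assembling that payload with the standard library – the category and threshold names are Google's published enum values, but double-check them against the current API documentation:

```python
import json

# The four adjustable harm categories in the Gemini API.
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

# Re-enable medium-and-above blocking for every category, since the
# -002 models no longer apply filters by default.
payload = {
    "contents": [{"parts": [{"text": "Hello"}]}],
    "safetySettings": [
        {"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
        for c in CATEGORIES
    ],
}

body = json.dumps(payload)
```

The same structure can be passed through most Gemini client libraries rather than built by hand; the point is that filtering is now opt-in per request.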

Also interesting: they’ve tweaked the expected length of default responses:

For use cases like summarization, question answering, and extraction, the default output length of the updated models is ~5-20% shorter than previous models.

Tags: gemini, google, generative-ai, ai, llms

AI Summary and Description: Yes

Summary: Google has updated its Gemini models, introducing new efficiency features along with significant price reductions. The latest versions enhance performance metrics while allowing developers more control over safety filters.

Detailed Description: The introduction of the Gemini-1.5 series models represents an important advancement in the generative AI landscape from Google. Here are the key points about the latest updates:

– **New Models**: Google launched two new models, gemini-1.5-pro-002 and gemini-1.5-flash-002, which are expected to outperform their predecessors slightly and respond faster to queries.
– **Token Limits**: The Flash model retains its input limit of 1,048,576 tokens and output limit of 8,192 tokens, while the Pro model offers a larger 2,097,152-token input limit, enhancing its capability for complex requests or longer interactions.
– **Pricing Changes**: Google announced a price reduction for the Pro model, effective October 1st. Input tokens below 128,000 drop from $3.50/million to $1.25/million (above 128,000 tokens, from $7 to $5/million), and output prices drop from $10.50/million to $2.50/million ($21 down to $10/million for the >128,000-token case).
– **Safety Filters**: A notable aspect of these new models is the modification of the safety filter settings. By default, these filters will not be applied, allowing developers the flexibility to customize configurations to better suit application needs.
– **Response Length Adjustments**: In addition, the expected response length has been reduced by approximately 5-20% for tasks such as summarization and question answering, indicating an optimization for more concise outputs.

This update is particularly noteworthy for professionals in AI, generative AI security, and those involved in managing LLMs, as it reflects a commitment from Google to enhance performance while providing more control over model configurations. The ability to adjust safety filters can have implications for compliance and privacy considerations, depending on how developers choose to implement these features.

– The price reductions could also drive wider adoption of these models among developers, potentially leading to increased competition in the AI space.
– These enhancements position Google as a strong player in the increasingly crowded generative AI market, catering to both performance needs and affordability.