Source URL: https://simonwillison.net/2024/Oct/17/gemini-terms-of-service/#atom-everything
Source: Simon Willison’s Weblog
Title: Gemini API Additional Terms of Service
Feedly Summary: Gemini API Additional Terms of Service
I’ve been trying to figure out what Google’s policy is on using data submitted to their Google Gemini LLM for further training. It turns out it’s clearly spelled out in their terms of service, but it differs between the paid and free tiers.
The paid APIs do not train on your inputs:
When you’re using Paid Services, Google doesn’t use your prompts (including associated system instructions, cached content, and files such as images, videos, or documents) or responses to improve our products […] This data may be stored transiently or cached in any country in which Google or its agents maintain facilities.
The Gemini API free tier does:
The terms in this section apply solely to your use of Unpaid Services. […] Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine learning technologies, including Google’s enterprise features, products, and services. To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output.
Confusingly, the following paragraph about data used to fine-tune your own custom models appears in that same “Data Use for Unpaid Services” section:
Google only uses content that you import or upload to our model tuning feature for that express purpose. Tuning content may be retained in connection with your tuned models for purposes of re-tuning when supported models change. When you delete a tuned model, the related tuning content is also deleted.
It turns out their tuning service is “free of charge” on both the pay-as-you-go and free plans, according to the Gemini pricing page, though you still pay for input/output tokens at inference time (on the paid tier – the free tier appears to remain free even for fine-tuned models).
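Nothing in the request itself distinguishes the tiers; which data-use policy applies follows from whether the API key belongs to a billed project. For illustration, here is a minimal sketch using the google-generativeai Python SDK; the model names, environment variable, and toy tuning examples are placeholder assumptions rather than details from the post, and exact auth requirements for tuning may differ:

```python
# Minimal sketch with the google-generativeai Python SDK. Whether prompts
# may be used for training depends on the billing status of the project
# behind the API key, not on anything in this code. Model names and the
# toy training data below are placeholders.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Ordinary inference call: on the free tier, this prompt and response may
# be human-reviewed and used to improve Google's products; on the paid
# tier they are not used to improve products.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("What do the Gemini API terms say about training data?")
print(response.text)

# Tuning is free of charge on both plans per the pricing page; per the
# ToS, tuning content is used only for tuning (and possible re-tuning
# when supported models change).
operation = genai.create_tuned_model(
    display_name="tos-demo",
    source_model="models/gemini-1.5-flash-001-tuning",  # placeholder tunable model
    training_data=[
        {"text_input": "1", "output": "2"},
        {"text_input": "two", "output": "three"},
    ],
    epoch_count=5,
)
tuned = operation.result()  # blocks until the tuning job finishes
print(tuned.name)

# Per the quoted terms, deleting the tuned model also deletes the
# related tuning content.
genai.delete_tuned_model(tuned.name)
```

The final call reflects the quoted terms directly: deleting the tuned model is what removes the associated tuning content, so there is no separate deletion step for the training data itself.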
Tags: gemini, llms, google, generative-ai, training-data, ai, fine-tuning
AI Summary and Description: Yes
Summary: The text analyzes the data usage policies in Google’s Gemini API Terms of Service, focusing on how the free and paid tiers differ in whether submitted data is used for training. This matters for professionals managing AI security and compliance, especially regarding data privacy and control.
Detailed Description: The excerpt examines Google’s Gemini API Terms of Service, highlighting the data practices associated with the free versus paid usage tiers. Key aspects of these policies include:
– **Training Data Usage**:
– **Paid Services**: Data submitted through the paid tier is not used by Google to improve its products. This assurance can be crucial for organizations prioritizing data privacy.
– **Unpaid Services**: Conversely, data from the free tier may be used to enhance Google’s products and machine learning capabilities. This could raise concerns for businesses about data confidentiality and the potential for inadvertent exposure of proprietary information.
– **Human Review and Processing**:
– In the free tier, human reviewers may read, annotate, and process API inputs and outputs. This introduces a data-privacy risk that matters for organizations under strict data governance mandates.
– **Custom Model Tuning**:
– The policy around custom models indicates that tuning content is used only for the express purpose of tuning and is deleted when the associated tuned model is deleted. However, organizations must understand the retention implications: tuning content may be kept for re-tuning when supported models change.
– **Pricing and Access**:
– Tuning itself is free of charge on both tiers, though input/output tokens are still billed at inference time on the paid plan. This flexibility must be weighed against the privacy trade-offs of each tier.
Overall, the distinctions in data handling between the Gemini API’s paid and free tiers are significant for organizations looking to leverage AI tools while maintaining compliance with data privacy regulations. The insights can guide security professionals in making informed decisions about the use of generative AI technologies and adequately managing risks associated with data exposure.
Key Points:
– Emphasis on the divergence of data usage policies between paid and unpaid tiers.
– Attention to risks associated with human involvement in data processing.
– Consideration of compliance implications tied to data retention and usage for model tuning.
– Importance of thorough understanding of terms to safeguard organizational data effectively.