Simon Willison’s Weblog: Llama 3.2

Source URL: https://simonwillison.net/2024/Sep/25/llama-32/#atom-everything
Source: Simon Willison’s Weblog
Title: Llama 3.2

Feedly Summary: Llama 3.2
In further evidence that AI labs are terrible at naming things, Llama 3.2 is a huge upgrade to the Llama 3 series – they’ve released their first multi-modal vision models!

Today, we’re releasing Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B), and lightweight, text-only models (1B and 3B) that fit onto edge and mobile devices, including pre-trained and instruction-tuned versions.

The 1B and 3B text-only models are exciting too, with a 128,000 token context length and optimized for edge devices (Qualcomm and MediaTek hardware get called out specifically).
Meta partnered directly with Ollama to help with distribution, here’s the Ollama blog post. They only support the two smaller text-only models at the moment – this command will get the 3B model:
ollama run llama3.2

And for the 1B model:
ollama run llama3.2:1b

The two vision models are coming to Ollama “very soon”.
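Once either model has been pulled with the commands above, it can also be queried programmatically. Here is a minimal sketch using Ollama's local HTTP API, assuming the Ollama server is running on its default port (11434); the prompt text is just an illustrative placeholder:

```python
# Minimal sketch: prompt a locally pulled Llama 3.2 model via Ollama's HTTP API.
# Assumes the Ollama server is listening on its default port (11434) and that
# `ollama run llama3.2` (or `ollama run llama3.2:1b`) has already fetched the model.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",   # swap in "llama3.2:1b" for the 1B variant
        "prompt": "Summarise the Llama 3.2 release in one sentence.",  # placeholder prompt
        "stream": False,       # return one JSON object rather than a token stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```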
Tags: meta, vision-llms, generative-ai, llama, ai, llms

AI Summary and Description: Yes

Summary: The release of Llama 3.2 marks a notable advancement for the Llama series, introducing multi-modal vision LLMs alongside lightweight text-only models. The update combines vision and text processing in the larger models and, with the smaller ones, expands the potential for running AI models on edge and mobile devices.

Detailed Description:

The text discusses the launch of Llama 3.2 by Meta, emphasizing its technological advancements and potential applications. Here are the key points:

– **Multi-modal Models**: Llama 3.2 introduces new vision language models which integrate both text and vision processing capabilities, marking a significant development in generative AI technology.

– **Model Specifications**:
  – **New Model Sizes**:
    – **Vision LLMs**: Includes models with 11 billion and 90 billion parameters.
    – **Text-only Models**: Features smaller models with 1 billion and 3 billion parameters designed specifically for mobile and edge computing environments.
  – **Context Length**: The text-only models offer a robust 128,000 token context length, enhancing their usability for complex tasks.

– **Target Hardware**: The text-only models are optimized for performance on edge devices, specifically highlighting compatibility with Qualcomm and MediaTek hardware.

– **Future Developments**: The vision models are expected to become available through Ollama soon, indicating a push toward simpler access to and usage of these advanced models.

– **Distribution Channel**: Meta has partnered directly with Ollama to distribute these models, a collaboration intended to ease deployment in real-world applications.

This release is particularly relevant for professionals in AI security, as the advances in model capability raise new security considerations around data processing and usage in edge computing environments. The integration of vision and text processing also has implications for safety, privacy, and compliance, requiring an understanding of how such AI models are governed under existing regulations.