Simon Willison’s Weblog: SmolLM2

Source URL: https://simonwillison.net/2024/Nov/2/smollm2/#atom-everything
Source: Simon Willison’s Weblog
Title: SmolLM2

Feedly Summary: SmolLM2
New from Loubna Ben Allal and her research team at Hugging Face:

SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. […]
It was trained on 11 trillion tokens using a diverse dataset combination: FineWeb-Edu, DCLM, The Stack, along with new mathematics and coding datasets that we curated and will release soon.

The model weights are released under an Apache 2 license. I’ve been trying these out using my llm-gguf plugin for LLM and my first impressions are really good.
Here’s a recipe to run the 1.7GB Q8 quantized 1.7B model from lmstudio-community:
llm gguf download-model https://huggingface.co/lmstudio-community/SmolLM2-1.7B-Instruct-GGUF/resolve/main/SmolLM2-1.7B-Instruct-Q8_0.gguf -a smol17
llm chat -m smol17

Or at the other end of the scale, here’s how to run the 138MB Q8 quantized 135M model:
llm gguf download-model https://huggingface.co/lmstudio-community/SmolLM2-135M-Instruct-GGUF/resolve/main/SmolLM2-135M-Instruct-Q8_0.gguf -a smol135m
llm chat -m smol135m

The blog entry to accompany SmolLM2 should be coming soon, but in the meantime here’s the entry from July introducing the first version: SmolLM – blazingly fast and remarkably powerful.
Via @LoubnaBenAllal1
Tags: llm, hugging-face, generative-ai, ai, llms, open-source

AI Summary and Description

Summary: SmolLM2 is a family of compact language models that are small enough to run on-device while still handling a wide range of tasks. Training on 11 trillion tokens gives it broad capabilities, which makes it relevant for local and on-device AI and generative AI applications.

Detailed Description:
SmolLM2 is a newly introduced family of language models developed by Loubna Ben Allal and her team at Hugging Face. It features three distinct sizes—135 million, 360 million, and 1.7 billion parameters—each designed to perform a range of tasks efficiently while being lightweight enough for on-device execution.

Key aspects include:
– **Extensive Training Data**: SmolLM2 was trained on 11 trillion tokens drawn from a diverse mix of datasets: FineWeb-Edu, DCLM, and The Stack, plus new mathematics and coding datasets that the team curated and plans to release soon.
– **Quantization Options**: Q8 quantized GGUF builds of the models, published by lmstudio-community, shrink them enough to download and run locally. For example, the Q8 build of the 1.7B model is about 1.7GB, while the Q8 build of the 135M model is only 138MB.
– **Licensing**: The model weights are released under the Apache 2 license, promoting accessibility and collaboration within the AI community.
– **Practical Implementation**: The post shows how to run the models with the llm-gguf plugin for LLM: download a GGUF file with llm gguf download-model, give it an alias with -a, then start an interactive session with llm chat -m followed by that alias.
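A rough way to see where the Q8 download sizes come from: in llama.cpp's Q8_0 format, weights are stored in blocks of 32 int8 values plus one 16-bit scale, i.e. 34 bytes per 32 weights, or 8.5 bits per weight. The minimal Python sketch below applies that to the nominal parameter counts. It assumes every weight is stored as Q8_0, and the helper names are hypothetical. Real files differ somewhat because they also contain metadata, tokenizer data, and some tensors kept at other precisions.

```python
# Rough size estimate for weights stored in the GGUF Q8_0 format.
# Assumption: every weight uses Q8_0; real files also contain metadata,
# tokenizer data and some tensors kept at other precisions.

Q8_0_BLOCK_WEIGHTS = 32    # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 2 + 32  # one fp16 scale + 32 int8 values per block


def q8_0_bytes(n_params: int) -> float:
    """Approximate bytes needed to store n_params weights as Q8_0."""
    return n_params * Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS


def fp16_bytes(n_params: int) -> int:
    """Bytes needed for the same weights at 16 bits (2 bytes) each."""
    return n_params * 2


# Nominal parameter counts from the announcement, not exact tensor counts.
SMOLLM2_SIZES = {
    "SmolLM2-135M": 135_000_000,
    "SmolLM2-360M": 360_000_000,
    "SmolLM2-1.7B": 1_700_000_000,
}

if __name__ == "__main__":
    bits = Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS
    print(f"Q8_0 uses {bits} bits per weight")
    for name, n in SMOLLM2_SIZES.items():
        q8_mb = q8_0_bytes(n) / 1e6
        f16_mb = fp16_bytes(n) / 1e6
        print(f"{name}: Q8_0 ~{q8_mb:.1f} MB vs 16-bit ~{f16_mb:.1f} MB")
```

For the 1.7B model this estimate gives about 1.8 GB, and about 143 MB for the 135M model. Both are in the same ballpark as the download sizes quoted above and a little over half of what 16-bit weights would need.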

Overall, SmolLM2 is noteworthy because it pairs a small footprint with broad task coverage. Its Q8 builds range from 138MB to about 1.7GB, small enough to run locally on ordinary hardware. As the industry puts more emphasis on efficient models, SmolLM2 is a promising option for professionals building on-device and local generative AI applications.