Simon Willison’s Weblog: mistral.rs

Source URL: https://simonwillison.net/2024/Oct/19/mistralrs/#atom-everything
Source: Simon Willison’s Weblog
Title: mistral.rs

Feedly Summary: mistral.rs
Here’s an LLM inference library written in Rust. It’s not just for that one family of models – like how llama.cpp has grown beyond Llama, mistral.rs has grown beyond Mistral.
This is the first time I’ve been able to run the Llama 3.2 vision model on my own Mac M2 laptop:
git clone https://github.com/EricLBuehler/mistral.rs.git
cd mistral.rs
cargo run --features metal --release -- -i --isq Q4K vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama

This downloaded around 20GB of model files to ~/.cache/huggingface/hub/models--lamm-mit--Cephalo-Llama-3.2-11B-Vision-Instruct-128k, loaded the model into memory and started a terminal-based chat session.
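The directory name follows the Hugging Face hub cache convention, where a model repository ID gets a "models" prefix and each "/" becomes "--". Here is a minimal sketch of that mapping, assuming the default cache location (the helper name hub_cache_dir is made up for illustration and is not part of any library):

```python
from pathlib import Path


def hub_cache_dir(repo_id: str, cache_root: str = "~/.cache/huggingface/hub") -> Path:
    """Return the expected cache directory for a model repo.

    Hypothetical helper: mirrors the "models--{org}--{name}" folder naming
    used by the hub cache. cache_root is assumed to be the default location.
    """
    folder = "models--" + repo_id.replace("/", "--")
    return Path(cache_root).expanduser() / folder


path = hub_cache_dir("lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k")
print(path.name)
```

This is handy for finding the files later if you want to reclaim the disk space.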
I ran this prompt:
\image https://niche-museums.imgix.net/pioneer-history.jpeg?w=1600&h=800&fit=crop&auto=compress <|image|> describe this image including any text
Feeding in this image from my Niche Museums write-up of the Pioneer Memorial Museum:

And got back this response:

The image shows the exterior of the Daughters of Utah Pioneers (DUP) Museum, which is also known as the Pioneer Memorial Museum. The building is located in Salt Lake City, Utah, USA.
Visible in the image are:

The DUP Museum Building: The main structure of the museum is a prominent feature in the image. It is a two-story building with a rectangular shape and a flat roof. The exterior walls are made of brick or stone, giving it a classic and historic appearance.

Signage: Above the entrance, there is a sign that reads “Daughters of Utah Pioneers Museum” in bold letters. Below this sign, another sign indicates that the building is also known as the “Pioneer Memorial Museum.”

Entrance: The main entrance to the museum is marked by a doorway with a decorative frame and a pair of large wooden doors.

Landscaping: The surrounding area around the museum appears well-maintained with neatly trimmed grass and some trees nearby.

Statue or Monument: In front of the museum, there is a small statue or monument that appears to be dedicated to pioneers or early settlers.

The Daughters of Utah Pioneers (DUP) Museum showcases historical artifacts and exhibits related to Utah’s pioneer history, focusing on the experiences and contributions of early settlers in the region.

Activity Monitor confirmed that this Metal build was using my GPU, and about 10GB of memory, while the prompt was running.
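The roughly 20GB download and roughly 10GB of memory are in the range that back-of-envelope arithmetic suggests. The sketch below assumes the nominal 11 billion parameters from the model name, 16 bits per weight for the downloaded files, and about 4.5 bits per weight for Q4K quantization (a commonly quoted figure for that format, not something measured here). It counts weights only; activations, the KV cache and any tensors left unquantized add to the total, which may account for some of the gap up to the observed 10GB.

```python
def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 11e9  # assumption: nominal size taken from "11B" in the model name

# Assumption: the downloaded files store weights at 16 bits each.
print(f"16-bit weights: {weight_gigabytes(N_PARAMS, 16):.1f} GB")

# Assumption: Q4K averages roughly 4.5 bits per weight.
print(f"Q4K weights:    {weight_gigabytes(N_PARAMS, 4.5):.1f} GB")
```

Under those assumptions the weights come to about 22 GB unquantized and a little over 6 GB after quantization.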
There’s also a Python binding, which can be installed from PyPI but requires a Rust toolchain (I filed an issue requesting PyPI wheels):
pip install mistralrs-metal

I got an error running the Python Llama vision demo but the Phi 3.5 Vision example worked as advertised – I modified that to enable streaming output and ran it against this image from Mendenhall’s Museum of Gasoline Pumps & Petroliana:

The image captures a nostalgic scene of a vintage gas station, reminiscent of a bygone era. Dominating the left side of the frame is a red and white gas pump, standing tall and proud, its metallic surface gleaming under the soft light filtering through the wooden ceiling. Adjacent to it, a white and black gas canister stands ready for service, its presence adding to the authenticity of the setting.
On the right, a red and white gasoline sign hangs from the ceiling, its bold colors contrasting with the muted tones of the surroundings. Above it, a yellow and white gasoline sign is suspended, its vibrant hues drawing the eye upwards.
The floor beneath these relics of the past is a checkerboard pattern, a common design choice for gas stations of yesteryears. It provides a stark contrast to the smooth, unblemished surfaces of the gas pumps and canisters.
In the background, a variety of other signs and advertisements add to the eclectic mix of objects. They are a testament to the diverse range of products and services that were once available at this location.
Despite the passage of time, the gas station retains a certain charm, its vintage gas pumps and signs serving as a tangible link to a different era. The image is a snapshot of history, frozen in time, waiting to be discovered and appreciated by those who take the time to look closer.

If you’re running Python 3.10 on Apple Silicon you may be able to skip the Rust compiler by installing the wheel I built here:
pip install https://static.simonwillison.net/static/2024/mistralrs_metal-0.3.1-cp310-cp310-macosx_11_0_arm64.whl
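The wheel's filename encodes which interpreters it supports: cp310 is CPython 3.10 and macosx_11_0_arm64 is macOS 11 or later on Apple Silicon. Here is a rough sketch for checking the current interpreter against those tags before installing, assuming a simple five-part wheel filename with no build tag (the helper names are hypothetical, and pip performs a more thorough compatibility check itself):

```python
import platform
import sys

WHEEL = "mistralrs_metal-0.3.1-cp310-cp310-macosx_11_0_arm64.whl"


def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into its tags (assumes no optional build tag)."""
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def looks_compatible(tags: dict, py_version: tuple, system: str, machine: str) -> bool:
    """Coarse check: CPython major/minor, macOS, and CPU architecture only."""
    wanted_python = f"cp{py_version[0]}{py_version[1]}"
    return (
        tags["python"] == wanted_python
        and tags["platform"].startswith("macosx_")
        and system == "Darwin"
        and tags["platform"].endswith("_" + machine)
    )


tags = parse_wheel_tags(WHEEL)
ok = looks_compatible(tags, sys.version_info[:2], platform.system(), platform.machine())
print("wheel looks compatible" if ok else "wheel does not match this interpreter")
```

If the check fails, the regular pip install mistralrs-metal route with a Rust toolchain is the fallback.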

Tags: llms, vision-llms, rust, python, generative-ai, llama, phi, ai

AI Summary and Description: Yes

Summary: The text discusses the mistral.rs library, which enables local inference of various models, including generative AI and vision models, on personal hardware such as Apple Silicon. This is particularly relevant for AI and LLM security professionals who deal with the implications of running advanced models locally, encompassing performance and resource usage considerations.

Detailed Description:

- **Library Overview**: mistral.rs is an LLM inference library written in Rust. Like llama.cpp, it has grown beyond the model family it is named after and now supports a broader range of models.
- **Installation and Usage**:
  - Clone the repository from GitHub and run it with cargo; the example uses the metal feature on a Mac M2.
  - The first run downloaded around 20GB of model files, which shows how much storage these models require.
- **Performance Observations**:
  - The Metal build ran on the GPU, as confirmed in Activity Monitor.
  - Memory usage was about 10GB while the prompt was running, a cost to account for when planning local deployments.
- **Model Functionality**:
  - Two image-description examples show the vision models interpreting photographs and generating text descriptions.
- **Compatibility Issues**:
  - The Python Llama vision demo produced an error, while the Phi 3.5 Vision example worked. Installing the Python binding from PyPI requires a Rust toolchain, which may complicate deployment for some users.
  - The author provides a pre-built wheel for Python 3.10 on Apple Silicon so that those users may be able to skip the Rust compiler.

**Significance for Professionals**:
- **AI Security**: Understanding local model execution helps professionals manage the risks of processing data directly on devices versus in the cloud.
- **Cloud Computing**: Running capable models locally reduces dependency on cloud resources and keeps data on the device.
- **Operational Considerations**: Users must weigh performance, memory and the computing environment when deploying such models, since these directly affect the security posture and operational capabilities of AI infrastructure.

In summary, the text provides valuable insights into the capabilities of mistral.rs as a tool for executing complex AI models, critical for professionals focusing on security, compliance, and effective infrastructure management in the realm of AI.