Simon Willison’s Weblog: Qwen2-VL: To See the World More Clearly

Source URL: https://simonwillison.net/2024/Sep/4/qwen2-vl/#atom-everything
Source: Simon Willison’s Weblog
Title: Qwen2-VL: To See the World More Clearly

Feedly Summary: Qwen2-VL: To See the World More Clearly
Qwen is Alibaba Cloud’s organization training LLMs. Their latest model is Qwen2-VL – a vision LLM – and it’s getting some really positive buzz. Here’s a r/LocalLLaMA thread about the model.
The original Qwen models were licensed under their custom Tongyi Qianwen license, but starting with Qwen2 on June 7th 2024 they switched to Apache 2.0, at least for their smaller models:

While Qwen2-72B as well as its instruction-tuned models still uses the original Qianwen License, all other models, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, and Qwen2-57B-A14B, turn to adopt Apache 2.0

Here’s where things get odd: both of the above links are to the Internet Archive, because at some point in the last 24 hours the Qwen GitHub organization and their GitHub Pages-hosted blog both disappeared and now return 404 pages. I asked on Twitter but nobody seems to know what’s happened to them.
The Qwen Hugging Face page is still up – it’s just the GitHub organization that has mysteriously vanished.
Inspired by Dylan Freedman I tried the model using GanymedeNil/Qwen2-VL-7B on Hugging Face Spaces, and found that it was exceptionally good at extracting text from unruly handwriting.

The model apparently runs great on NVIDIA GPUs, and very slowly using the MPS PyTorch backend on Apple Silicon. Qwen previously released MLX builds of their non-vision Qwen2 models, so hopefully there will be an Apple Silicon optimized MLX model for Qwen2-VL soon as well.
Tags: vision-llms, llms, ai, generative-ai

AI Summary and Description: Yes

Summary: The text discusses Qwen2-VL, a vision language model developed by Alibaba Cloud’s Qwen organization, highlighting its licensing changes, its handwriting-extraction performance, and the unexpected disappearance of Qwen’s GitHub resources. This raises questions about security and governance in AI model deployment and accessibility.

Detailed Description:
The provided text covers the development and operational aspects of Qwen2-VL, a recent entry among vision language models. The main points are:

– **Model Introduction**: Qwen2-VL is a vision language model from Qwen, Alibaba Cloud’s organization for training LLMs. It reflects the increasing relevance of visual capabilities in LLMs, which has implications for a variety of applications, including security analytics, surveillance, and data extraction.

– **Licensing Changes**: The original Qwen models used the custom Tongyi Qianwen license. Starting with Qwen2 on June 7th, 2024, the smaller models (Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, and Qwen2-57B-A14B) moved to the more permissive Apache 2.0 license, which may encourage broader usage and faster iteration. Qwen2-72B and its instruction-tuned variants remain under the original license, which could pose compliance issues for organizations looking to integrate them. Key considerations include:
– Understanding the implications of licensing on development and deployment.
– Evaluating the compliance requirements due to licensing changes.
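
The license split quoted in the post can be captured in a small lookup, so that an integration script flags any model that is not under Apache 2.0. This is a minimal sketch: the model names and licenses come from the quoted announcement, while `QWEN2_LICENSES`, `license_for`, `needs_license_review`, and the `-Instruct` suffix handling are illustrative assumptions rather than an official registry.

```python
from typing import Optional

# Licenses as stated in the Qwen2 announcement quoted in the post.
# All names below are hypothetical and exist only for illustration.
APACHE = "Apache-2.0"
QIANWEN = "Tongyi Qianwen"

QWEN2_LICENSES = {
    "Qwen2-0.5B": APACHE,
    "Qwen2-1.5B": APACHE,
    "Qwen2-7B": APACHE,
    "Qwen2-57B-A14B": APACHE,
    "Qwen2-72B": QIANWEN,
}


def license_for(model_name: str) -> Optional[str]:
    """Return the license for a listed Qwen2 model, or None if unlisted.

    Assumption: instruction-tuned variants use an "-Instruct" suffix and
    share the license of their base model.
    """
    base = model_name.removesuffix("-Instruct")
    return QWEN2_LICENSES.get(base)


def needs_license_review(model_name: str) -> bool:
    """True when the model is unlisted or not under Apache 2.0."""
    return license_for(model_name) != APACHE
```

Treating unlisted models as needing review is a deliberately conservative choice: the quoted announcement does not cover every model, so the sketch avoids guessing a license for anything it does not name.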

– **GitHub Disappearance**: The sudden unavailability of Qwen’s GitHub organization and its GitHub Pages-hosted blog, both of which now return 404 pages, raises concerns about the governance and availability of open-source resources. Such disappearances can affect organizations that rely on open-source tools for security assessments, model training, or integration. Important points include:
– The need for maintaining robust backup and recovery mechanisms for open-source projects.
– Awareness of potential risks associated with relying on single providers for AI model resources.
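
The post works around the missing pages by linking to Internet Archive copies. As a rough sketch of that fallback, the helper below builds a Wayback Machine URL for a page and a target date. The function name `wayback_url` and the example URL are hypothetical. The URL layout (`https://web.archive.org/web/<timestamp>/<original URL>`) is the Wayback Machine's usual form, where a partial timestamp resolves to the nearest capture.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine URL for original_url near the given time.

    timestamp is a digit string from YYYY up to YYYYMMDDhhmmss.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits, e.g. 20240904")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


# Hypothetical usage with a placeholder page:
archived = wayback_url("https://example.com/blog/", "20240904")
```

This only constructs the link. Whether a capture exists for that page still depends on the archive having crawled it.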

– **Model Performance**: In an informal test on Hugging Face Spaces, Qwen2-VL proved exceptionally good at extracting text from unruly handwriting, which points to practical uses in document processing and data extraction, tasks that matter in domains like security and compliance. Highlights include:
– Use case potential in automating data retrieval processes in sensitive environments.
– Dependence on hardware for performance: the model reportedly runs well on NVIDIA GPUs but very slowly on Apple Silicon via the PyTorch MPS backend, which has implications for infrastructure planning.
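
The hardware point above usually shows up in code as a device-selection step. The sketch below is an assumption about how a caller might prefer an NVIDIA GPU, fall back to Apple's MPS backend, and finally use the CPU. It is not taken from the Qwen2-VL code, and `pick_device` is a hypothetical helper that takes plain booleans so it can run without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a PyTorch-style device string in order of preference.

    CUDA comes first because the post reports good performance on NVIDIA
    GPUs; MPS is a slower fallback on Apple Silicon; CPU is the last resort.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Hypothetical result on an Apple Silicon machine without an NVIDIA GPU:
device = pick_device(cuda_available=False, mps_available=True)
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.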

– **Community Engagement**: The engagement with platforms like Hugging Face points to a collaborative approach to AI development, also reflecting the importance of community support in validating model performance. This emphasizes:
– The significance of community feedback in enhancing the security posture of deployed models.
– The relevance of incorporating community-driven practices in AI governance frameworks.

In summary, the narrative surrounding Qwen2-VL not only illustrates advancements in vision capabilities within LLMs but also raises critical questions about licensing, availability, and performance. These elements are crucial for security and compliance professionals to navigate effectively as they integrate AI solutions into their infrastructure.