The Register: Hugging Face puts the squeeze on Nvidia’s software ambitions

Source URL: https://www.theregister.com/2024/10/24/huggingface_hugs_nvidia/
Source: The Register
Title: Hugging Face puts the squeeze on Nvidia’s software ambitions

Feedly Summary: AI model repo promises lower costs, broader compatibility for NIMs competitor
Hugging Face this week announced HUGS, its answer to Nvidia’s Inference Microservices (NIMs), which the AI repo claims will let customers deploy and run LLMs and models on a much wider variety of hardware…

AI Summary and Description: Yes

Summary:
Hugging Face has introduced HUGS, a competitor to Nvidia’s Inference Microservices (NIMs) that lets users deploy large language models (LLMs) across a wider range of hardware. Unlike Nvidia’s offering, HUGS is built on open-source technologies and is priced per container rather than per GPU, making cloud deployment cheaper for models that span multiple GPUs.

Detailed Description:
Hugging Face’s announcement of HUGS marks a significant move in the landscape of AI model deployment, addressing key challenges in running LLMs at scale. Here are the main points from the text:

– **Service Overview**: HUGS, akin to Nvidia’s Inference Microservices (NIMs), provides containerized model images that simplify the deployment of LLMs across different hardware setups.

– **Deployment Convenience**: Users can deploy LLMs rapidly from preconfigured container images on Docker or Kubernetes, avoiding the complexity of hand-optimizing inference libraries such as vLLM or TensorRT (a minimal launch-and-query sketch follows this bullet).
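
For concreteness, here is a minimal launch-and-query sketch using the Docker SDK for Python. The image name, port mapping, and the assumption that the container exposes an OpenAI-compatible `/v1/chat/completions` endpoint (as TGI-based servers typically do) are illustrative guesses, not details confirmed by the article.

```python
# A minimal sketch, not an official HUGS workflow: image name, port, and
# endpoint schema are illustrative assumptions.
import time

import docker
import requests

client = docker.from_env()

# Hypothetical HUGS image; real image names come from Hugging Face's registry.
container = client.containers.run(
    "registry.example.com/hugs/llama-3.1-8b-instruct:latest",
    detach=True,
    # Request all available GPUs (equivalent to `docker run --gpus all`).
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    ports={"80/tcp": 8080},
)

time.sleep(60)  # crude wait for model load; poll a health endpoint in practice

# Assumes an OpenAI-compatible chat endpoint, as TGI-based servers typically expose.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])

container.stop()
```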

– **Wide Hardware Compatibility**:
  – HUGS supports multiple hardware platforms, including both Nvidia and AMD GPUs.
  – Future support is expected for specialized AI accelerators (e.g., Amazon’s Inferentia, Google’s TPUs), making it versatile in terms of hardware deployment.

– **Cost Efficiency**:
  – HUGS will cost about $1 per hour per container when deployed on platforms like AWS or Google Cloud.
  – Nvidia charges $1 per hour per GPU for NIMs, so HUGS offers significant savings for larger models that span multiple GPUs (see the comparison below).
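
A quick back-of-the-envelope comparison using the two prices quoted above; the GPU counts are illustrative deployment sizes, not figures from the article:

```python
# Hourly pricing quoted in the article:
#   HUGS: $1/hour per container, regardless of how many GPUs it spans.
#   NIM:  $1/hour per GPU.
HUGS_PER_CONTAINER = 1.00  # USD/hour
NIM_PER_GPU = 1.00         # USD/hour

for gpus in (1, 2, 4, 8):  # illustrative deployment sizes
    hugs_cost = HUGS_PER_CONTAINER  # one container covers all its GPUs
    nim_cost = NIM_PER_GPU * gpus
    print(f"{gpus} GPU(s): HUGS ${hugs_cost:.2f}/hr vs NIM ${nim_cost:.2f}/hr")
```

At one GPU the two prices match; the gap only opens for models large enough to span multiple GPUs. Cloud compute charges apply on top in both cases.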

– **Smaller Scale Deployments**: For those wanting to deploy HUGS on a smaller scale, DigitalOcean will host the images at no extra charge, although compute costs will still apply.

– **Flexible Infrastructure Options**: Enterprise Hub subscribers can deploy HUGS on their infrastructure, adding flexibility for enterprise users.

– **Model Focus**: Hugging Face is initially offering support for popular open-source models, including various versions of Meta’s Llama, Mistral AI’s models, Google’s Gemma, and Alibaba’s Qwen.

– **DIY Option**: Users who prefer not to pay for HUGS or NIMs can still build their own containers around the same open-source serving stacks, trading some setup effort for cost savings (see the sketch below).
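
As a taste of the DIY route, here is a minimal offline-inference sketch using vLLM’s Python API (one of the open-source engines mentioned above). The model id is an illustrative example and may require access approval on the Hub.

```python
# A DIY sketch using vLLM's offline Python API (`pip install vllm`); the
# model id is illustrative and may require access approval on the Hub.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Summarize the trade-offs of containerized LLM serving in two sentences."],
    params,
)
print(outputs[0].outputs[0].text)
```

Packaging a script like this (or vLLM’s built-in OpenAI-compatible server) into your own Docker image is essentially what the paid offerings do for you, minus the pre-tuned configurations.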

– **Value Proposition**: The pricing pays for the optimization work already baked into the containers, not merely for the software and model files themselves.

Overall, Hugging Face’s HUGS is a notable development in the AI space, offering cost-effective and technically flexible options for LLM deployment. It will be particularly relevant for professionals in AI, cloud computing, and infrastructure security who need deployment models that adapt to varying hardware and budget constraints.