Tag: throughput rates
-
The Register: Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands
Source URL: https://www.theregister.com/2024/08/23/3090_ai_benchmark/ Source: The Register Title: Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands Feedly Summary: For 100 concurrent users, the card delivered 12.88 tokens per second—just slightly faster than average human reading speed If you want to scale a large language model (LLM) to a few…