Tag: training systems
-
The Register: xAI picked Ethernet over InfiniBand for its H100 Colossus training cluster
Source URL: https://www.theregister.com/2024/10/29/xai_colossus_networking/
Source: The Register
Title: xAI picked Ethernet over InfiniBand for its H100 Colossus training cluster
Feedly Summary: Work already underway to expand system to 200,000 Nvidia Hopper chips. Unlike most AI training clusters, xAI’s Colossus with its 100,000 Nvidia Hopper GPUs doesn’t use InfiniBand. Instead, the massive system, which Nvidia bills as…