Tag: training clusters

  • The Register: xAI picked Ethernet over InfiniBand for its H100 Colossus training cluster

    Source URL: https://www.theregister.com/2024/10/29/xai_colossus_networking/ Source: The Register Title: xAI picked Ethernet over InfiniBand for its H100 Colossus training cluster Feedly Summary: Work already underway to expand system to 200,000 Nvidia Hopper chips Unlike most AI training clusters, xAI’s Colossus with its 100,000 Nvidia Hopper GPUs doesn’t use InfiniBand. Instead, the massive system, which Nvidia bills as…

  • Hacker News: The open future of networking hardware for AI

    Source URL: https://engineering.fb.com/2024/10/15/data-infrastructure/open-future-networking-hardware-ai-ocp-2024-meta/ Source: Hacker News Title: The open future of networking hardware for AI Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses Meta’s advancements in networking technologies for AI clusters, focusing on their next-generation network fabric announced at the Open Compute Project Summit 2024. This innovation is significant for professionals…