The Register: Tenstorrent’s Blackhole chips boast 768 RISC-V cores and almost as many FLOPS

Source URL: https://www.theregister.com/2024/08/27/tenstorrent_ai_blackhole/
Source: The Register
Title: Tenstorrent’s Blackhole chips boast 768 RISC-V cores and almost as many FLOPS

Feedly Summary: Shove 32 of ’em in a box and you’ve got nearly 24 petaFLOPS of FP8 perf
Hot Chips: RISC-V champion Tenstorrent offered the closest look yet at its upcoming Blackhole AI accelerators at Hot Chips this week, which it claims can outperform an Nvidia A100 in raw compute and scalability.…

AI Summary and Description: Yes

**Summary:** The text discusses Tenstorrent’s upcoming Blackhole AI accelerators, which the company claims can outperform Nvidia’s A100 in raw compute and scalability. This matters to professionals in AI and infrastructure security because the claimed performance and scalability gains could affect how AI workloads are deployed and secured.

**Detailed Description:**
The article covers Tenstorrent’s Blackhole AI accelerators, detailing their specifications and comparing them with Nvidia’s hardware, particularly the popular A100 GPU. The key points are:

– **Performance and Architecture:**
– Each Blackhole accelerator is rated at 745 teraFLOPS of FP8 performance (372 teraFLOPS at FP16) and carries 32GB of GDDR6 memory.
– Each accelerator has an Ethernet-based interconnect providing 1TBps of total bandwidth.
– A Blackhole Galaxy system integrates 32 accelerators, delivering nearly 24 petaFLOPS of FP8 performance, and can be configured as a compute node or a memory node.
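
The "nearly 24 petaFLOPS" figure follows from the per-chip numbers in the bullets above. The short Python sketch below reproduces that arithmetic; it assumes plain linear scaling of peak figures across 32 chips, and the constant and function names are illustrative only, not part of any Tenstorrent tooling.

```python
# Per-chip figures as quoted in the summary above.
FP8_TFLOPS_PER_CHIP = 745
FP16_TFLOPS_PER_CHIP = 372
GDDR6_GB_PER_CHIP = 32
CHIPS_PER_GALAXY = 32


def galaxy_totals(chips: int = CHIPS_PER_GALAXY) -> dict:
    """Scale peak per-chip figures linearly (assumption: no interconnect or utilization losses)."""
    return {
        "fp8_pflops": chips * FP8_TFLOPS_PER_CHIP / 1000,
        "fp16_pflops": chips * FP16_TFLOPS_PER_CHIP / 1000,
        "memory_gb": chips * GDDR6_GB_PER_CHIP,
    }


totals = galaxy_totals()
print(f"FP8:    {totals['fp8_pflops']:.2f} petaFLOPS")
print(f"FP16:   {totals['fp16_pflops']:.2f} petaFLOPS")
print(f"Memory: {totals['memory_gb']} GB GDDR6")
```

These are peak, vendor-quoted numbers, so the totals are an upper bound rather than a measure of delivered performance.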

– **Comparative Advantages:**
– Tenstorrent claims a Blackhole Galaxy offers nearly 4.8 times the performance per box of Nvidia’s A100 systems, which would also make it competitive with Nvidia’s newer HGX/DGX H100 and H200 systems.
– The design uses Ethernet exclusively for interconnects, which simplifies networking compared with combining multiple technologies such as Nvidia’s NVLink and InfiniBand.
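
As a rough sanity check on the 4.8x claim, the sketch below works backwards from the Galaxy's peak FP8 figure to the A100 system performance that the ratio implies, which comes out at roughly 5 petaFLOPS per box. That baseline is derived here, not quoted in the summary, and the variable names are illustrative.

```python
# Figures quoted in the summary; the baseline below is derived, not quoted.
FP8_TFLOPS_PER_CHIP = 745
CHIPS_PER_GALAXY = 32
CLAIMED_SPEEDUP = 4.8  # "nearly 4.8 times", so treat the result as approximate

# Peak FP8 throughput of one 32-chip Galaxy, in petaFLOPS.
galaxy_pflops = CHIPS_PER_GALAXY * FP8_TFLOPS_PER_CHIP / 1000

# Per-box A100 performance implied by the claimed ratio (assumption: the
# claim compares peak figures for one Galaxy against one A100 system).
implied_baseline_pflops = galaxy_pflops / CLAIMED_SPEEDUP

print(f"Galaxy peak:      {galaxy_pflops:.2f} petaFLOPS")
print(f"Implied A100 box: {implied_baseline_pflops:.2f} petaFLOPS")
```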

– **Core Design and Capabilities:**
– Each Blackhole chip comprises 140 Tensix cores, accompanied by 16 “Big RISC-V” cores for running Linux and 752 “Baby RISC-V” cores for auxiliary functions such as memory management and off-die communications.
– The architecture focuses on supporting essential AI and high-performance computing (HPC) workloads, including matrix multiplications and convolutions.
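
The headline's "768 RISC-V cores" is the sum of the Big and Baby RISC-V cores listed above; the Tensix cores are counted separately. A minimal Python sketch of that tally follows, with a hypothetical `BlackholeCores` class used purely for illustration and the 32-chip Galaxy size taken from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlackholeCores:
    """Per-chip core counts as given in the summary (class name is illustrative)."""
    tensix: int = 140
    big_riscv: int = 16
    baby_riscv: int = 752

    @property
    def riscv_total(self) -> int:
        # Big cores run Linux; Baby cores handle auxiliary functions.
        return self.big_riscv + self.baby_riscv


chip = BlackholeCores()
CHIPS_PER_GALAXY = 32  # Galaxy size quoted in the summary
riscv_cores_per_galaxy = CHIPS_PER_GALAXY * chip.riscv_total

print(f"RISC-V cores per chip:   {chip.riscv_total}")
print(f"RISC-V cores per Galaxy: {riscv_cores_per_galaxy}")
```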

– **Software Ecosystem Development:**
– Alongside the hardware, Tenstorrent is promoting TT-Metalium, a low-level programming model designed for ease of use in AI applications. It draws parallels to Nvidia’s CUDA while favoring standard C++ APIs.
– The company aims to support major AI frameworks such as TensorFlow and PyTorch, so that developers can use the Blackhole accelerators from those frameworks.

This information is relevant to professionals in the AI, cloud, and infrastructure sectors because it describes hardware developments that could influence AI processing capabilities, deployment strategies, and the security measures those deployments require.