Tenstorrent Cuts 20 Cores From Already-Shipping "Blackhole" P150 Cards
Tenstorrent, a startup that designs high-performance AI accelerators and is led by CEO and renowned computer architect Jim Keller, has announced a significant hardware change to its existing Blackhole P150 accelerators, which include the P150a and P150b models. In its latest documentation update, the company notes that the cards will now operate with about 14.3% fewer cores than originally advertised: the P150 accelerators are now listed with 120 working "Tensix" cores instead of the previously advertised 140. The company has not given a detailed reason for the change, offering only a brief explanation: "To present a unified interface to metal and other system software, firmware v19.5.0 and later will change the core count on all existing cards to 120. Typical workloads show a non-material (~1-2%) performance difference."
The Blackhole P150 accelerators were originally advertised with 140 "Tensix" cores and 32 GB of GDDR6 memory, operating at up to 300 W in an actively cooled form factor designed for desktop workstations; the P150a model also includes four passive QSFP-DD 800G ports. With the core count reduced by approximately 14%, the peak TeraFLOPS rating falls by a similar margin. The older documents for the 140-core SKUs listed BLOCKFP8 8-bit floating point performance at 774 TeraFLOPS, while the documents for the new 120-core configuration list 664 TeraFLOPS at the same precision level. Why the change is being made now remains unclear, although knowledgeable members of the HPC community have suggested a few possible reasons.
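The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the numbers from the old and new documentation; the variable names and the `percent_drop` helper are illustrative and not part of any Tenstorrent tool.

```python
# Figures quoted in the article (old vs. new documentation).
OLD_CORES, NEW_CORES = 140, 120
OLD_TFLOPS, NEW_TFLOPS = 774.0, 664.0  # peak BLOCKFP8 TeraFLOPS


def percent_drop(old, new):
    """Return the relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


core_drop = percent_drop(OLD_CORES, NEW_CORES)
tflops_drop = percent_drop(OLD_TFLOPS, NEW_TFLOPS)

# Peak throughput per Tensix core, before and after the change.
per_core_old = OLD_TFLOPS / OLD_CORES
per_core_new = NEW_TFLOPS / NEW_CORES

print(f"Core count reduction:   {core_drop:.1f}%")
print(f"Peak TFLOPS reduction:  {tflops_drop:.1f}%")
print(f"Per-core TFLOPS (old):  {per_core_old:.2f}")
print(f"Per-core TFLOPS (new):  {per_core_new:.2f}")
```

The per-core figure comes out essentially unchanged (about 5.5 TeraFLOPS per core in both cases), which suggests the lower peak rating follows directly from the core count rather than from a clock or per-core change. Note that these are peak ratings; the company's own statement puts the difference in typical workloads at roughly 1-2%.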