Tech news in 3 minutes

Andy McLean: Rapidus MoU Will Help British Innovators Access 2-nm Technology

1 d ago

AI chip startup Tensordyne has taped out its data center inference chip, claiming an order-of-magnitude improvement in power efficiency over leading GPUs. The company says its systems achieve 17× the tokens per second per watt of an Nvidia GB300-based system, and 13× the tokens per second per rack. Tensordyne co-founder and VP of AI Gilles Backhus said the two biggest challenges in data center AI inference are speed and cost.

The chip is built on TSMC 3nm, consumes 300 W per package, and offers 2.1 PFLOPS (dense FP8) with 144 GB of HBM3e. Its 72-chip Napier server is air-cooled at 30 kW, occupies a quarter rack, and includes 10 TB of HBM. Each full rack of four Napier servers (288 chips) delivers 608 PFLOPS dense FP8 with 74 GB of SRAM and 42 TB of HBM, and consumes 120 kW.

The performance advantage comes from Tensordyne’s proprietary Pareto number system, which is based on a logarithmic number system (LNS) and uses dedicated hardware acceleration for efficient addition in the log domain. The software stack handles the conversion transparently and supports PyTorch and Triton. Each chip carries 256 MB of SRAM, 5× that of a current-generation GPU, which reduces HBM accesses. A cell-based network-on-chip (NoC) reduces tail latency, which is critical for fast decode with mixture-of-experts models. Single-hop chip-to-chip latency is under 1 microsecond.

Tensordyne partnered with HPE Juniper on the system’s scale-up interconnect and chassis. Systems are due to start shipping by Q2 2027, with a development cloud available by the end of 2026.

Editor’s note: Tensordyne simulations show rack-scale systems achieving 3 million tokens per second per megawatt versus 183,000 for NVL72-GB300 racks, and 363,000 tokens per second per rack versus 27,400, based on InferenceX benchmarks for DeepSeek-R1-670B at FP4.
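
As a sanity check, the quoted per-chip figures can be rolled up to the server and rack level. The Python sketch below is a back-of-envelope calculation that uses only the article's numbers; the constant and function names are hypothetical, and it assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB). It shows that 288 × 144 GB ≈ 41.5 TB and 288 × 256 MB ≈ 73.7 GB match the quoted 42 TB and 74 GB, while 288 × 2.1 PFLOPS gives 604.8 PFLOPS rather than 608, which suggests the per-chip figure is rounded (608 / 288 ≈ 2.11).

```python
"""Back-of-envelope check of the Tensordyne figures quoted in the article.

All inputs are the article's numbers; nothing here is measured.
Names are illustrative, and decimal units are assumed throughout.
"""

# Per-chip figures as quoted.
CHIP_PFLOPS_FP8 = 2.1   # dense FP8 PFLOPS per chip
CHIP_HBM_GB = 144       # HBM3e per chip, GB
CHIP_SRAM_MB = 256      # on-chip SRAM per chip, MB
CHIP_POWER_W = 300      # power per package, W

# System topology as quoted.
CHIPS_PER_SERVER = 72
SERVERS_PER_RACK = 4
SERVER_POWER_KW = 30


def rack_rollup() -> dict:
    """Scale per-chip figures up to one server and one full rack."""
    chips = CHIPS_PER_SERVER * SERVERS_PER_RACK
    return {
        "chips_per_rack": chips,
        "server_hbm_tb": CHIPS_PER_SERVER * CHIP_HBM_GB / 1000,
        "server_chip_power_kw": CHIPS_PER_SERVER * CHIP_POWER_W / 1000,
        "rack_pflops_fp8": chips * CHIP_PFLOPS_FP8,
        "rack_hbm_tb": chips * CHIP_HBM_GB / 1000,
        "rack_sram_gb": chips * CHIP_SRAM_MB / 1000,
        "rack_power_kw": SERVERS_PER_RACK * SERVER_POWER_KW,
    }


def editor_note_ratios() -> dict:
    """Ratios implied by the simulated figures in the editor's note."""
    rack_power_mw = SERVERS_PER_RACK * SERVER_POWER_KW / 1000
    return {
        "per_mw_ratio": 3_000_000 / 183_000,
        "per_rack_ratio": 363_000 / 27_400,
        # Tensordyne tokens/s per rack divided by rack power in MW.
        "tensordyne_tok_s_per_mw_from_rack": 363_000 / rack_power_mw,
    }


if __name__ == "__main__":
    for name, value in {**rack_rollup(), **editor_note_ratios()}.items():
        print(f"{name:36s} {value:,.2f}")
```

The same numbers also cross-check the editor's note: 363,000 tokens per second per rack divided by 120 kW is about 3.03 million tokens per second per megawatt, in line with the quoted 3 million. The simulated per-megawatt ratio works out to about 16.4× and the per-rack ratio to about 13.2×, close to the headline 17× and 13×. The 72 × 300 W = 21.6 kW of chip power also fits within the 30 kW server figure, with the remainder presumably going to the rest of the server.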

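The idea behind log-domain arithmetic can be illustrated with a generic, textbook LNS. The Python sketch below is not Tensordyne’s Pareto format, whose details are proprietary; it is an illustration under stated assumptions, and the class name `LNS` and its methods are hypothetical. A nonzero value is stored as a sign plus log2 of its magnitude, so multiplication reduces to adding exponents, while addition needs a nonlinear correction term, log2(1 ± 2^-d), where d is the difference between the two exponents. That correction is the part that typically calls for dedicated hardware support.

```python
"""Generic logarithmic number system (LNS) arithmetic, textbook style.

This is NOT Tensordyne's proprietary Pareto format; it only illustrates why
multiplication is cheap in the log domain while addition needs extra work.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LNS:
    """A nonzero real number stored as sign * 2**log2_mag."""
    sign: int        # +1 or -1
    log2_mag: float  # log2(|x|); real hardware would store this in fixed point

    @staticmethod
    def from_float(x: float) -> "LNS":
        if x == 0:
            raise ValueError("zero needs a special encoding in LNS")
        return LNS(1 if x > 0 else -1, math.log2(abs(x)))

    def to_float(self) -> float:
        return self.sign * 2.0 ** self.log2_mag

    def __mul__(self, other: "LNS") -> "LNS":
        # Multiplication becomes addition of logs (plus a sign product).
        return LNS(self.sign * other.sign, self.log2_mag + other.log2_mag)

    def __add__(self, other: "LNS") -> "LNS":
        # Addition needs the Gaussian logarithm:
        #   log2(2^a + 2^b) = max(a, b) + log2(1 + 2^-(|a - b|))
        # Hardware usually approximates this correction with tables or
        # interpolation; here it is computed exactly with math.log2.
        if self.log2_mag >= other.log2_mag:
            big, small = self, other
        else:
            big, small = other, self
        d = big.log2_mag - small.log2_mag  # d >= 0
        if big.sign == small.sign:
            correction = math.log2(1.0 + 2.0 ** -d)
        else:
            if d == 0:
                raise ValueError("exact cancellation gives zero")
            # Opposite signs: subtract magnitudes, keep the larger one's sign.
            correction = math.log2(1.0 - 2.0 ** -d)
        return LNS(big.sign, big.log2_mag + correction)


if __name__ == "__main__":
    a, b = LNS.from_float(3.0), LNS.from_float(-5.0)
    print((a * b).to_float())  # approximately -15.0
    print((a + b).to_float())  # approximately -2.0
```

Zero and exact cancellation need special handling (here they simply raise an error), and real hardware would hold the exponent in a fixed-point format rather than as a Python float.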