
Google takes on Nvidia Blackwell GPUs with the new Trillium TPUs


Google started developing custom AI accelerators, which it calls Tensor Processing Units (TPUs), a decade ago. Earlier this year, Google announced Trillium, its sixth-generation TPU, which delivers better performance and efficiency than its predecessors. Today, Google announced the general availability of Trillium TPUs for Google Cloud customers. Google also revealed today that it used Trillium TPUs to train the new Gemini 2.0.

Nvidia's GPUs are incredibly popular among developers for AI workloads, not just because of their hardware capabilities but also because of their software support. To make Trillium TPUs attractive to AI developers, Google has made several improvements to its software layer. It has optimized the XLA compiler and AI frameworks such as JAX, PyTorch, and TensorFlow so that developers can achieve better price-performance across AI training, tuning, and serving.
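For developers curious what that software stack looks like in practice, here is a minimal, illustrative JAX sketch (the function and shapes are hypothetical, not from Google's announcement): code decorated with @jax.jit is lowered through the XLA compiler, which is the layer Google says it has optimized, so the same program can target CPU, GPU, or TPU hardware without changes.

```python
import jax
import jax.numpy as jnp

# List the accelerators JAX can see; on a Cloud TPU VM these are TPU cores,
# elsewhere this falls back to GPU or CPU.
print(jax.devices())

# A jit-compiled function is compiled through XLA for whatever backend is available.
@jax.jit
def dense_layer(x, w, b):
    return jax.nn.relu(x @ w + b)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 512))
w = jax.random.normal(key, (512, 256))
b = jnp.zeros((256,))

y = dense_layer(x, w, b)  # compiled and executed on the first available device
print(y.shape)            # (1024, 256)
```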

Compared to the previous generation TPU, Trillium offers the following improvements:

  • Over 4x improvement in training performance
  • Up to 3x increase in inference throughput
  • A 67% increase in energy efficiency
  • An impressive 4.7x increase in peak compute performance per chip
  • Double the High Bandwidth Memory (HBM) capacity
  • Double the Interchip Interconnect (ICI) bandwidth
  • 100K Trillium chips in a single Jupiter network fabric
  • Up to 2.5x improvement in training performance per dollar and up to 1.4x improvement in inference performance per dollar

Google also claimed that Trillium TPUs can achieve 99% scaling efficiency across a deployment of 12 pods (3,072 chips), and 94% scaling efficiency across 24 pods (6,144 chips), when pre-training GPT-3 175B.

Trillium is now available in North America (US East region), Europe (West region), and Asia (Northeast region). For evaluation, Trillium is available starting at $2.70 per chip-hour. With 1-year and 3-year commitments, it is available starting at $1.89 per chip-hour and $1.22 per chip-hour, respectively.
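As an illustrative back-of-the-envelope calculation based only on the per-chip-hour rates above: a hypothetical 256-chip slice running for 24 hours would cost roughly 256 × $2.70 × 24 ≈ $16,600 at the evaluation rate, versus about 256 × $1.22 × 24 ≈ $7,500 under a three-year commitment.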

With its ability to scale to hundreds of thousands of chips and improved software support, Trillium represents a significant leap forward for Google in the cloud AI infrastructure market.
