August 17, 2025
Google’s latest custom AI hardware effort is the seventh-generation Tensor Processing Unit (TPU), dubbed Ironwood, a major step forward in powering advanced artificial intelligence. The new chip targets the heavy computational demands of Google’s Gemini models, with a particular focus on simulated reasoning capabilities, which Google describes as “thinking”.
Google’s AI strategy centers on pairing its custom hardware with its advanced models. Ironwood is a key component of that pairing, improving inference performance and supporting larger context windows in powerful AI models. Google presents Ironwood as its most scalable and powerful TPU yet, one it believes will enable the “agentic AI” capabilities that mark the “age of inference”, in which AI acts proactively on a user’s behalf.
The Architecture and Performance of Ironwood
Ironwood delivers significantly higher throughput than previous generations. Google plans to deploy the chips in large liquid-cooled clusters of up to 9,216 units. A newly upgraded Inter-Chip Interconnect (ICI) lets the chips communicate directly, enabling fast and efficient data transfer across the system.
This infrastructure will support both Google’s internal AI projects and developers using Google Cloud. Ironwood will be available in two configurations: a 256-chip server for smaller deployments and a full 9,216-chip cluster designed for the most demanding AI tasks.
A fully configured Ironwood pod delivers a massive 42.5 exaflops of inference compute. According to Google’s specifications, each Ironwood chip reaches a peak throughput of 4,614 TFLOPS, a substantial improvement over earlier TPU generations. Each chip also carries 192GB of memory, six times the capacity of the previous-generation Trillium TPU, while memory bandwidth rises to 7.2 TB/s, a 4.5x increase.
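The headline pod number follows directly from the per-chip figures. A quick sanity check of the arithmetic (all values are the quoted specifications; the Trillium memory figure is merely implied by the stated sixfold multiplier, not given in the text):

```python
# Sanity-check the quoted Ironwood figures.
chips_per_pod = 9216
tflops_per_chip = 4614            # peak FP8 throughput per chip

# 1 exaflop = 1e6 TFLOPS, so pod throughput in EFLOPS is:
pod_exaflops = chips_per_pod * tflops_per_chip / 1e6
print(f"Pod throughput: {pod_exaflops:.1f} EFLOPS")   # -> ~42.5

# The 6x memory claim implies Trillium carried 192 / 6 = 32 GB per chip.
hbm_per_chip_gb = 192
implied_trillium_gb = hbm_per_chip_gb / 6
print(f"Implied Trillium HBM: {implied_trillium_gb:.0f} GB")
```

The product comes out to roughly 42.5 exaflops, matching the pod figure Google cites.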
Understanding the Benchmarks
Variations in benchmarking methodology make direct comparisons between AI chips difficult. Google benchmarks the new TPU at FP8 precision. The company’s claim that Ironwood “pods” are 24 times faster than comparable portions of the world’s most powerful supercomputers therefore requires careful interpretation, because those supercomputing systems often lack native FP8 hardware support.
Google’s comparison notably omitted the TPU v6 (Trillium), though the company says Ironwood delivers double the performance per watt of that generation. A company representative indicated that Ironwood succeeds the TPU v5p, while Trillium followed the lower-end TPU v5e. Trillium’s peak performance was approximately 918 TFLOPS at FP8 precision.
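Taking the two FP8 figures quoted above at face value, the per-chip gap between the generations is easy to estimate (this is only an illustration of the ratio, not a like-for-like benchmark, since Trillium lacks native FP8 support):

```python
# Rough per-chip comparison from the FP8 figures cited in the text.
ironwood_fp8_tflops = 4614
trillium_fp8_tflops = 918   # approximate figure for Trillium

speedup = ironwood_fp8_tflops / trillium_fp8_tflops
print(f"Ironwood vs. Trillium per chip: ~{speedup:.1f}x")  # -> ~5.0x
```

That roughly fivefold per-chip gain, combined with the claimed 2x performance per watt, suggests much of Ironwood's advantage comes from raw throughput rather than efficiency alone.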
The Implications for the Future of AI
Despite the complexities of benchmarking AI hardware, the underlying message is clear: Ironwood marks an important step in the evolution of Google’s AI infrastructure. It builds on the foundation that enabled the rapid progress of models such as Gemini 2.5, which runs on older TPU generations, and pushes speed and efficiency further.
Google expects Ironwood’s inference performance and efficiency to drive substantial AI breakthroughs in the coming year. The chip will be central to Google’s “age of inference”, supplying the computational power needed for complex models and for true agentic capabilities that make AI more proactive and more deeply integrated into our digital experiences.