NVIDIA and Google infrastructure cuts AI inference costs
artificialintelligence-news.com Apr 23, 2026

AI-summarised brief · reviewed before publication

Google and NVIDIA outlined their hardware roadmap for reducing AI inference costs at scale. The new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 systems, aim to deliver a tenfold reduction in inference cost per token along with higher token throughput. The architecture pairs NVIDIA ConnectX-9 SuperNICs with Google's Virgo networking technology and scales to 80,000 NVIDIA Rubin GPUs in a single-site cluster. The goal is lower cost and higher efficiency for demanding AI workloads.

💡 Why It Matters

  • Integrating NVIDIA's platforms with Google Cloud's infrastructure lets customers optimize AI performance, cost, and sustainability.
  • The partnership enables enterprises to run demanding workloads while meeting data sovereignty and security requirements.