NVIDIA and Google infrastructure cuts AI inference costs
AI-summarised brief · reviewed before publication
Google and NVIDIA outlined their hardware roadmap for reducing AI inference costs at scale. The new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 systems, aim to deliver a tenfold reduction in inference cost per token along with higher token throughput. The architecture pairs NVIDIA ConnectX-9 SuperNICs with Google's Virgo networking technology and scales to 80,000 NVIDIA Rubin GPUs in a single-site cluster. Together, these components are intended to lower costs and improve efficiency for demanding AI workloads.
💡 Why It Matters
- By integrating NVIDIA's platforms with Google Cloud's infrastructure, customers can optimize AI performance, cost, and sustainability.
- The partnership lets enterprises run demanding workloads while meeting data sovereignty and security requirements.