Google’s DiffusionGemma Generates Text 4x Faster: Diffusion Replaces Token-by-Token Output
AI-summarised brief · reviewed before publication
Google DeepMind has released DiffusionGemma, an experimental open-weights model that generates text with discrete diffusion instead of the token-by-token (autoregressive) decoding used by most language models. Rather than emitting one token per step, the 26-billion-parameter mixture-of-experts model refines blocks of 256 tokens in parallel. On a single Nvidia H100 it produces more than 1,000 tokens per second, up to four times faster than comparable autoregressive models. The trade-off is output quality, which is lower than that of standard autoregressive models, so DiffusionGemma suits speed-critical workflows best. It is available free of charge under the Apache 2.0 license.
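The following toy sketch in Python illustrates the block-parallel idea using masked discrete diffusion decoding. It is not DiffusionGemma's actual algorithm. The brief does not describe the denoising schedule, the number of refinement steps, or how tokens are selected, so the hypothetical `toy_denoiser` (a random stand-in for the model), the 8-step schedule, the vocabulary size, and the "commit the most confident proposals" rule are all illustrative assumptions. What the sketch does show is the step count: an autoregressive decoder needs one sequential forward pass per token, while this decoder fills all 256 positions of a block in a fixed number of parallel passes.

```python
import random

MASK = -1  # placeholder id for a position that has not been decoded yet


def toy_denoiser(block, vocab_size, rng):
    """Hypothetical stand-in for the model. For every masked position it
    proposes a (token_id, confidence) pair at random; a real model would
    condition on the prompt and on the tokens already committed."""
    return {
        i: (rng.randrange(vocab_size), rng.random())
        for i, tok in enumerate(block)
        if tok == MASK
    }


def diffusion_decode_block(block_len=256, steps=8, vocab_size=1000, seed=0):
    """Fill a fully masked block using at most `steps` parallel passes.

    Each pass scores every masked position at once, then commits the most
    confident proposals so that the block is complete after the last pass.
    Returns the decoded block and the number of sequential passes used.
    (Assumed schedule for illustration only.)"""
    rng = random.Random(seed)
    block = [MASK] * block_len
    passes = 0
    for step in range(steps):
        if MASK not in block:
            break
        proposals = toy_denoiser(block, vocab_size, rng)
        passes += 1
        # Spread the remaining masked positions evenly over the passes left
        # (ceiling division, so nothing is still masked after the last pass).
        k = -(-len(proposals) // (steps - step))
        most_confident = sorted(
            proposals.items(), key=lambda item: item[1][1], reverse=True
        )[:k]
        for pos, (token_id, _confidence) in most_confident:
            block[pos] = token_id
    return block, passes


def autoregressive_passes(num_tokens=256):
    """Token-by-token decoding needs one sequential pass per token."""
    return num_tokens


if __name__ == "__main__":
    block, passes = diffusion_decode_block()
    print(f"diffusion: {passes} passes for {len(block)} tokens")
    print(f"autoregressive: {autoregressive_passes(len(block))} passes")
```

Fewer sequential passes do not translate into a proportional speedup, because each pass processes the whole block. The "up to four times faster" figure is the reported measurement, and this toy does not reproduce it. The sketch also hints at one plausible source of the quality gap: tokens committed in the same pass are chosen without seeing one another.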
💡 Why It Matters
- Faster text generation can transform latency-dominated tasks such as drafting and real-time interfaces.
- Lower output quality may be acceptable in applications where speed is paramount, such as on-device assistants and autocomplete features.