Feedback Loop:
AI Software Needs        →  GPU Architecture Changes
        ↑                             ↓
Performance Bottlenecks  ←  New Hardware Features
Real Examples:
- Mixed precision: hardware support for FP16, bfloat16, and FP8
- Communication: NVLink (interconnect) and NCCL (collective-communication library) for multi-GPU workloads
- Memory: larger on-chip SRAM (shared memory and L2 cache) to reduce off-chip memory traffic in transformer models
- Specialized units: the Transformer Engine in NVIDIA's Hopper architecture
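
To make the mixed-precision item concrete, the sketch below emulates FP16 and bfloat16 rounding in plain Python. It is only an illustration of the number formats, not how a GPU implements them. The helper names `to_fp16` and `to_bf16` are hypothetical, and the bfloat16 helper assumes finite inputs that fit in float32.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the FP16 range (max finite value 65504);
        # hardware conversion would typically saturate to infinity instead.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round x to bfloat16 (8 exponent bits, 7 mantissa bits).

    Assumption: x is finite and representable in float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that bfloat16 drops.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (0.1, 70000.0, 1e-8):
        print(f"{value!r}: fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")
```

Running it shows the trade-off: FP16 keeps more mantissa bits but overflows at 70000 and flushes 1e-8 to zero, while bfloat16 keeps the float32 exponent range at the cost of precision.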