AMD has officially unveiled ROCm 7, a major upgrade to its open software stack for AI acceleration and developer productivity. The release follows the widely used ROCm 6, which has received numerous updates over the past few years, particularly for AI computing.
ROCm 7 puts a strong emphasis on inference, introducing a range of new features and optimizations across the stack.
The new software stack offers a broad range of updates, including support for inference frameworks such as vLLM v1, llm-d, and SGLang. It also adds optimizations for distributed inference, including prefill/decode disaggregation, which are expected to significantly improve performance across AI serving workloads.
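To make these framework names concrete, the sketch below shows what offline inference through vLLM's Python API typically looks like; the model name, prompt, and sampling settings are illustrative assumptions rather than values from AMD's announcement, and the same Python-level API applies on a ROCm-enabled vLLM build.

```python
# Minimal sketch: offline inference with vLLM's Python API.
# The model name, prompt, and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = ["Explain what ROCm is in one sentence."]
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

# On a system with a ROCm build of vLLM, this targets AMD GPUs;
# the calling code itself is platform-agnostic.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```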
One of the standout features of ROCm 7 is the introduction of new kernels and algorithms, including GEMM autotuning, MoE, and attention kernels, along with support for Python-based kernel authoring. The stack also supports lower-precision datatypes such as FP8, FP6, and FP4, as well as mixed-precision computation, which should deliver a considerable performance boost for AI models. AMD's MI350 series GPUs, which will run on ROCm 7, offer full hardware support for these datatypes, improving both efficiency and performance.
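As an illustration of what Python-based kernel authoring looks like in practice, the sketch below uses Triton, one Python kernel framework with a ROCm backend. The announcement does not specify which authoring path AMD has in mind, so treat this as an assumed example rather than AMD's own tooling.

```python
# Minimal sketch of Python-based kernel authoring using Triton, which has a
# ROCm backend; this is an illustrative vector add, not an AMD-provided kernel.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block of elements
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # number of program instances
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

if __name__ == "__main__":
    # ROCm builds of PyTorch expose AMD GPUs through the "cuda" device string.
    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    print(torch.allclose(add(x, y), x + y))
```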
In terms of performance, AMD highlights a 3.5x uplift in AI workloads, particularly inference. For example, ROCm 7 delivers a 3.2x speedup on Llama 3.1 70B, a 3.4x improvement on Qwen2-72B, and up to 3.8x on DeepSeek R1, all relative to ROCm 6. AMD also claims that its MI355X GPU running ROCm 7 outperforms NVIDIA's Blackwell B200 platform running CUDA by 30% in FP8 throughput on DeepSeek R1.
Training performance also sees significant gains, with ROCm 7 delivering up to a 3x improvement in workloads such as Llama 2 70B, Llama 3.1 8B, and Qwen 1.5 7B. These improvements position ROCm 7 as a strong contender in the AI space.
As part of its push for enterprise AI, ROCm 7 offers end-to-end solutions, secure data integration, and straightforward deployment, making it well suited to GenAI workloads. The stack will also be compatible with a range of hardware, including GPUs, CPUs, and DPUs, ensuring broad support for diverse use cases.
AMD also plans to expand ROCm support to Ryzen-based laptops and workstations, with both Linux and Windows support arriving in the second half of 2025.