How to Run LLMs on AMD GPUs: ROCm, Vulkan, and Everything You Need to Know
NVIDIA isn't the only option. Learn how to run open-source LLMs on AMD Radeon GPUs using ROCm, llama.cpp with Vulkan, and other backends — with step-by-step setup instructions.
AMD GPUs offer excellent price-to-VRAM ratios, which makes them compelling for running LLMs locally. A used RX 6800 with 16 GB of VRAM typically costs less than an RTX 4060 with 8 GB. The software story, however, has historically been more complicated. Here is what you need to know in 2026.
The State of AMD LLM Support
AMD's ROCm platform has matured significantly. As of ROCm 7.0, most RDNA 3 and RDNA 4 GPUs are officially supported on both Linux and Windows. llama.cpp supports Vulkan as an alternative backend, and MLC-LLM has first-class ROCm support.
Supported GPUs
| GPU | VRAM | ROCm (Linux) | ROCm (Windows) | Vulkan |
|-----|------|--------------|----------------|--------|
| RX 7900 XTX | 24 GB | ✅ Official | ✅ Official | ✅ |
| RX 7900 XT | 20 GB | ✅ Official | ✅ Official | ✅ |
| RX 7800 XT | 16 GB | ✅ Official | ✅ Official | ✅ |
| RX 7600 | 8 GB | ✅ Official | ✅ Official | ✅ |
| RX 6800 XT | 16 GB | ✅ Community | ❌ | ✅ |
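Before building anything, it can help to check which of these backends your own machine is likely to support. A rough probe is to look for each stack's diagnostic tool on the PATH. The sketch below assumes `rocminfo` (installed with ROCm) and `vulkaninfo` (from the Vulkan tools package) are reasonable indicators. The function name `detect_backends` is made up for illustration, and finding a tool does not guarantee that your specific GPU is supported.

```shell
# Print which GPU backends look available, judged only by tools found on PATH.
# detect_backends is a hypothetical helper name, not part of ROCm or llama.cpp.
detect_backends() {
    found=""
    # rocminfo ships with ROCm and lists GPUs by architecture name (gfx...).
    if command -v rocminfo >/dev/null 2>&1; then
        found="rocm"
    fi
    # vulkaninfo ships with the Vulkan tools / SDK.
    if command -v vulkaninfo >/dev/null 2>&1; then
        found="${found:+$found }vulkan"
    fi
    # Prints "rocm", "vulkan", "rocm vulkan", or "none".
    echo "${found:-none}"
}

detect_backends
```

If `rocm` is reported, `rocminfo | grep gfx` shows the architecture name the driver sees for each GPU. If `vulkan` is reported, `vulkaninfo --summary` lists the devices the Vulkan loader can find.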
Method 1: llama.cpp with ROCm (Recommended)
This is the most reliable path for most users.
Linux Setup
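# Install ROCm (Ubuntu 24.04 "noble"; the installer filename on the repo is versioned, so check the directory listing for the current name)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/noble/amdgpu-install.deb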
sudo apt install ./amdgpu-install.deb
sudo amdgpu-install --usecase=rocm
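# Build llama.cpp with ROCm (HIP backend)
sudo apt install rocm-dev hip-dev
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# The Makefile build was removed upstream, so use CMake. GGML_HIP replaces the older GGML_HIPBLAS flag.
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -B build -DGGML_HIP=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j "$(nproc)"
# Run a model (-ngl 99 offloads all layers to the GPU)
./build/bin/llama-cli -m llama-4-8b-q4_k_m.gguf -ngl 99 -p "Hello world"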
Windows Setup
# Install HIP SDK from AMD website
# Then build llama.cpp with:
cmake -B build -DGGML_HIP=ON
cmake --build build --config Release -j
Method 2: llama.cpp with Vulkan
Vulkan works on any AMD GPU with a working Vulkan driver (including older GCN cards) and does not require ROCm:
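# Build (requires the Vulkan SDK or your distro's Vulkan development packages)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j "$(nproc)"
# Run
./build/bin/llama-cli -m model.gguf -ngl 99 -p "Hello"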
Vulkan performance is typically 80–90% of ROCm, though the gap varies by model, GPU, and driver version. That is good enough for most users, and Vulkan is much easier to set up.
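To see how the two backends compare on your own card, one approach is to build llama.cpp twice into separate directories and run `llama-bench`, which is built alongside `llama-cli`, against the same model. The following is a minimal sketch. The directory names `build-rocm` and `build-vulkan` and the helper `bench_backends` are assumptions for illustration, and they presume you configured each build with `cmake -B build-rocm -DGGML_HIP=ON` and `cmake -B build-vulkan -DGGML_VULKAN=ON` respectively.

```shell
# Run llama-bench from each build directory that exists and skip the rest.
# bench_backends and the directory names passed to it are hypothetical.
bench_backends() {
    model="$1"
    shift
    for dir in "$@"; do
        bench="$dir/bin/llama-bench"
        if [ -x "$bench" ] && [ -f "$model" ]; then
            echo "== $dir =="
            # -p: prompt tokens, -n: generated tokens, -ngl: layers offloaded to the GPU
            "$bench" -m "$model" -ngl 99 -p 512 -n 128
        else
            echo "skipped $dir (missing $bench or $model)"
        fi
    done
}

bench_backends model.gguf build-rocm build-vulkan
```

`llama-bench` reports prompt-processing and generation throughput separately, so the two runs can be compared row by row.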
Method 3: MLC-LLM
MLC-LLM has first-class AMD support via its unified runtime:
pip install mlc-llm
mlc_llm chat HF://mlc-ai/Llama-4-8B-q4f16_1-MLC
Performance Comparison
RTX 4080 (16 GB) vs RX 7900 XT (20 GB), both running Llama 4 8B Q4_K_M:
| Metric | RTX 4080 | RX 7900 XT |
|--------|----------|------------|
| Tokens/s (prompt) | 1,200 | 980 |
| Tokens/s (generation) | 85 | 72 |
| VRAM used | 5.2 GB | 5.4 GB |
| Power draw | 220 W | 260 W |
NVIDIA still leads in raw speed and power efficiency, but AMD's extra VRAM at the same price point often makes it the better choice for running larger models.
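The size of the gap is easier to judge as ratios. The short calculation below uses only the figures from the table above. The helper names `ratio_pct` and `tok_per_watt` are made up for illustration, and the efficiency figure assumes the listed power draw was measured during generation, which the table does not state.

```shell
# Relative throughput and rough efficiency, computed from the table's figures.
# ratio_pct A B      -> A as a percentage of B
# tok_per_watt T W   -> tokens per second per watt
ratio_pct() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'; }
tok_per_watt() { awk -v t="$1" -v w="$2" 'BEGIN { printf "%.3f\n", t / w }'; }

echo "prompt:      RX 7900 XT at $(ratio_pct 980 1200)% of RTX 4080"
echo "generation:  RX 7900 XT at $(ratio_pct 72 85)% of RTX 4080"
echo "tok/s per W: RTX 4080 $(tok_per_watt 85 220), RX 7900 XT $(tok_per_watt 72 260)"
```

On these numbers the RX 7900 XT reaches about 82% of the RTX 4080's prompt speed and about 85% of its generation speed, while producing roughly 0.28 tokens per second per watt against 0.39.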
Recommendations
- Budget option: Used RX 6800 (16 GB), which offers very strong VRAM-per-dollar
- Mid-range: RX 7900 XT (20 GB), which runs 13B Q8 models comfortably
- High-end: RX 7900 XTX (24 GB), which matches the RTX 4090's 24 GB of VRAM for fitting larger models
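A quick way to sanity-check claims like "13B Q8 fits in 20 GB" is to estimate the weight size as parameters times bits per weight divided by 8. The sketch below does that arithmetic. It uses 8.5 bits per weight for Q8_0 (each block of 32 8-bit weights carries one 16-bit scale). The helper names `weights_gb` and `fits` are hypothetical, and the 2 GB default headroom for KV cache and buffers is an assumption rather than a measured value, since real usage grows with context length.

```shell
# weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Approximate size of the model weights in GB (10^9 bytes).
weights_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# fits VRAM_GB PARAMS_BILLIONS BITS_PER_WEIGHT [HEADROOM_GB]
# Succeeds if weights plus headroom fit in VRAM.
# The 2 GB default headroom is an assumption, not a measurement.
fits() {
    awk -v v="$1" -v p="$2" -v b="$3" -v h="${4:-2}" \
        'BEGIN { exit !(p * b / 8 + h <= v) }'
}

weights_gb 13 8.5    # 13B parameters at Q8_0
fits 20 13 8.5 && echo "13B Q8_0 fits in 20 GB"
fits 8 13 8.5 || echo "13B Q8_0 does not fit in 8 GB"
```

The first call prints 13.8, so a 13B Q8_0 model leaves around 6 GB free on a 20 GB card before the KV cache is counted.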
AMD is now a serious option for local LLM users. The software gap has narrowed dramatically, and the VRAM advantage makes AMD GPUs worth considering for anyone building a dedicated LLM machine.
Browse our GPU Database to see which AMD GPUs support your target models.