BenchmarkGPU is a community-driven PyTorch benchmark for evaluating whether a GPU is actually delivering its advertised level of performance.
In practice, real-world throughput can fall short for many reasons:
- Missing or outdated drivers
- Incorrect runtime installation
- Power or thermal limits
- Background system activity
- Misconfigured environment variables
- Silent fallbacks to slower execution paths
This project focuses on repeatable matrix-multiplication benchmarking, stability sampling, and lightweight system-signal checks so you can investigate whether your machine is underperforming.
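The stability-sampling idea described above can be sketched in plain Python: time a workload repeatedly, then flag samples that deviate sharply from the median. This is an illustrative sketch using the standard library, not this project's actual implementation; the function names are made up for the example.

```python
import statistics
import time

def sample_timings(fn, n_samples=20):
    """Run fn repeatedly and return per-run wall-clock times in seconds."""
    times = []
    for _ in range(n_samples):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return times

def flag_outliers(times, k=3.0):
    """Flag samples more than k median-absolute-deviations from the median.

    A large number of flagged samples suggests unstable throughput, e.g.
    thermal throttling or background interference.
    """
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times) or 1e-12
    return [t for t in times if abs(t - med) / mad > k]
```

A real GPU benchmark would additionally need to synchronize the device before reading the clock, since GPU kernels launch asynchronously.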
- Make sure your system has the correct drivers and runtime stack installed before benchmarking.
- For NVIDIA GPUs, install a PyTorch build that matches your CUDA environment.
- For AMD GPUs, install a PyTorch build with ROCm support and the required ROCm drivers.
- For Intel GPUs, install a PyTorch build with `torch.xpu` support and the required Intel GPU drivers/runtime.
- For Apple Silicon, make sure you are using a PyTorch build with Apple MPS support on a compatible macOS version.
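A quick way to confirm your PyTorch build actually sees an accelerator is to query the availability flags directly. The sketch below is defensive (it works even when torch is not installed); the helper name is illustrative.

```python
import importlib.util

def detect_accelerators():
    """Return (backend_name, available) pairs for common PyTorch backends."""
    if importlib.util.find_spec("torch") is None:
        return [("torch not installed", False)]
    import torch
    return [
        # torch.cuda covers both CUDA and ROCm builds of PyTorch
        ("cuda/rocm", torch.cuda.is_available()),
        # MPS backend exists on Apple Silicon builds
        ("mps", getattr(torch.backends, "mps", None) is not None
                and torch.backends.mps.is_available()),
        # torch.xpu is only present in builds with Intel GPU support
        ("xpu", hasattr(torch, "xpu") and torch.xpu.is_available()),
    ]

if __name__ == "__main__":
    for name, available in detect_accelerators():
        print(f"{name}: {'available' if available else 'not available'}")
```

If every backend reports "not available" on a machine with a GPU, the driver or runtime stack is a likely culprit.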
- This project reports interference indicators only. It is not a full malware scanner.
- This codebase has not been tested extensively across the full hardware matrix.
- The Apple MPS and NVIDIA CUDA paths are well-tested to a useful extent and are currently better validated than the other accelerator paths.
- The AMD ROCm and Intel GPU paths have received less development attention.
- Even so, you should still treat all results with engineering caution and verify suspicious behavior on your own hardware.
Run the benchmark through either entrypoint:

```bash
python3 main.py
# or
python3 -m benchmark_gpu
```

Examples:

```bash
python3 main.py --device auto
python3 main.py --device cuda --device-index 0
python3 main.py --device rocm --device-index 0
python3 main.py --device xpu --device-index 0
python3 main.py --device mps
python3 main.py --device cpu
```

The benchmark collects repeated samples, looks for anomalous measurements, and writes a plain-text report to the `results/` directory by default.
You can browse shared benchmark submissions from me and the community in docs/results.md.
If you would like to add your own result, please open a pull request and append a new row to the table after running the benchmark on your hardware.
The codebase is intentionally modular so contributors can work on one subsystem without creating backend-specific spaghetti:
- `benchmark_gpu/backends/`: backend adapters for CUDA, ROCm, Intel XPU, Apple MPS, and CPU
- `benchmark_gpu/benchmark/`: benchmark execution and stability logic
- `benchmark_gpu/diagnostics/`: lightweight interference checks
- `benchmark_gpu/reports/`: plain-text reporting
- `benchmark_gpu/cli.py`: CLI parsing and validation
- `benchmark_gpu/app.py`: application orchestration
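One way a backend-adapter layout like this avoids backend-specific spaghetti is by having every adapter satisfy a small common interface. The sketch below shows that pattern with a structural protocol; the class and method names are hypothetical, not this project's actual API.

```python
from typing import Protocol

class BackendAdapter(Protocol):
    """Hypothetical adapter interface; names are illustrative only."""
    name: str

    def is_available(self) -> bool:
        """Report whether this backend can be used on the current machine."""
        ...

    def synchronize(self) -> None:
        """Block until all queued device work has finished."""
        ...

class CPUAdapter:
    """Trivial adapter: the CPU is always available and synchronous."""
    name = "cpu"

    def is_available(self) -> bool:
        return True

    def synchronize(self) -> None:
        pass  # CPU execution is synchronous; nothing to flush

def pick_backend(adapters: list[BackendAdapter]) -> BackendAdapter:
    """Return the first usable backend, falling back through the list."""
    for adapter in adapters:
        if adapter.is_available():
            return adapter
    raise RuntimeError("no usable backend found")
```

With this shape, the benchmark and reporting code can call `synchronize()` without knowing which device it is talking to.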
If you see any issues, please feel free to:
- Open an issue describing the problem; I will use it for testing.
- Or, even better, send a pull request with the fix, and I will review it as soon as possible.
Hardware diversity is the hardest part of a GPU benchmarking project, so real-world bug reports and fixes are extremely valuable.
Thank you for being part of this community-driven project.