fix: Resolved #993 - Support >32 Qubit simulations on AMD GPUs via 3D grid folding by sun-9545sunoj · Pull Request #1016 · quantumlib/qsim

sun-9545sunoj · 2026-02-06T07:07:02Z

This PR provides a comprehensive solution to Issue #993, addressing the hipErrorInvalidConfiguration encountered when dispatching circuits beyond 31 qubits on AMD hardware.
By refactoring the dispatch logic and indexing, qsim can now successfully run simulations of 32+ qubits on high-memory devices like the AMD MI300X.
The Solution
Resolved Dispatch Limits:
Implemented 3D grid folding in CreateGrid to bypass the hardware-specific 1D $x$-dimension limit (65,535 blocks). Large workloads are now distributed across $(x, y, z)$ dimensions, supporting the massive thread counts required for high-qubit states.
64-bit Indexing:
Replaced 32-bit signed integers with uint64_t for state-vector addressing. This prevents index overflow when the state space exceeds $2^{31}$ amplitudes, which occurs at the 32-qubit boundary.
Unlocked >32 Qubit Support:
Full State-Vector: Successfully verified 34 qubits (~128GB VRAM) on a single MI300X.

Hybrid Simulation: Introduced a GPU-accelerated hybrid simulator (qsimh_base_cuda.cu) to enable 32+ runs by partitioning the state space into manageable segments.
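To illustrate the grid-folding idea described above, here is a minimal host-side sketch in plain C++. It is not the PR's `CreateGrid` implementation: `Grid3`, `FoldGrid`, `CeilDiv`, and `kMaxBlocksPerDim` are hypothetical names, and the 65,535 figure quoted above is simply assumed as the limit for every dimension (real limits depend on the device and runtime).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CUDA/HIP dim3 type; not qsim's actual type.
struct Grid3 {
  uint32_t x, y, z;
};

// Per-dimension block limit taken from the PR description. Assumed here for
// all three dimensions purely for illustration.
constexpr uint64_t kMaxBlocksPerDim = 65535;

// Integer ceiling division.
constexpr uint64_t CeilDiv(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

// Folds a 1D block count into (x, y, z) such that x * y * z >= num_blocks
// and no dimension exceeds kMaxBlocksPerDim.
inline Grid3 FoldGrid(uint64_t num_blocks) {
  assert(num_blocks > 0);
  assert(num_blocks <=
         kMaxBlocksPerDim * kMaxBlocksPerDim * kMaxBlocksPerDim);

  // Fill x first, then spill the remainder into y, then into z.
  const uint64_t x =
      num_blocks < kMaxBlocksPerDim ? num_blocks : kMaxBlocksPerDim;
  const uint64_t rest = CeilDiv(num_blocks, x);
  const uint64_t y = rest < kMaxBlocksPerDim ? rest : kMaxBlocksPerDim;
  const uint64_t z = CeilDiv(rest, y);

  return Grid3{static_cast<uint32_t>(x), static_cast<uint32_t>(y),
               static_cast<uint32_t>(z)};
}
```

Because the fold rounds up, the resulting grid can contain more blocks than requested, so a kernel launched this way would need a bounds check on its flattened block ID before touching memory.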
Verification & Benchmarks
Environment: AMD MI300X (192GB), ROCm 7.1.0.
Regression: Small-scale circuits (< 30 qubits) run with 100% accuracy.
Stress Test: Verified a 50-qubit hybrid simulation with 100% GPU utilization and sustained 750W power draw.
Correctness: Confirmed that 64-bit block IDs are correctly calculated across multi-dimensional grids.
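The block-ID and 64-bit indexing checks above can be sketched on the host in plain C++. This is an illustration under assumptions, not the code in `lib/statespace_cuda_kernels.h`: `Dim3`, `LinearBlockId`, `GlobalIndex`, and `GlobalIndex32` are hypothetical names, and `Dim3` stands in for the CUDA/HIP built-ins `blockIdx` and `gridDim`. The 32-bit variant uses unsigned arithmetic so that the wraparound is well defined in C++; the signed case mentioned in the description would be undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CUDA/HIP built-ins blockIdx and gridDim.
struct Dim3 {
  uint32_t x, y, z;
};

// Flattens a 3D block coordinate into a single 64-bit block ID, with x as
// the fastest-varying dimension. The first operand is widened to 64 bits so
// the whole expression is evaluated in 64-bit arithmetic.
inline uint64_t LinearBlockId(Dim3 block_idx, Dim3 grid_dim) {
  assert(block_idx.x < grid_dim.x && block_idx.y < grid_dim.y &&
         block_idx.z < grid_dim.z);
  return (uint64_t{block_idx.z} * grid_dim.y + block_idx.y) * grid_dim.x +
         block_idx.x;
}

// Global amplitude index of a thread, computed in 64 bits.
inline uint64_t GlobalIndex(uint64_t block_id, uint32_t threads_per_block,
                            uint32_t thread_idx) {
  return block_id * threads_per_block + thread_idx;
}

// The same arithmetic in 32 bits, kept only to show the wraparound that
// 64-bit indexing avoids.
inline uint32_t GlobalIndex32(uint32_t block_id, uint32_t threads_per_block,
                              uint32_t thread_idx) {
  return block_id * threads_per_block + thread_idx;
}
```

With 256 threads per block (an assumed block size), block 2^24 starts at amplitude 2^32: `GlobalIndex` returns that value, while `GlobalIndex32` wraps around to 0 and would alias the start of the state vector.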

Modified Files
apps/qsimh_base_cuda.cu (New Hybrid Simulator)

lib/cuda2hip.h (ROCm compatibility)

lib/simulator_cuda.h (3D Dispatch)

lib/simulator_cuda_kernels.h (64-bit Kernels)

lib/statespace_cuda.h (Grid Folding)

lib/statespace_cuda_kernels.h (Block ID Helpers)

lib/vectorspace_cuda.h (Namespace Isolation)

gemini-code-assist

Code Review

This pull request introduces significant changes to support simulations with more than 32 qubits on AMD GPUs by implementing 3D grid folding and 64-bit indexing. The changes are comprehensive and address the core issue. My review focuses on ensuring the correctness and maintainability of these new features. I've found several critical issues in the CUDA kernels where the 3D block ID is not calculated correctly, which could lead to incorrect simulation results. Additionally, there are some correctness issues with command-line argument parsing and opportunities to improve maintainability by reducing code duplication.

lib/simulator_cuda_kernels.h

apps/qsimh_base_cuda.cu

lib/simulator_cuda.h

lib/statespace_cuda.h

lib/statespace_cuda_kernels.h

josemonsalve2

I suggest you remove all formatting changes for a different PR. My guess is that you have autoformatting enabled in VSCode (or your IDE), which is causing the file to be formatted completely. There is a configuration option to format only the changes (git diff). This is an option in the configuration menu.

lib/cuda2hip.h

lib/simulator_cuda.h

josemonsalve2

Thanks for taking care of this @sun-9545sunoj.
I added a few comments here and there. I am new to QSim, so take that into consideration as well.

I like the overall strategy. Let's wait for @mhucka to see what he thinks.

apps/qsimh_base_cuda.cu

lib/statespace_cuda.h

lib/statespace_cuda_kernels.h

lib/vectorspace_cuda.h

josemonsalve2 · 2026-02-13T16:23:29Z

I also wonder if the Hybrid simulator is essential to supporting 34 qubits on AMD. Should this go into a different PR?

sun-9545sunoj · 2026-02-13T20:00:55Z

That’s a fair point, Jose. The 34-qubit scaling I achieved on the MI300X primarily relied on the changes in the full Schrödinger simulator (simulator_cuda.h and vectorspace_cuda.h). While the Hybrid simulator would also benefit from these 64-bit updates, it isn't strictly essential for the core GPU scaling fix I'm proposing here. I’m happy to revert the qsimh changes and move them to a separate PR to keep this one focused on the primary CUDA/HIP scaling logic. Would you prefer I strip those out now?

josemonsalve2 · 2026-02-14T02:26:48Z

I’m going to let @mhucka decide on that. Other projects I’ve worked on would prefer that. But I’m not part of this community so I’d defer to them for that.

github-actions bot added the size: L 250< lines changed <1000 label Feb 6, 2026

gemini-code-assist bot reviewed Feb 6, 2026

github-actions bot added size: XL lines changed >1000 and removed size: L 250< lines changed <1000 labels Feb 6, 2026

sun-9545sunoj force-pushed the gpu-scaling-fix branch from efa9d2f to a53e0ff Compare February 6, 2026 13:39

sun-9545sunoj mentioned this pull request Feb 12, 2026

[AMD GPU] Issue with dispatch dimension beyon 32 Qubits #993

Open

josemonsalve2 suggested changes Feb 12, 2026

lib/cuda2hip.h (resolved)

lib/simulator_cuda.h (resolved)

fix: resolve quantumlib#993 -remove formating noise and isolate MI300X scaling logic

4c272a1

sun-9545sunoj force-pushed the gpu-scaling-fix branch from 491c0db to 4c272a1 Compare February 13, 2026 09:32

github-actions bot added size: L 250< lines changed <1000 and removed size: XL lines changed >1000 labels Feb 13, 2026

josemonsalve2 reviewed Feb 13, 2026

sun-9545sunoj added 5 commits February 14, 2026 00:30

Update copyright year and usage message for CUDA

8edca1d

Refactor memory management namespace and functions

de169cc

requested chages are done!

Refactor memory management namespace and functions

6632260

requested changes are done!

Refactor memory management namespace and functions

626c1e2

requested chages are done! removed the uneccesaary formate chages

Fix syntax error in SetStateUniformKernel

064145d

sun-9545sunoj requested a review from josemonsalve2 February 13, 2026 19:49

Merge branch 'main' into gpu-scaling-fix

19ce35c
