
nccl_ep: Low-Latency kernel memory footprint optimization #2040

Open
artpol84 wants to merge 1 commit into NVIDIA:master from artpol84:topic/ncclEP/ll_mem_opt_v2

Conversation

@artpol84
Collaborator

@artpol84 artpol84 commented Mar 9, 2026

Description

Reduce LL kernel memory consumption by including the top-k indices in the token message payloads.

On dispatch, this makes it possible to avoid maintaining a separate buffer space per local-expert/remote-rank pair and instead keep one space per remote rank.
This reduces the memory overhead from O(E x B x H) to O(N x B x H), where E is the number of experts, N the number of ranks, B the batch size, and H the token hidden dimension.
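The dispatch-side idea can be sketched as follows. This is an illustrative Python model only, not the actual NCCL EP kernel code; all names (`TokenMsg`, `route_to_local_experts`) are hypothetical. Because each token message carries its own top-k expert indices, the receiver can scatter tokens to local experts out of a single per-rank buffer instead of keeping one buffer per (local expert, remote rank) pair.

```python
# Hypothetical sketch of dispatch-side routing. Names and types are
# illustrative, not the NCCL EP implementation.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenMsg:
    topk_experts: List[int]   # global expert ids selected for this token
    hidden: List[float]       # token hidden state (H values)


def route_to_local_experts(msgs: List[TokenMsg], rank: int,
                           experts_per_rank: int) -> Dict[int, List[List[float]]]:
    """Scatter tokens from one per-rank receive buffer to local expert queues.

    The embedded top-k indices make per-expert receive buffers unnecessary:
    routing happens after receipt, from a single buffer per remote rank.
    """
    lo = rank * experts_per_rank
    queues: Dict[int, List[List[float]]] = {e: [] for e in range(lo, lo + experts_per_rank)}
    for m in msgs:
        for e in m.topk_experts:
            if lo <= e < lo + experts_per_rank:
                queues[e].append(m.hidden)
    return queues
```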

On combine, the top-k indices are used to reduce the communication buffer from O(E x B x H) to O(K x B x H), where K is the number of experts selected per token.
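The buffer-size arithmetic behind both reductions can be checked with a small sketch. The sizes below are illustrative placeholders, not values from the PR; the helper name is hypothetical. With E total experts spread over N ranks, the dispatch buffer shrinks by a factor of E/N (the experts-per-rank count), and the combine buffer by E/K.

```python
# Illustrative buffer-size arithmetic; sizes are made-up examples.
def buf_bytes(slots: int, batch: int, hidden: int, dtype_bytes: int = 2) -> int:
    """Generic communication-buffer size: slots x batch x hidden tokens."""
    return slots * batch * hidden * dtype_bytes

N = 8        # ranks
E = 256      # total experts (32 per rank in this example)
K = 8        # experts selected per token (top-k)
B, H = 128, 7168  # batch size, hidden dimension

dispatch_before = buf_bytes(E, B, H)   # O(E x B x H)
dispatch_after  = buf_bytes(N, B, H)   # O(N x B x H)
combine_before  = buf_bytes(E, B, H)   # O(E x B x H)
combine_after   = buf_bytes(K, B, H)   # O(K x B x H)

# Both reductions are E/N = E/K = 32x for these example sizes,
# matching the claimed order-of-magnitude savings.
```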

Related Issues

N/A

Changes & Impact

Changes: Reorganize NCCL EP communication buffer layout.
Impact: Order of magnitude reduction in memory consumption.

Performance Impact

No impact observed


Signed-off-by: Artem Y. Polyakov <artemp@nvidia.com>
@xiaofanl-nvidia
Collaborator

@jskrobola can you help start the mirror?

@artpol84
Collaborator Author

@xiaofanl-nvidia, @sb17v

Update: This change was tested with:

  • ep_bench microbenchmark, and
  • internal vLLM integration

showing no performance degradation compared to the pre-optimization code.
