@jkotas
Member

@jkotas jkotas commented Jan 28, 2026

Backport of #123696 to release/11.0-preview1

/cc @janvorli

Customer Impact

  • Customer reported
  • Found internally

Very frequent intermittent runtime crashes caused by buggy CPU context manipulations.

Regression

  • Yes

Regression introduced by #123307, merged on Thu Jan 22, 2026.

Testing

Default CI run.

Risk

Low. This is a clean revert of the offending PR.

Copilot AI review requested due to automatic review settings January 28, 2026 14:08

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@jkotas
Member Author

jkotas commented Jan 28, 2026

@akoeplinger Is it fine to submit this via dotnet/runtime or does it need to be submitted via dotnet/dotnet directly?

Contributor

Copilot AI left a comment

Pull request overview

Reverts commit 18acc2d8f708b72420de1410c954d2984312faa2 (“Bring a few jithelpers to new unwind plan”), restoring prior unwind/context-handling behavior for several JIT helpers and architectures.

Changes:

  • Removes ClrRestoreNonvolatileContextWorker implementations and related asm constants for ARM64/LoongArch64/RISC-V64.
  • Reworks OSR patchpoint transition context setup to use captured/unwound context rather than building a CONTEXT from the TransitionBlock.
  • Simplifies marked-JIT-helper detection/unwind path and removes now-unused prolog/epilog helper macros.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/coreclr/vm/threads.cpp: Limits ClrRestoreNonvolatileContextWorker usage to AMD64 and falls back to RtlRestoreContext elsewhere.
  • src/coreclr/vm/riscv64/asmhelpers.S: Removes RISC-V64 ClrRestoreNonvolatileContextWorker and tweaks patchpoint comment.
  • src/coreclr/vm/riscv64/asmconstants.h: Removes CONTEXT offset/flag constants no longer needed after worker removal.
  • src/coreclr/vm/loongarch64/asmhelpers.S: Removes LoongArch64 ClrRestoreNonvolatileContextWorker and tweaks patchpoint comment.
  • src/coreclr/vm/loongarch64/asmconstants.h: Removes CONTEXT offset/flag constants no longer needed after worker removal.
  • src/coreclr/vm/jithelpers.cpp: Changes OSR patchpoint transition to capture/unwind context (instead of TransitionBlock-based construction).
  • src/coreclr/vm/frames.h: Removes SoftwareExceptionFrame::UpdateContextForOSRTransition declaration.
  • src/coreclr/vm/exceptmacros.h: Adds UnwindAndContinueResumeAfterCatch declaration for interpreter builds.
  • src/coreclr/vm/excep.cpp: Refactors marked-JIT-helper identification and unwind logic; adds UnwindAndContinueResumeAfterCatch implementation.
  • src/coreclr/vm/arm64/asmmacros.h: Removes ARM64 epilog macro that restored FP callee-saved regs and returned.
  • src/coreclr/vm/arm64/asmhelpers.asm: Removes ARM64 ClrRestoreNonvolatileContextWorker; patchpoint comment tweak.
  • src/coreclr/vm/arm64/asmhelpers.S: Removes ARM64 ClrRestoreNonvolatileContextWorker; patchpoint comment tweak.
  • src/coreclr/vm/arm64/asmconstants.h: Removes CONTEXT offset/flag constants no longer needed after worker removal.
  • src/coreclr/vm/amd64/AsmMacros.inc: Adjusts PUSH_COOP_PINVOKE_FRAME_WITH_FLOATS layout and removes its “return” epilog macro.
  • src/coreclr/vm/amd64/AsmHelpers.asm: Switches JIT_Patchpoint to PROLOG_WITH_TRANSITION_BLOCK / EPILOG_WITH_TRANSITION_BLOCK_RETURN.
  • src/coreclr/pal/inc/unixasmmacrosriscv64.inc: Removes POP_COOP_PINVOKE_FRAME_WITH_FLOATS_RETURN macro.
  • src/coreclr/pal/inc/unixasmmacrosloongarch64.inc: Removes POP_COOP_PINVOKE_FRAME_WITH_FLOATS_RETURN macro.
  • src/coreclr/pal/inc/unixasmmacrosarm64.inc: Removes POP_COOP_PINVOKE_FRAME_WITH_FLOATS_RETURN macro.
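
The threads.cpp entry amounts to a compile-time platform split. A minimal sketch of that shape, assuming the TARGET_AMD64 build define; RestorePath and ChooseRestorePath are hypothetical names, and the sketch only reports which path would be taken, whereas the real restore routines transfer control and do not return:

```cpp
#include <cassert>

// Hypothetical labels for the two restore strategies.
enum class RestorePath { NonvolatileWorker, FullRestore };

// Hypothetical selector mirroring the split described for threads.cpp.
inline RestorePath ChooseRestorePath() {
#if defined(TARGET_AMD64)
    // AMD64 keeps the dedicated ClrRestoreNonvolatileContextWorker path.
    return RestorePath::NonvolatileWorker;
#else
    // Every other target falls back to RtlRestoreContext.
    return RestorePath::FullRestore;
#endif
}
```

Because the choice is made by the preprocessor, non-AMD64 builds never reference the worker symbol, which is why its per-architecture assembly and asm constants can be deleted outright.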

pFrameContext->Rbp = currentFP;
#endif // TARGET_AMD64

SetSP(pFrameContext, currentSP);

Copilot AI Jan 28, 2026

In this OSR transition setup, the context is unwound with RtlVirtualUnwind and then SP is overwritten back to the original method’s SP. On non-AMD64 targets, FP is captured (currentFP) but never restored, so after RtlVirtualUnwind the FP register in pFrameContext will still reflect the caller frame while SP is set to the callee frame. This mismatch can break stack walking and any FP-based addressing in the OSR method. Consider restoring FP for the non-AMD64 targets as well (via SetFP/GetFP helpers), or remove the FP capture/comment if FP is intentionally not part of the transition contract.

Suggested change
- SetSP(pFrameContext, currentSP);
+ SetSP(pFrameContext, currentSP);
+ #if !defined(TARGET_AMD64)
+ SetFP(pFrameContext, currentFP);
+ #endif

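
The SP/FP mismatch described in this comment can be illustrated with a simplified context model. This is a sketch under stated assumptions, not the CoreCLR code: SketchContext, SketchVirtualUnwind, and RestoreOriginalFrame are hypothetical, and the real CONTEXT record and SetSP/SetFP helpers are per-architecture:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified register context; the real CONTEXT is far larger.
struct SketchContext {
    uint64_t sp;  // stack pointer
    uint64_t fp;  // frame pointer (RBP on AMD64, X29/FP on ARM64, ...)
    uint64_t ip;  // instruction pointer
};

// Hypothetical stand-in for a virtual unwind: afterwards SP, FP and IP
// all describe the caller's frame.
inline void SketchVirtualUnwind(SketchContext* ctx, uint64_t callerSP,
                                uint64_t callerFP, uint64_t returnIP) {
    ctx->sp = callerSP;
    ctx->fp = callerFP;
    ctx->ip = returnIP;
}

// Put back both SP and FP of the original method so they describe the same
// frame. Restoring SP alone would leave FP pointing into the caller's frame.
inline void RestoreOriginalFrame(SketchContext* ctx, uint64_t currentSP,
                                 uint64_t currentFP) {
    ctx->sp = currentSP;
    ctx->fp = currentFP;
}
```

On AMD64 the existing code already writes Rbp from currentFP; the #if !defined(TARGET_AMD64) branch in the suggestion adds the equivalent FP write for the other targets, so every target ends in the state RestoreOriginalFrame models.
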
#if defined(TARGET_AMD64) && defined(_DEBUG)
CHECK_WRITE_BARRIER_RANGE(JIT_WriteBarrier_Debug)
CHECK_RANGE(JIT_WriteBarrier_Debug)
#endif

Copilot AI Jan 28, 2026

CHECK_RANGE is defined as a preprocessor macro inside IsIPInMarkedJitHelper but never undefined. This leaks a very generic macro name into the rest of the translation unit and can cause accidental collisions later in the file or with future includes/changes. Please #undef CHECK_RANGE after the last use (mirroring the prior #undef CHECK_WRITE_BARRIER_RANGE pattern).

Suggested change
- #endif
+ #endif
+ #undef CHECK_RANGE

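
The scoping pattern this comment asks for (define a generic helper macro right before its use, #undef it right after) looks like this in self-contained form. The CodeRange type and IsIPInHelperRanges function are hypothetical; only the macro name CHECK_RANGE mirrors the review. The trailing #ifdef/#error turns any future leak into a compile error:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open code range [start, end) for one helper.
struct CodeRange { uintptr_t start; uintptr_t end; };

bool IsIPInHelperRanges(uintptr_t ip, const CodeRange* ranges, int count) {
    // Generic-named helper macro, defined immediately before use...
#define CHECK_RANGE(r) if (ip >= (r).start && ip < (r).end) return true;

    for (int i = 0; i < count; ++i) {
        CHECK_RANGE(ranges[i])
    }

    // ...and undefined immediately after, so the name cannot leak into the
    // rest of the translation unit.
#undef CHECK_RANGE
    return false;
}

#ifdef CHECK_RANGE
#error "CHECK_RANGE leaked out of IsIPInHelperRanges"
#endif
```
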

#if defined(TARGET_AMD64)
// If calls push the return address, we need to simulate that here, so the OSR
// method sees the "expected" SP misalgnment on entry.

Copilot AI Jan 28, 2026

Minor typo in comment: “misalgnment” -> “misalignment”.

Suggested change
- // method sees the "expected" SP misalgnment on entry.
+ // method sees the "expected" SP misalignment on entry.

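
The commented code relies on the x64 calling convention: the stack is 16-byte aligned at the call site, and call pushes an 8-byte return address, so a callee observes SP % 16 == 8 on entry. A minimal sketch of that adjustment, assuming SP is 16-byte aligned before the simulated call; SimulateCallEntrySP is a hypothetical name, not the helper in jithelpers.cpp:

```cpp
#include <cassert>
#include <cstdint>

// Size of the return address a call instruction pushes on x64.
constexpr uint64_t kReturnAddressSize = 8;
// x64 ABIs require 16-byte stack alignment at the call site.
constexpr uint64_t kStackAlignment = 16;

// Hypothetical helper: given an SP that is 16-byte aligned (as it would be
// just before a call), return the SP a callee sees on entry, one
// return-address slot lower.
uint64_t SimulateCallEntrySP(uint64_t alignedSP) {
    // Reject an SP that was already adjusted, which would be shifted twice.
    assert(alignedSP % kStackAlignment == 0);
    return alignedSP - kReturnAddressSize;
}
```

Checking alignment before subtracting catches a caller that passes an already-misaligned SP, which would otherwise end up 16 bytes off from what the OSR method's prolog expects.
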
@akoeplinger
Member

@jkotas it's fine to do it in runtime right now

@jkotas jkotas added the Servicing-approved Approved for servicing release label Jan 28, 2026
@jkotas
Member Author

jkotas commented Jan 28, 2026

/ba-g infrastructure timeouts


Labels

area-VM-coreclr Servicing-approved Approved for servicing release
