@am11
Member

@am11 am11 commented Jan 26, 2026

On ARM64/LoongArch64/RISC-V64, OSR transitions were reading callee-saved register values from the TransitionBlock, which contains the register values at the moment JIT_Patchpoint was called. Under register stress (or normal optimization), Tier0 may reuse callee-saved registers as temporaries, so these values can be garbage rather than the preserved values the caller expects. This fix reads callee-saves from Tier0's stack save area using offset information recorded in PatchpointInfo during Tier0 compilation, which is the same location where Tier0 saved the caller's original values in its prolog. x64 doesn't need this because its JIT epilog explicitly pops Tier0's callee-saves from the stack, but this PR brings it onto the same plan for consistency.

Closes #123608
Closes #123605

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 26, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 26, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/jit-contrib
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member

Can you point out an example where this happens? OSR isn't supported in optimized code, and in unoptimized code all the live state at a patchpoint should be on the stack.

Sounds like register stress is doing something wrong.

@am11
Member Author

am11 commented Jan 26, 2026

The issue isn't about OSR's live state, but about caller's callee-saved registers that Tier0 must preserve.

Scenario: Test() calls Problem() while x19/x20 hold live values. Tier0 saves them in its prolog, then under JitStressRegs reuses x19/x20 as scratch. When the patchpoint hits, TransitionBlock captures the current (scratch) register values, not the saved originals. The OSR method returns and Test() sees corrupted registers.

Repro: JIT/opt/OSR/Runtime_89666 with DOTNET_JitStressRegs=3 on ARM64

x64 doesn't hit this because its OSR epilog explicitly pops Tier0's callee-saves from the stack. Register stress is behaving correctly; the fix reads callee-saves from Tier0's stack save area instead of the TransitionBlock.
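The scenario can be modeled in a few lines of C++. This is only a minimal sketch, not runtime code: `RegFile`, `SaveAreaOffsets`, `tier0Prolog`, `restoreFromSaveArea`, and `simulatePatchpoint` are hypothetical names, two array slots stand in for x19/x20, and the frame layout and offsets are invented for illustration. It shows why a snapshot of the live registers at the patchpoint yields scratch values, while a read from the Tier0 save area yields the caller's originals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: two callee-saved registers standing in for x19/x20.
constexpr std::size_t kNumCalleeSaved = 2;
using RegFile = std::array<std::uint64_t, kNumCalleeSaved>;

// Hypothetical stand-in for the save-area offsets recorded at Tier0 compile time.
struct SaveAreaOffsets {
    std::array<std::size_t, kNumCalleeSaved> byteOffset; // per-register offset in the Tier0 frame
};

// Tier0 prolog: spill the caller's values of the callee-saved registers to the frame.
void tier0Prolog(const RegFile& regs, std::uint8_t* frame, const SaveAreaOffsets& offsets) {
    for (std::size_t i = 0; i < kNumCalleeSaved; ++i)
        std::memcpy(frame + offsets.byteOffset[i], &regs[i], sizeof(std::uint64_t));
}

// Recover the caller's values from the Tier0 save area, ignoring the live registers.
RegFile restoreFromSaveArea(const std::uint8_t* frame, const SaveAreaOffsets& offsets) {
    RegFile out{};
    for (std::size_t i = 0; i < kNumCalleeSaved; ++i)
        std::memcpy(&out[i], frame + offsets.byteOffset[i], sizeof(std::uint64_t));
    return out;
}

struct Outcome {
    RegFile fromLiveRegs; // what a register snapshot taken at the patchpoint contains
    RegFile fromSaveArea; // what the Tier0 save area contains
};

Outcome simulatePatchpoint(const RegFile& callerValues) {
    std::array<std::uint8_t, 32> frame{};   // invented Tier0 frame
    const SaveAreaOffsets offsets{{8, 16}}; // invented offsets
    for (std::size_t off : offsets.byteOffset)
        assert(off + sizeof(std::uint64_t) <= frame.size());

    RegFile regs = callerValues;
    tier0Prolog(regs, frame.data(), offsets);
    regs = {0xDEAD, 0xBEEF};                // Tier0 body reuses the registers as scratch
    const RegFile snapshot = regs;          // snapshot taken when the patchpoint is hit
    return {snapshot, restoreFromSaveArea(frame.data(), offsets)};
}
```

Calling `simulatePatchpoint({0x1111, 0x2222})` leaves the scratch values in `fromLiveRegs` and the caller's original values in `fromSaveArea`, which is the distinction the fix relies on.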

@AndyAyersMS
Member

Ah, ok.

We should move these other archs to the same plan as x64 (see #33658), but that is likely easier said than done. But perhaps some of the things that made it hard before have since been fixed.

I went looking for non-stressregs failures in this weekend's runs. There is an x64 failure here, though given the above it is likely something different.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=1266373&view=ms.vss-test-web.build-test-results-tab&runId=35337758&paneView=debug

net11.0-windows-Release-x64-jitosr_stress_random

DOTNET_JitRandomOnStackReplacement=15
DOTNET_OSR_HitLimit=2
DOTNET_TC_OnStackReplacement=1
DOTNET_TC_OnStackReplacement_InitialCounter=1
DOTNET_TC_QuickJitForLoops=1
DOTNET_TieredCompilation=1

  Discovering: System.Net.Http.WinHttpHandler.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Http.WinHttpHandler.Functional.Tests (found 368 of 555 test cases)
  Starting:    System.Net.Http.WinHttpHandler.Functional.Tests (parallel test collections = on [4 threads], stop on fail = of
....
  System.Net.Http.Functional.Tests.PlatformHandler_HttpClientHandler_ClientCertificates_Http2_Test.Manual_CertificateOnlySentWhenValid_Success(certIndex: 3, serverExpectsClientCertificate: False) [SKIP]
      https://github.com/dotnet/runtime/issues/69238
Fatal error.
System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Http.Functional.Tests.HttpClientHandler_Proxy_Test.ProxyTunnelRequest_MaxConnectionsSetButDoesNotApplyToProxyConnect_Success()
Repeated 2 times:
--------------------------------
   at System.RuntimeMethodHandle.InvokeMethod(System.Runtime.CompilerServices.ObjectHandleOnStack, Void**, System.Runtime.CompilerServices.ObjectHandleOnStack, BOOL, System.Runtime.CompilerServices.ObjectHandleOnStack)
--------------------------------
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(System.Object, IntPtr*)
   at System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)

@am11
Member Author

am11 commented Jan 27, 2026

The NativeAOT size regression failure is #123667 (unrelated; it has been failing in main for 19+ hours).

@AndyAyersMS, @jakobbotsch, could you please run /azp run runtime-coreclr jitstressregs and /azp run runtime-coreclr jitstress2-jitstressregs pipelines against this PR?


@AndyAyersMS, I tried running a few libs tests with

$env:DOTNET_TC_OnStackReplacement_InitialCounter=1;
$env:DOTNET_TC_OnStackReplacement=1;
$env:DOTNET_OSR_HitLimit=2;
$env:DOTNET_JitRandomOnStackReplacement=15;
$env:DOTNET_TC_QuickJitForLoops=1;
$env:DOTNET_TieredCompilation=1;

on win-x64, but couldn't make it fail locally. We can open a separate issue to investigate the PGO pipeline failure; I will take a look after this stress test work is done. Having a corerun'able repro would help; I will run a few src/tests with these flags before and after once we get there.

> We should move these other archs to the same plan as x64 (see #33658), but that is likely easier said than done. But perhaps some of the things that made it hard before have since been fixed.

This PR brings the other architectures partially onto the x64 plan. The JIT side is now the same, since it populates the patchpointinfo. The helper still differs in where it reads register values from: on x64 it reads the TransitionBlock for both modified and unmodified registers, while on arm64 etc. it reads ppInfo for modified registers. The remaining difference could be resolved by moving x64 to the new plan, but either way, as you said, it's a bit tricky to consolidate. :)
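One way to picture the arm64-style split described above is a per-register choice driven by a mask. This is a hedged sketch under the assumption that registers Tier0 did not modify are taken from the register snapshot and modified ones from the save area; `RegValues`, `selectCalleeSaves`, and `savedMask` are hypothetical names and do not correspond to the actual helper or to PatchpointInfo fields.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model with four callee-saved registers.
constexpr std::size_t kNumRegs = 4;
using RegValues = std::array<std::uint64_t, kNumRegs>;

// Choose the value each callee-saved register should hold on entry to the OSR method.
// Bit i set in savedMask:   Tier0 saved (and may have clobbered) register i,
//                           so take the value from the Tier0 save area.
// Bit i clear in savedMask: Tier0 never touched register i,
//                           so the snapshot still holds the caller's value.
RegValues selectCalleeSaves(std::uint32_t savedMask,
                            const RegValues& snapshot,
                            const RegValues& saveArea) {
    RegValues out{};
    for (std::size_t i = 0; i < kNumRegs; ++i) {
        const bool savedByTier0 = ((savedMask >> i) & 1u) != 0;
        out[i] = savedByTier0 ? saveArea[i] : snapshot[i];
    }
    return out;
}
```

Under this model, the x64 behavior described above would correspond to passing a mask of zero, so that every value comes from the snapshot.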

@jakobbotsch
Member

/azp run runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@AndyAyersMS
Member

Do we have a theory on the x64 failures? @jakobbotsch and I were chatting and thought perhaps x64 was insulated... but it appears not.

@am11
Member Author

am11 commented Jan 27, 2026

I just pushed a commit bringing x64 to the same plan, let's see. I wasn't able to repro the one reported issue, #123671 (comment), with or without CET, but @janvorli has a private repro with a diagnostics tool.

@am11 am11 force-pushed the feature/deterministic-unwinding branch from d197992 to 3ed158b Compare January 28, 2026 08:01
@am11 am11 changed the title Use patchpointinfo for OSR on non-xarch ISAs Use patchpointinfo for OSR runtime helper Jan 28, 2026
@jakobbotsch
Member

This PR should simplify my job for #120865 as well.

@am11
Member Author

am11 commented Jan 31, 2026

@jakobbotsch, the patchpointinfo approach is now looking a bit promising (the failures are unrelated: timeouts, missing logs, etc.). Let's see what /azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs thinks. :)

@jakobbotsch
Member

/azp run runtime-coreclr libraries-pgo, runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs, runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@am11
Member Author

am11 commented Jan 31, 2026

The JIT stress and PGO failures match those on main (I looked through today's runs of those pipelines on main). Please let me know if you find a new relevant failure.

@janvorli, could you please run the internal diagnostic tool, which was failing before the revert, against this branch? The change is now rewritten using patchpointinfo for callee-saved regs.

Development

Successfully merging this pull request may close these issues.

Test failure: JIT/opt/OSR/Runtime_89666/Runtime_89666.cmd
Test failure: JIT/opt/OSR/Runtime_69032/Runtime_69032.cmd

4 participants