gh-144763: Don't detach the GIL in tracemalloc #144779

Open
vstinner wants to merge 2 commits into python:main from vstinner:tracemalloc_dont_detach

@vstinner
Member

@vstinner vstinner commented Feb 13, 2026

tracemalloc no longer detaches the GIL to acquire its internal lock.

tracemalloc no longer detaches the GIL to acquire its internal lock.
@vstinner
Member Author

This change basically reverts commit 01157e0 (PR gh-139449) which fixed another deadlock on tracemalloc.stop() :-) See issue gh-139116.

cc @ZeroIntensity @encukou

Don't call PyRefTracer_SetTracer() while holding TABLES_LOCK().
@vstinner
Member Author

This change basically reverts commit 01157e0 (PR #139449) which fixed another deadlock on tracemalloc.stop() :-) See issue #139116.

I pushed a change to avoid the deadlock on tracemalloc.stop(): don't call PyRefTracer_SetTracer() while holding TABLES_LOCK().
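
The pattern described above (update shared state under the lock, release it, and only then call into code that may take other locks) can be sketched as follows. This is a single-threaded toy model, not the code from this PR: `toy_lock`, `tables_lock`, `set_tracer_stub()`, `start_tracing()` and `stop_tracing()` are hypothetical names, and the stub only stands in for a call such as PyRefTracer_SetTracer().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a mutex: it only records whether it is held. */
typedef struct { bool held; } toy_lock;

/* Hypothetical stand-in for the lock behind TABLES_LOCK(). */
static toy_lock tables_lock = { false };
static bool tracing = false;
static bool callback_saw_lock_held = false;

void toy_acquire(toy_lock *l) { assert(!l->held); l->held = true; }
void toy_release(toy_lock *l) { assert(l->held); l->held = false; }

/* Hypothetical stand-in for an outside call that may need other locks.
   It records whether tables_lock was held when it ran. */
void set_tracer_stub(void)
{
    callback_saw_lock_held = tables_lock.held;
}

void start_tracing(void)
{
    toy_acquire(&tables_lock);
    tracing = true;
    toy_release(&tables_lock);
}

/* Flip the shared flag under the lock, release the lock, and only then
   make the outside call. Returns whether tracing was active. */
bool stop_tracing(void)
{
    toy_acquire(&tables_lock);
    bool was_tracing = tracing;
    tracing = false;
    toy_release(&tables_lock);

    if (was_tracing) {
        set_tracer_stub();  /* runs with tables_lock released */
    }
    return was_tracing;
}
```

Calling `start_tracing()` and then `stop_tracing()` leaves `callback_saw_lock_held` false, which is the property the change aims for: the outside call never runs while the tables lock is held.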

@vstinner vstinner marked this pull request as ready for review February 13, 2026 16:20
int lineno = -1;
PyCodeObject *code = _PyFrame_GetCode(pyframe);
// PyUnstable_InterpreterFrame_GetLine() cannot be used, since it uses
// a critical section which can trigger a deadlock.
Contributor

@kumaraditya303 kumaraditya303 Feb 14, 2026

I think the problem is that critical sections require an active thread state, and this code can be called with a detached thread state, iirc.

Member Author

tracemalloc_get_frame() is called with an attached thread state, see the caller traceback_get_frames() which has the code:

    PyThreadState *tstate = _PyThreadState_GET();
    assert(tstate != NULL);

Example of a deadlock when running the #144763 (comment) reproducer on a free-threaded build:

  • Thread A: _PyCode_GetTLBC() => ... => tracemalloc_alloc(): TABLES_LOCK()
  • Thread B: _PyTraceMalloc_TraceRef() => ... => tracemalloc_get_frame() => PyUnstable_InterpreterFrame_GetLine() => PyCode_Addr2Line(): Py_BEGIN_CRITICAL_SECTION(co)

Locks:

  • Thread A is inside a Py_BEGIN_CRITICAL_SECTION(co) critical section (_PyCode_GetTLBC()) and waits for TABLES_LOCK().
  • Thread B holds TABLES_LOCK() and waits to enter the Py_BEGIN_CRITICAL_SECTION(co) critical section.

Thread A and thread B want to use Py_BEGIN_CRITICAL_SECTION(co) on the same code object (0x20000844810).

=> deadlock :-(
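
The cycle above can be checked mechanically with a small wait-for model. The sketch below is illustrative only: `lock_state`, `has_deadlock()`, `reported_state()` and `state_without_critical_section()` are hypothetical names, and the second state assumes that thread B no longer enters the code object's critical section while holding TABLES_LOCK().

```c
#include <assert.h>
#include <stdbool.h>

enum { THREAD_A, THREAD_B, NUM_THREADS };
enum { CODE_CRITICAL_SECTION, TABLES_LOCK_ID, NUM_LOCKS };
#define NONE (-1)

typedef struct {
    int holder[NUM_LOCKS];       /* thread holding each lock, or NONE */
    int waits_for[NUM_THREADS];  /* lock each thread is blocked on, or NONE */
} lock_state;

/* Follow "waits for lock -> lock owner" edges from `start`.
   A deadlock exists if the chain leads back to `start`. */
bool has_deadlock(const lock_state *s, int start)
{
    int thread = start;
    for (int steps = 0; steps < NUM_THREADS; steps++) {
        int lock = s->waits_for[thread];
        if (lock == NONE) {
            return false;        /* this thread is not blocked */
        }
        int owner = s->holder[lock];
        if (owner == NONE) {
            return false;        /* the lock is free */
        }
        if (owner == start) {
            return true;         /* cycle back to the starting thread */
        }
        thread = owner;
    }
    return false;
}

/* The situation described in the comment above. */
lock_state reported_state(void)
{
    lock_state s;
    s.holder[CODE_CRITICAL_SECTION] = THREAD_A;
    s.holder[TABLES_LOCK_ID] = THREAD_B;
    s.waits_for[THREAD_A] = TABLES_LOCK_ID;
    s.waits_for[THREAD_B] = CODE_CRITICAL_SECTION;
    return s;
}

/* Assumed state once thread B does not need the critical section
   while it holds the tables lock. */
lock_state state_without_critical_section(void)
{
    lock_state s = reported_state();
    s.waits_for[THREAD_B] = NONE;
    return s;
}
```

In the reported state, both threads sit on a cycle. Once thread B stops waiting on the critical section, thread A still waits for TABLES_LOCK(), but B can finish and release it, so no cycle remains.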

Member

@ZeroIntensity ZeroIntensity left a comment

Is it necessary to keep the attached thread state in TABLES_LOCK? It seems the bulk of the fix is just calling PyRefTracer_SetTracer without holding the lock.

Alternatively, we could just use Py_BEGIN_CRITICAL_SECTION_MUTEX instead of PyMutex_Lock.

Comment on lines +1 to +2
Fix a race condition in :mod:`tracemalloc`: it no longer detaches the GIL to
acquire its internal lock. Patch by Victor Stinner.
Member

We should probably use "attached thread state" terminology here, since this only affects free-threaded builds.
