Skip to content
/ server Public

Comments

MDEV-37974 Avoid bogus deadlock in lock_rec_insert_check_and_lock()#4672

Open
arcivanov wants to merge 1 commit intoMariaDB:10.11from
arcivanov:MDEV-37974
Open

MDEV-37974 Avoid bogus deadlock in lock_rec_insert_check_and_lock()#4672
arcivanov wants to merge 1 commit intoMariaDB:10.11from
arcivanov:MDEV-37974

Conversation

@arcivanov
Copy link
Contributor

@arcivanov arcivanov commented Feb 20, 2026

When a transaction holds a granted lock on a record and another
transaction is waiting for that same record, an INSERT by the
lock-holding transaction into the gap before that record would
incorrectly enter lock_wait() on the waiting lock, creating a
false deadlock cycle.

In lock_rec_insert_check_and_lock(), after
lock_rec_other_has_conflicting() returns a conflicting lock,
check whether:

  1. the conflicting lock is WAITING,
  2. we hold a granted lock on the same record, and
  3. no other transaction holds a GRANTED lock that conflicts
    with our INSERT_INTENTION.

Only skip the lock wait when all three conditions are met. The scan
for granted conflicting locks is needed because lock inheritance
during purge can create granted GAP locks from other transactions
that coexist with our LOCK_ORDINARY but still block
INSERT_INTENTION.

The bogus deadlock in lock_delete_updated and versioning.update
(MDEV-14829 section) is also eliminated. In lock_delete_updated,
the DELETE no longer deadlocks but the row at the new PK position
is missed by the forward scan (pre-existing behavior). In
versioning.update, the concurrent UPDATE on a system-versioned
table no longer deadlocks because the historical row INSERT
correctly skips the waiting lock.

@arcivanov arcivanov force-pushed the MDEV-37974 branch 3 times, most recently from 23599ab to d9e12c5 Compare February 20, 2026 07:25
@arcivanov arcivanov marked this pull request as draft February 20, 2026 08:16
@arcivanov arcivanov force-pushed the MDEV-37974 branch 2 times, most recently from c04ea78 to df10682 Compare February 20, 2026 08:20
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 20, 2026
@arcivanov arcivanov force-pushed the MDEV-37974 branch 4 times, most recently from 081a437 to 0171d8d Compare February 22, 2026 02:21
When a transaction holds a granted lock on a record and another
transaction is waiting for that same record, an `INSERT` by the
lock-holding transaction into the gap before that record would
incorrectly enter `lock_wait()` on the waiting lock, creating a
false deadlock cycle.

In `lock_rec_insert_check_and_lock()`, after
`lock_rec_other_has_conflicting()` returns a conflicting lock,
check whether:

1. the conflicting lock is **WAITING**,
2. we hold a **granted** lock on the same record, and
3. no other transaction holds a **GRANTED** lock that conflicts
   with our `INSERT_INTENTION`.

Only skip the lock wait when all three conditions are met. The scan
for granted conflicting locks is needed because lock inheritance
during purge can create granted `GAP` locks from other transactions
that coexist with our `LOCK_ORDINARY` but still block
`INSERT_INTENTION`.

The bogus deadlock in `lock_delete_updated` and `versioning.update`
(MDEV-14829 section) is also eliminated. In `lock_delete_updated`,
the `DELETE` no longer deadlocks but the row at the new PK position
is missed by the forward scan (pre-existing behavior). In
`versioning.update`, the concurrent `UPDATE` on a system-versioned
table no longer deadlocks because the historical row `INSERT`
correctly skips the waiting lock.
@arcivanov arcivanov marked this pull request as ready for review February 22, 2026 05:02
@arcivanov
Copy link
Contributor Author

@dr-m

Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. This is a preliminary review. Please stand by for the final review.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is interesting. But I think that we should be very careful here. We have been bitten by MDEV-27025 in the past. I think that we would need some additional testing in the style of https://mariadb.com/resources/blog/isolation-level-violation-testing-and-debugging-in-mariadb/ because our regular stress testing does not cover consistency or isolation very well.

Comment on lines 6 to 8
--disable_query_log
call mtr.add_suppression("InnoDB: Transaction was aborted due to ");
--enable_query_log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is any transaction being aborted anymore?

Comment on lines +1 to +2
--innodb-deadlock-detect=OFF
--innodb-lock-wait-timeout=3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why such a long timeout? Could we run two combinations of this test, for both values of innodb_deadlock_detect?

Comment on lines +75 to +78
--connection con1
--disconnect con1

--connection default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two --connection lines are redundant and should be removed.

Comment on lines +5766 to +5770
DBUG_LOG("ib_lock",
"insert_check trx " << ib::hex(trx->id)
<< " index " << index->name()
<< " page " << id
<< " heap_no " << heap_no);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think that such tracing is beneficial (I am too used to https://rr-project.org nowadays), I’d suggest to use DBUG_PRINT instead, to reduce the code footprint.

err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->page.frame,
heap_no, index, thr, nullptr);
trx->mutex_unlock();
lock_t *blocker= c_lock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t need a copy of this variable; we can just assign c_lock= nullptr; when there is no actual conflicting lock.

Comment on lines +5783 to +5817
if (lock_t *c_lock= lock_rec_other_has_conflicting(type_mode,
g.cell(), id,
heap_no, trx))
{
trx->mutex_lock();
err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->page.frame,
heap_no, index, thr, nullptr);
trx->mutex_unlock();
lock_t *blocker= c_lock;

/* MDEV-37974: If the first conflicting lock is WAITING and
we hold a granted lock on the successor record, the waiting
lock is necessarily blocked behind our lock in the queue
(directly or via queue ordering) and can never be granted
while our lock exists.

However, we must also verify that no other transaction holds
a GRANTED lock that conflicts with our INSERT_INTENTION.
Such locks can arise from lock inheritance during purge
(e.g., an inherited X GAP lock that coexists with our
LOCK_ORDINARY but still blocks INSERT_INTENTION). Only when
all conflicting locks from other transactions are WAITING
can we safely skip the lock wait. */
if (c_lock->is_waiting() &&
lock_rec_has_expl(LOCK_X | LOCK_REC_NOT_GAP,
g.cell(), id, heap_no, trx))
{
const bool is_supremum=
(heap_no == PAGE_HEAP_NO_SUPREMUM);
blocker= nullptr;
for (lock_t *l= lock_sys_t::get_first(g.cell(), id,
heap_no);
l; l= lock_rec_get_next(heap_no, l))
{
if (l->trx != trx && !l->is_waiting() &&
lock_rec_has_to_wait(trx, type_mode, l,
is_supremum))
{
blocker= l;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not refer to MDEV tickets in source code comments, unless it is about some new feature or an open bug that is not expected to be fixed soon. Likewise, I would use descriptive test case names, instead of using a ticket number as a test case name.

Do we really need the for loop? The function lock_move_granted_locks_to_front() suggests that any conflicting waiting lock requests must after any non-waiting requests in the hash bucket chain. Hence, if the first conflicting lock that we find is a waiting request rather than a granted lock, we should already know that there is no actual conflict.

Basically, I am wondering if the following logic would be sufficient:

      if (lock_t *c_lock= lock_rec_other_has_conflicting(type_mode,
                                                         g.cell(), id,
                                                         heap_no, trx))
        if (!c_lock->is_waiting()
        {
          trx->mutex_lock();
          err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->page.frame,
                                        heap_no, index, thr, nullptr);
          trx->mutex_unlock();
        }

Note: I did not consider different type_mode yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

3 participants