MDEV-37974 Avoid bogus deadlock in lock_rec_insert_check_and_lock()#4672
MDEV-37974 Avoid bogus deadlock in lock_rec_insert_check_and_lock()#4672arcivanov wants to merge 1 commit intoMariaDB:10.11from
Conversation
23599ab to
d9e12c5
Compare
c04ea78 to
df10682
Compare
081a437 to
0171d8d
Compare
When a transaction holds a granted lock on a record and another transaction is waiting for that same record, an `INSERT` by the lock-holding transaction into the gap before that record would incorrectly enter `lock_wait()` on the waiting lock, creating a false deadlock cycle. In `lock_rec_insert_check_and_lock()`, after `lock_rec_other_has_conflicting()` returns a conflicting lock, check whether: 1. the conflicting lock is **WAITING**, 2. we hold a **granted** lock on the same record, and 3. no other transaction holds a **GRANTED** lock that conflicts with our `INSERT_INTENTION`. Only skip the lock wait when all three conditions are met. The scan for granted conflicting locks is needed because lock inheritance during purge can create granted `GAP` locks from other transactions that coexist with our `LOCK_ORDINARY` but still block `INSERT_INTENTION`. The bogus deadlock in `lock_delete_updated` and `versioning.update` (MDEV-14829 section) is also eliminated. In `lock_delete_updated`, the `DELETE` no longer deadlocks but the row at the new PK position is missed by the forward scan (pre-existing behavior). In `versioning.update`, the concurrent `UPDATE` on a system-versioned table no longer deadlocks because the historical row `INSERT` correctly skips the waiting lock.
0171d8d to
a99f1e0
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for your contribution. This is a preliminary review. Please stand by for the final review.
dr-m
left a comment
There was a problem hiding this comment.
Thank you, this is interesting. But I think that we should be very careful here. We have been bitten by MDEV-27025 in the past. I think that we would need some additional testing in the style of https://mariadb.com/resources/blog/isolation-level-violation-testing-and-debugging-in-mariadb/ because our regular stress testing does not cover consistency or isolation very well.
| --disable_query_log | ||
| call mtr.add_suppression("InnoDB: Transaction was aborted due to "); | ||
| --enable_query_log |
There was a problem hiding this comment.
Is any transaction being aborted anymore?
| --innodb-deadlock-detect=OFF | ||
| --innodb-lock-wait-timeout=3 |
There was a problem hiding this comment.
Why such a long timeout? Could we run two combinations of this test, for both values of innodb_deadlock_detect?
| --connection con1 | ||
| --disconnect con1 | ||
|
|
||
| --connection default |
There was a problem hiding this comment.
The two --connection lines are redundant and should be removed.
| DBUG_LOG("ib_lock", | ||
| "insert_check trx " << ib::hex(trx->id) | ||
| << " index " << index->name() | ||
| << " page " << id | ||
| << " heap_no " << heap_no); |
There was a problem hiding this comment.
If you think that such tracing is beneficial (I am too used to https://rr-project.org nowadays), I’d suggest to use DBUG_PRINT instead, to reduce the code footprint.
| err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->page.frame, | ||
| heap_no, index, thr, nullptr); | ||
| trx->mutex_unlock(); | ||
| lock_t *blocker= c_lock; |
There was a problem hiding this comment.
We don’t need a copy of this variable; we can just assign c_lock= nullptr; when there is no actual conflicting lock.
| if (lock_t *c_lock= lock_rec_other_has_conflicting(type_mode, | ||
| g.cell(), id, | ||
| heap_no, trx)) | ||
| { | ||
| trx->mutex_lock(); | ||
| err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->page.frame, | ||
| heap_no, index, thr, nullptr); | ||
| trx->mutex_unlock(); | ||
| lock_t *blocker= c_lock; | ||
|
|
||
| /* MDEV-37974: If the first conflicting lock is WAITING and | ||
| we hold a granted lock on the successor record, the waiting | ||
| lock is necessarily blocked behind our lock in the queue | ||
| (directly or via queue ordering) and can never be granted | ||
| while our lock exists. | ||
|
|
||
| However, we must also verify that no other transaction holds | ||
| a GRANTED lock that conflicts with our INSERT_INTENTION. | ||
| Such locks can arise from lock inheritance during purge | ||
| (e.g., an inherited X GAP lock that coexists with our | ||
| LOCK_ORDINARY but still blocks INSERT_INTENTION). Only when | ||
| all conflicting locks from other transactions are WAITING | ||
| can we safely skip the lock wait. */ | ||
| if (c_lock->is_waiting() && | ||
| lock_rec_has_expl(LOCK_X | LOCK_REC_NOT_GAP, | ||
| g.cell(), id, heap_no, trx)) | ||
| { | ||
| const bool is_supremum= | ||
| (heap_no == PAGE_HEAP_NO_SUPREMUM); | ||
| blocker= nullptr; | ||
| for (lock_t *l= lock_sys_t::get_first(g.cell(), id, | ||
| heap_no); | ||
| l; l= lock_rec_get_next(heap_no, l)) | ||
| { | ||
| if (l->trx != trx && !l->is_waiting() && | ||
| lock_rec_has_to_wait(trx, type_mode, l, | ||
| is_supremum)) | ||
| { | ||
| blocker= l; |
There was a problem hiding this comment.
I would not refer to MDEV tickets in source code comments, unless it is about some new feature or an open bug that is not expected to be fixed soon. Likewise, I would use descriptive test case names, instead of using a ticket number as a test case name.
Do we really need the for loop? The function lock_move_granted_locks_to_front() suggests that any conflicting waiting lock requests must after any non-waiting requests in the hash bucket chain. Hence, if the first conflicting lock that we find is a waiting request rather than a granted lock, we should already know that there is no actual conflict.
Basically, I am wondering if the following logic would be sufficient:
if (lock_t *c_lock= lock_rec_other_has_conflicting(type_mode,
g.cell(), id,
heap_no, trx))
if (!c_lock->is_waiting()
{
trx->mutex_lock();
err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->page.frame,
heap_no, index, thr, nullptr);
trx->mutex_unlock();
}Note: I did not consider different type_mode yet.
When a transaction holds a granted lock on a record and another
transaction is waiting for that same record, an
INSERTby thelock-holding transaction into the gap before that record would
incorrectly enter
lock_wait()on the waiting lock, creating afalse deadlock cycle.
In
lock_rec_insert_check_and_lock(), afterlock_rec_other_has_conflicting()returns a conflicting lock,check whether:
with our
INSERT_INTENTION.Only skip the lock wait when all three conditions are met. The scan
for granted conflicting locks is needed because lock inheritance
during purge can create granted
GAPlocks from other transactionsthat coexist with our
LOCK_ORDINARYbut still blockINSERT_INTENTION.The bogus deadlock in
lock_delete_updatedandversioning.update(MDEV-14829 section) is also eliminated. In
lock_delete_updated,the
DELETEno longer deadlocks but the row at the new PK positionis missed by the forward scan (pre-existing behavior). In
versioning.update, the concurrentUPDATEon a system-versionedtable no longer deadlocks because the historical row
INSERTcorrectly skips the waiting lock.