MDEV-32371 Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() #2866
Merged
While we do have However, if a concurrent read of the same block is in progress, the changed code path should be exercised. To test this, a tiny |
Thirunarayanan
approved these changes
Nov 28, 2023
…ted_evict() buf_page_get_zip(): Do not wait for the page latch while holding hash_lock. If the latch is not available, ensure that any concurrent buf_pool_t::corrupted_evict() will be able to acquire the hash_lock, and then retry the lookup. If the page was corrupted, we will finally "goto must_read_page", retry the read once more, and then report an error. Reviewed by: Thirunarayanan Balathandayuthapani
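The locking rule in this commit message can be illustrated with a minimal, self-contained sketch. It is not the InnoDB code: `Page`, `PoolSlot`, `lookup_without_blocking()` and `evict_corrupted()` are hypothetical stand-ins built on `std::shared_mutex`, and the short sleep between retries is only this sketch's way of giving the evicting thread a chance to take the hash lock. It also assumes that the evicting thread already holds the page latch exclusively, which is what makes a blocking wait under the hash lock dangerous.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-ins; these are NOT the InnoDB data structures.
struct Page {
  std::shared_mutex latch;      // plays the role of the page latch
};

struct PoolSlot {
  std::shared_mutex hash_lock;  // plays the role of the page hash latch
  Page *page = nullptr;         // nullptr: the page is not in the pool
};

enum class Lookup { Found, MustRead };

// Look up the page without ever blocking on the page latch while the
// hash lock is held. On Found, *out is set and its latch is held in
// shared mode by the caller. On MustRead, the caller has to read the
// page again (compare "goto must_read_page" in the commit message).
Lookup lookup_without_blocking(PoolSlot &slot, Page *&out)
{
  for (;;) {
    std::shared_lock<std::shared_mutex> hash(slot.hash_lock);
    Page *p = slot.page;
    if (!p)
      return Lookup::MustRead;   // evicted, or never loaded
    if (p->latch.try_lock_shared()) {
      out = p;
      return Lookup::Found;      // hash lock is released by the guard
    }
    // The latch is busy: drop the hash lock first so that a concurrent
    // evictor can acquire it, then retry the whole lookup.
    hash.unlock();
    std::this_thread::sleep_for(std::chrono::microseconds(100));
  }
}

// Assumption of this sketch: the caller already holds slot.page->latch
// exclusively. The evictor then needs the hash lock in exclusive mode.
void evict_corrupted(PoolSlot &slot)
{
  std::unique_lock<std::shared_mutex> hash(slot.hash_lock);
  assert(slot.page != nullptr);
  slot.page = nullptr;
}
```

If the lookup instead called a blocking `lock_shared()` on the page latch while still holding the hash lock, the evicting thread (holding the page latch and waiting for the hash lock) and the lookup thread would wait for each other indefinitely. With the try-lock and retry, the lookup eventually observes that the page is gone and falls back to reading it again.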
963a88b to
bb511de
Description
buf_page_get_zip(): Do not wait for the page latch while holding hash_lock. If the latch is not available, ensure that any concurrent buf_pool_t::corrupted_evict() will be able to acquire the hash_lock, and then retry the lookup. If the page was corrupted and evicted, we will finally goto must_read_page, retry the read once more, and then report an error.
How can this PR be tested?
I think that this is best tested together with MDEV-31817 #2865, which is included for the purpose of testing. The workload must use ROW_FORMAT=COMPRESSED tables, and we might want to use CMAKE_BUILD_TYPE=RelWithDebInfo.
Basing the PR against the correct MariaDB version
PR quality check