MDEV-28213 Skip ignored domain IDs during GTID validation #4677
bodyhedia44 wants to merge 1 commit into MariaDB:10.6
Force-pushed from 988af46 to 155d0fe.
gkodinov left a comment:
Thank you for your contribution. This is a preliminary review.
gkodinov left a comment:
Please do not do multiple commits. Please stick to a single commit and amend it.
sql/slave.cc (outdated)
        sprintf(err_buff, "%s Error: Out of memory", errmsg);
        goto err;
      }
      for (uint i= 0; i < do_ids->elements; i++)
A good amount of repeated code here. I'd consider making a helper function and passing down the list and the name to print as arguments.
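A minimal sketch of the kind of helper suggested above, assuming the ID lists are plain `std::vector<unsigned long>` and that each list is sent as a comma-separated string. The function and struct names (`make_set_query`, `make_all_set_queries`, `domain_id_var`) and the value format are hypothetical; only the two user variable names come from this PR. A small table drives the loop, so the IGNORE and DO lists share one code path instead of two copied blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: renders one list as "SET @<var_name>='id1,id2,...'".
// The comma-separated value format is an assumption for illustration.
static std::string make_set_query(const char *var_name,
                                  const std::vector<unsigned long> &ids)
{
  std::string query= "SET @";
  query+= var_name;
  query+= "='";
  for (std::size_t i= 0; i < ids.size(); i++)
  {
    if (i > 0)
      query+= ',';
    query+= std::to_string(ids[i]);
  }
  query+= '\'';
  return query;
}

// One table entry per filter list replaces the two copy-pasted blocks.
struct domain_id_var
{
  const char *var_name;
  const std::vector<unsigned long> *ids;
};

static std::vector<std::string>
make_all_set_queries(const std::vector<unsigned long> &ignore_ids,
                     const std::vector<unsigned long> &do_ids)
{
  const domain_id_var vars[]= {
    {"slave_connect_state_domain_ids_ignore", &ignore_ids},
    {"slave_connect_state_domain_ids_do", &do_ids},
  };
  std::vector<std::string> queries;
  for (const domain_id_var &v : vars)
  {
    if (!v.ids->empty())   // skip lists that were not configured
      queries.push_back(make_set_query(v.var_name, *v.ids));
  }
  return queries;
}
```

Adding a third list would then only need one more table entry.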
sql/sql_repl.cc (outdated)
      bool expect_number= true;

      /* Skip leading whitespace */
      while (p < end && *p == ' ')
do you really need to skip leading space twice?
sql/sql_repl.cc (outdated)
      while (p < end)
      {
        char *endptr;
I would move this inside if(expect_number).
sql/sql_repl.cc (outdated)
      while (p < end)
      {
        char *endptr;
        ulong domain_id;
sql/sql_repl.cc (outdated)
        @retval 0 success
        @retval 1 error
      */
      static int
any specific reason why you're not returning a bool?
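A sketch of a domain-ID list parser that addresses the three parser comments above: whitespace is skipped in exactly one place, `endptr` and the parsed value live only inside the `expect_number` branch, and the result is a `bool` that maps `@retval 0`/`@retval 1` onto `false`/`true`. The name `parse_domain_ids`, the `std::vector` output, and the NUL-terminated input (instead of the patch's `p`/`end` pair) are assumptions for illustration, not the patch's actual code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical parser for a NUL-terminated list such as "1, 2,3".
// Returns false on success and true on error, i.e. @retval 0/1 as a bool.
static bool parse_domain_ids(const char *p, std::vector<unsigned long> *ids)
{
  bool expect_number= true;
  std::size_t parsed= 0;

  while (*p)
  {
    // Whitespace is skipped in exactly one place, before any token.
    if (*p == ' ')
    {
      p++;
      continue;
    }
    if (expect_number)
    {
      // strtoul() would also accept a sign, so insist on a digit first.
      if (*p < '0' || *p > '9')
        return true;
      // endptr and domain_id are only needed on this branch.
      char *endptr;
      errno= 0;
      unsigned long domain_id= std::strtoul(p, &endptr, 10);
      if (errno == ERANGE || domain_id > UINT32_MAX)
        return true;                 // domain IDs are 32-bit
      ids->push_back(domain_id);
      parsed++;
      p= endptr;
      expect_number= false;
    }
    else
    {
      if (*p != ',')
        return true;                 // two numbers without a separator
      p++;
      expect_number= true;
    }
  }
  // Ending while a number is still expected after at least one ID means a
  // trailing comma; an empty (or all-blank) string is an empty list.
  return expect_number && parsed > 0;
}
```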
sql/sql_repl.cc (outdated)
          const DYNAMIC_ARRAY *do_ids, ulong domain_id)
      {
        /* If IGNORE_DOMAIN_IDS is set, check if this domain is in it */
        for (uint32 i= 0; i < ignore_ids->elements; i++)
any specific reason why you're not sorting this array and then using bsearch?
sql/sql_repl.cc (outdated)
        */
        if (do_ids->elements > 0)
        {
          for (uint32 i= 0; i < do_ids->elements; i++)
ditto for this one: please sort and then use bsearch.
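The sort-then-`bsearch()` approach requested in the two comments above can be sketched as follows. The `id_list` struct is a hypothetical stand-in for `DYNAMIC_ARRAY`, and while `ulong_cmp` and `is_domain_id_ignored` reuse names from the PR description, their signatures here are assumptions. The lists are sorted once when loaded, so each lookup costs O(log n) instead of a linear scan. The comparator compares rather than subtracts, because subtracting two `unsigned long` values and truncating to `int` can give the wrong sign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// qsort()/bsearch() comparator for unsigned long; returns -1, 0 or 1.
static int ulong_cmp(const void *a, const void *b)
{
  unsigned long x= *static_cast<const unsigned long *>(a);
  unsigned long y= *static_cast<const unsigned long *>(b);
  return (x > y) - (x < y);
}

// Hypothetical stand-in for DYNAMIC_ARRAY: a plain array plus a count.
struct id_list
{
  unsigned long *ids;
  std::size_t count;
};

// Sort once, when the list is loaded.
static void sort_id_list(id_list *l)
{
  if (l->count > 1)
    std::qsort(l->ids, l->count, sizeof(unsigned long), ulong_cmp);
}

static bool contains(const id_list *l, unsigned long domain_id)
{
  return l->count > 0 &&
         std::bsearch(&domain_id, l->ids, l->count, sizeof(unsigned long),
                      ulong_cmp) != nullptr;
}

// A domain is skipped if IGNORE_DOMAIN_IDS lists it, or if DO_DOMAIN_IDS
// is non-empty and does not list it.
static bool is_domain_id_ignored(const id_list *ignore_ids,
                                 const id_list *do_ids,
                                 unsigned long domain_id)
{
  if (contains(ignore_ids, domain_id))
    return true;
  return do_ids->count > 0 && !contains(do_ids, domain_id);
}
```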
When a slave is configured with IGNORE_DOMAIN_IDS or DO_DOMAIN_IDS, the master's binlog dump thread should skip GTID state validation for those filtered domains. This avoids false ER_GTID_POSITION_NOT_FOUND errors when the slave does not have (or need) the current GTID state for domains it is filtering.

The slave now sends its IGNORE/DO domain ID lists to the master via the user variables @slave_connect_state_domain_ids_ignore and @slave_connect_state_domain_ids_do, which the master reads in mysql_binlog_send() and passes to check_slave_start_position().

Changes:
- sql/sql_repl.cc: load_ignore_domain_ids() now returns bool. Fix the parser to avoid the redundant whitespace skip and scope local variables tightly. Add a ulong_cmp() comparator. Replace the O(n) linear scans in is_domain_id_ignored() with bsearch() after sorting the arrays.
- sql/slave.cc: Add a build_domain_ids_query() helper to construct the SET queries for the domain ID user variables. Refactor the duplicated code into a loop over a struct array.
- mysql-test/suite/rpl/t/rpl_gtid_ignored_domain_ids_validation.test: New test validating end-to-end GTID replication with domain filtering.
Force-pushed from 8f36cde to 2f4d520.
done

      Send the slave's IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS to the master,
      so it can skip GTID state validation for domains the slave doesn't
      care about. See MDEV-28213.
Why on the master side?
Let’s see – if the ignored domains are not provided in the @@gtid_slave_pos, then the master will think the slave wants to replicate those domains from the beginning, even though the domain will end up ignored, regardless of where the master starts.
So this problem is really overlapping with (but not necessarily entirely part of) MDEV-9345 filtering on master, #4086.
When a slave connects to a master using MASTER_USE_GTID=Slave_Pos and the
master has purged old binlogs, the master validates the slave's GTID state
against the oldest available binlog's Gtid_list event. If the Gtid_list
references domains that the slave is configured to ignore (via
CHANGE MASTER IGNORE_DOMAIN_IDS or DO_DOMAIN_IDS), validation incorrectly
fails with error 1236:
"Could not find GTID state requested by slave in any binlog files.
Probably the slave state is too old and required binlog files have
been purged."
This is a false rejection -- the slave does not need events from those domains.
Fix: the slave now sends its IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS to the master
as user variables (@slave_connect_state_domain_ids_ignore and
@slave_connect_state_domain_ids_do) before COM_BINLOG_DUMP. The master reads
these and skips validation for ignored domains in three code paths:
  - … searching for the right binlog file
  - … does not care about
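To make the failure mode and the fix concrete, here is a deliberately simplified model. It is not MariaDB's `check_slave_start_position()` logic. Its assumptions: each position is a map from domain ID to seq_no, server IDs and the multi-file binlog search are ignored, the oldest binlog's Gtid_list gives the last seq_no per domain that lives only in purged files, and the filter is a caller-supplied predicate. A domain the slave has no position for, or an older position than the purged point, would require purged events (error 1236), unless the slave filters that domain.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Simplified model: domain_id -> seq_no.
using gtid_pos= std::map<std::uint32_t, std::uint64_t>;

// Returns true if the slave's start position can be served, i.e. no
// domain the slave actually replicates needs events from purged binlogs.
static bool
slave_pos_servable(const gtid_pos &slave_pos,
                   const gtid_pos &oldest_gtid_list,
                   const std::function<bool(std::uint32_t)> &is_ignored)
{
  for (const auto &entry : oldest_gtid_list)
  {
    std::uint32_t domain_id= entry.first;
    std::uint64_t purged_up_to= entry.second;
    if (is_ignored(domain_id))
      continue;  // the fix: the slave never needs events from this domain
    auto it= slave_pos.find(domain_id);
    // A missing domain means "start from the beginning"; an older seq_no
    // means the next event needed is in a purged file. Either way: reject.
    if (it == slave_pos.end() || it->second < purged_up_to)
      return false;
  }
  return true;
}
```

In this model, a slave at `{0: 120}` against a Gtid_list of `{0: 100, 1: 50}` is rejected because of domain 1, and accepted once domain 1 is filtered.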
This is backwards compatible: older masters store the unknown user variables
harmlessly, and older slaves simply do not send them.
Includes MTR test rpl.rpl_gtid_ignored_domain_ids_validation covering both
IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS scenarios with purged binlogs.