mysql-test/suite/ndb_binlog/t/ndb_binlog_ddl_multi.test · mysql-cluster-7.4.27 · Rasoul Jahanshahi / Mysql Server

Mar 29, 2016

Bug#22842538 BINLOG SCHEMA DISTRIBUTION TIMEOUT AND FAILS WHEN ANOTHER MYSQL NODE START · dc0fce1d

Ole John Aske authored Mar 29, 2016

When another mysqld node is started and joins (subscribes to)
the schema distribution protocol, another mysqld which is
waiting for a schema change to be distributed will time out
during that wait. This happens because we incorrectly assumed
that the newly arriving mysqld node would also 'ack' the
schema distribution. However, it arrived too late to be
a participant in it.

This patch fixes 3 issues, all contributing to this failure:

a) There is a potential race between an 'inflight'
   subscribe event and the start of a schema distribution.
   The subscribing node might or might not take part in the
   schema distribution, and its role is actually unknown at
   the point in time where the schema operation is started by
   the coordinator.
   The set of participating servers can only be determined
   when the Coordinator acks its own schema op: If the subscribe
   event arrived before its own schema op, then the subscribing
   node is a participant.
   This patch modifies the Coordinator's ack to also modify
   the acked slock_bitmap, clearing the servers *not* participating.

b) check_wakeup_clients() called get_subscriber_bitmask() to get the
   current set of subscribers. However, 'self' was not included in the
   subscribers, which it always should be. Fixed this by letting
   Ndb_schema_dist_data::init() add 'own_nodeid' to subscribers.
   Furthermore, this enables us to clean up a couple of places
   where we used to add own_nodeid to the set retrieved from
   get_subscribers_bitmask().

c) handle_clear_slock() copied schema->slock into
   ndb_schema_object->slock_bitmap, thereby overwriting the intersect
   done as part of a). Changed the copy to do an intersect instead.
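
The three fixes above can be sketched as follows. This is only an
illustrative model, not the code touched by the patch: NodeSet,
SchemaDistData, coordinator_ack() and this handle_clear_slock() are
hypothetical stand-ins built on std::bitset, and the sketch assumes that
a set bit means "this server still has to ack".

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the server bitmaps (not the real MY_BITMAP).
constexpr std::size_t kMaxNodes = 16;
using NodeSet = std::bitset<kMaxNodes>;

// b) The subscriber set contains the local node from init() onwards.
struct SchemaDistData {
  NodeSet subscribers;

  void init(std::size_t own_nodeid) {
    assert(own_nodeid < kMaxNodes);
    subscribers.reset();          // init() clears; no extra clear needed
    subscribers.set(own_nodeid);  // 'self' is always a subscriber
  }

  void subscribe(std::size_t nodeid) {
    assert(nodeid < kMaxNodes);
    subscribers.set(nodeid);
  }
};

// a) When the coordinator acks its own schema op, only the servers
//    subscribed at that moment are participants. All other servers are
//    cleared from the set of acks being waited for.
NodeSet coordinator_ack(NodeSet slock, const NodeSet& subscribers,
                        std::size_t own_nodeid) {
  slock &= subscribers;     // drop servers *not* participating
  slock.reset(own_nodeid);  // the coordinator's own ack
  return slock;
}

// c) Later acks are intersected with the local bitmap instead of being
//    copied over it, so the bits cleared in a) stay cleared.
NodeSet handle_clear_slock(const NodeSet& local, const NodeSet& incoming) {
  return local & incoming;
}
```

In this model, with node 1 as coordinator and node 2 subscribed, a node
that subscribes after coordinator_ack() never reappears in the pending
set, even if an incoming slock still has its bit set. A plain copy of
the incoming slock would have brought that bit back.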

This patch also modifies several places where schema distribution
progress is printed:
- Always prints the more significant part of the bitmask before the
  less significant.
- Adds some formatting when printing the bitmasks.
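
A minimal sketch of that print order, assuming the bitmask is held as
32-bit words with words[0] as the least significant. format_bitmask()
is a hypothetical helper, and the brackets and spacing are only one
possible formatting, not necessarily what the patch prints.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: words[0] holds the least significant 32 bits.
// The most significant word is emitted first, each as 8 hex digits.
std::string format_bitmask(const std::vector<std::uint32_t>& words) {
  std::string out = "[";
  for (std::size_t i = words.size(); i-- > 0;) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(words[i]));
    out += buf;
    if (i != 0) out += ' ';  // separator between words
  }
  out += "]";
  return out;
}
```

For example, format_bitmask({0x6, 0x1}) yields "[00000001 00000006]",
which reads like a single 64-bit hex number from left to right.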

Also removes a few clears of bitmasks immediately after an init,
which are redundant as ::init() also clears them.
