    Bug#22842538 BINLOG SCHEMA DISTRIBUTION TIMEOUT AND FAILS WHEN ANOTHER MYSQL NODE START · dc0fce1d
    Ole John Aske authored
    When another mysqld node is started and joins (subscribes to)
    the schema distribution protocol, another mysqld which is
    waiting for a schema change to be distributed will time out
    during that wait. This happens because we incorrectly assumed
    that the newly arrived mysqld node would also 'ack' the
    schema distribution. However, it arrived too late to be
    a participant in it.
    
    This patch fixes three issues, all contributing to this failure:
    
    a) There is a potential race between an 'inflight'
       subscribe event and the start of a schema distribution.
       The subscribing node might or might not take part in the
       schema distribution, and its role is actually unknown at
       the point in time where the schema operation is started by
       the coordinator.
       The set of participating servers can only be determined
       when the coordinator acks its own schema op: if the subscribe
       event arrived before its own schema op, then the subscribing
       node is a participant.
       This patch modifies the coordinator's ack to also modify
       the acked slock_bitmap, clearing the servers *not* participating.
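
       The intersect described in a) can be sketched as follows. This is
       an illustration only: it uses std::bitset rather than the real
       bitmap type, and NodeBitmap, kMaxNodes and coordinator_ack() are
       hypothetical names, not identifiers from the patch.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical size and type, chosen for illustration only.
constexpr std::size_t kMaxNodes = 256;
using NodeBitmap = std::bitset<kMaxNodes>;

// Sketch of the coordinator acking its own schema op.
// 'slock' holds the servers we are still waiting for; 'subscribers' is the
// set of servers whose subscribe event had arrived when the ack is handled.
NodeBitmap coordinator_ack(NodeBitmap slock, const NodeBitmap& subscribers,
                           std::size_t own_nodeid) {
  assert(own_nodeid < kMaxNodes);
  slock &= subscribers;     // clear servers that are *not* participating
  slock.reset(own_nodeid);  // the coordinator itself has now acked
  return slock;
}
```

       With slock = {1,2,3}, subscribers = {1,2} and own_nodeid = 1, only
       node 2 remains to be waited for; the late node 3 is dropped.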
    
    b) check_wakeup_clients() called get_subscriber_bitmask() to get the
       current set of subscribers. However, 'self' was not included in the
       subscribers, which it always should be. Fixed this by letting
       Ndb_schema_dist_data::init() add 'own_nodeid' to subscribers.
       Furthermore, this enables us to clean up a couple of places
       where we used to add own_nodeid to the set retrieved from
       get_subscriber_bitmask().
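
       A minimal sketch of the idea in b), assuming a simplified
       stand-in class. SubscriberSet and its members are hypothetical
       and only mirror the behaviour described above; they are not the
       real Ndb_schema_dist_data interface.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the subscriber bookkeeping.
class SubscriberSet {
 public:
  static constexpr std::size_t kMaxNodes = 256;

  // init() clears the set and then always registers our own node, so
  // callers no longer have to add own_nodeid themselves.
  void init(std::size_t own_nodeid) {
    assert(own_nodeid < kMaxNodes);
    bits_.reset();
    bits_.set(own_nodeid);
  }

  void subscribe(std::size_t nodeid) { bits_.set(nodeid); }
  void unsubscribe(std::size_t nodeid) { bits_.reset(nodeid); }

  // Current subscribers, including 'self'.
  const std::bitset<kMaxNodes>& bitmask() const { return bits_; }

 private:
  std::bitset<kMaxNodes> bits_;
};
```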
    
    c) handle_clear_slock() copied schema->slock into
       ndb_schema_object->slock_bitmap, thereby overwriting the intersect
       done as part of a). Changed the copy to do an intersect instead.
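
       The difference between the old copy and the new intersect in c)
       can be illustrated like this. The function names and the
       std::bitset representation are assumptions made for the example,
       and the node numbers are made up.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kMaxNodes = 256;  // hypothetical size
using NodeBitmap = std::bitset<kMaxNodes>;

// Old behaviour: the incoming slock replaces the local bitmap, so any
// bits cleared locally (non-participants) can reappear.
NodeBitmap apply_slock_copy(const NodeBitmap& /*local*/,
                            const NodeBitmap& incoming) {
  return incoming;
}

// New behaviour: only bits set in both bitmaps survive, so a server that
// was already cleared locally stays cleared.
NodeBitmap apply_slock_intersect(NodeBitmap local,
                                 const NodeBitmap& incoming) {
  local &= incoming;
  return local;
}
```

       If the local bitmap is {2} (late node 3 already cleared) and an
       incoming slock is {3}, the copy leaves us waiting for node 3
       again, while the intersect yields an empty set and the waiting
       client can be woken up.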
    
    This patch also modifies several places where schema distribution
    progress is printed:
    - Always print the more significant part of the bitmask before the
      less significant part.
    - Add some formatting when printing the bitmasks.
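
    As an illustration of the print order, assuming the bitmask is held
    as 32-bit words with the least significant word first (the exact
    output format used by the patch is not given here; format_bitmask()
    and the hex layout are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// words[0] is assumed to be the least significant 32-bit word.
// Emits the most significant word first, as zero-padded hex, with a
// space between words.
std::string format_bitmask(const std::vector<std::uint32_t>& words) {
  std::string out;
  char buf[16];
  for (std::size_t i = words.size(); i-- > 0;) {
    std::snprintf(buf, sizeof(buf), "%08x", static_cast<unsigned>(words[i]));
    out += buf;
    if (i != 0) out += ' ';
  }
  return out;
}
```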
    
    Also removes a few clears of bitmasks immediately after an init,
    which are redundant as ::init() already clears them.