    Bug #25557263 'WAITING MAX 119 SEC FOR DISTRIBUTING'-SCHEMA DISTRIBUTION TIMEOUT · 9102fc51
    Ole John Aske authored
    There is a possible race condition between the schema distribution
    coordinator holding a ref-lock on its schema_object, and the
    binlog-injector thread (participant-role) possibly being late
    to unref-lock the same schema_object.
    
    This could potentially result in another schema distr operation
    getting a ref to the not-yet-released schema_object held
    by the injector thread instead of having to create a new
    schema_object as normally expected.
    
    The SLOCK-bitmap, which keeps track of which participants
    the coordinator is still waiting for, was set to 'all-1'
    when created. However, it will be 'all-0' immediately before
    the injector thread is about to release (unref) it.
    
    Thus, if the coordinator managed to 're-get' this schema_object
    before it was released (and destructed) by the injector thread,
    we got a schema_object->slocks with 'all-0' instead of 'all-1'
    as expected. This caused a total breakdown of the schema
    distribution protocol.
    
    The fix is to move the schema_object->slock 'all-1' setting
    from the creation of new schema_object() to the place where
    the schema distr coordinator initiates waiting for schema operations
    to be distributed. This will cover both scenarios where we
    either had to create a new schema_object, or we reuse an
    existing not-yet-released schema object.
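
The following is a minimal, single-threaded sketch of the idea behind the fix, not the actual NDB code. All names (`SchemaObject`, `SchemaObjectRegistry`, `coordinator_begin_wait`, `participant_ack`, `kMaxNodes`) are hypothetical stand-ins, and the real implementation would additionally need locking around the registry and the bitmap. It only illustrates why setting the bitmap to 'all-1' at the start of the wait, rather than at object creation, also covers the case where a not-yet-released object is reused.

```cpp
#include <bitset>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical upper bound on participants, for illustration only.
constexpr std::size_t kMaxNodes = 8;

// Hypothetical stand-in for the schema_object in the commit message.
struct SchemaObject {
  std::bitset<kMaxNodes> slock;  // bit set = still waiting for that node
  int use_count = 0;             // number of outstanding refs
};

// Hands out ref-counted objects keyed by name. An object that is still
// referenced (e.g. by a late injector thread) is reused, not recreated.
class SchemaObjectRegistry {
 public:
  SchemaObject* get(const std::string& key) {
    auto& slot = objects_[key];
    // Creation deliberately does NOT set slock to all-1 any more.
    if (!slot) slot = std::make_unique<SchemaObject>();
    ++slot->use_count;
    return slot.get();
  }

  void release(const std::string& key) {
    auto it = objects_.find(key);
    if (it == objects_.end()) return;
    if (--it->second->use_count == 0) objects_.erase(it);  // destruct
  }

 private:
  std::map<std::string, std::unique_ptr<SchemaObject>> objects_;
};

// The fix: the coordinator sets 'all-1' when it starts waiting, so the
// bitmap is correct for both a new and a reused object.
void coordinator_begin_wait(SchemaObject& obj) { obj.slock.set(); }

// A participant acknowledging the operation clears its bit.
void participant_ack(SchemaObject& obj, std::size_t node) {
  obj.slock.reset(node);
}

// Replays the scenario from the commit message in a single thread.
bool reused_object_starts_all_ones() {
  SchemaObjectRegistry reg;
  const std::string key = "db/t1";

  SchemaObject* coord = reg.get(key);     // coordinator's ref
  SchemaObject* injector = reg.get(key);  // injector (participant) ref

  coordinator_begin_wait(*coord);
  for (std::size_t n = 0; n < kMaxNodes; ++n) participant_ack(*coord, n);
  reg.release(key);  // coordinator is done; the injector is late to unref

  // The next operation 're-gets' the same object, whose bitmap is all-0.
  SchemaObject* again = reg.get(key);
  const bool reused = (again == injector);
  const bool stale_all_zero = again->slock.none();

  coordinator_begin_wait(*again);  // with the fix, this restores all-1
  const bool ok = reused && stale_all_zero && again->slock.all();

  reg.release(key);  // second coordinator ref
  reg.release(key);  // late injector ref
  return ok;
}
```

In this sketch, had the 'all-1' initialisation stayed in the constructor path only, the second operation would have started waiting on an 'all-0' bitmap, which corresponds to the failure described above.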