sql/ndb_schema_object.cc · mysql-cluster-7.4.24 · Rasoul Jahanshahi / Mysql Server

Feb 27, 2017

Bug #25557263 'WAITING MAX 119 SEC FOR DISTRIBUTING'-SCHEMA DISTRIBUTION TIMEOUT · 9102fc51

Ole John Aske authored Feb 27, 2017

There is a possible race condition between the schema distribution
coordinator holding a ref-lock on its schema_object, and the
binlog-injector thread (participant-role) possibly being late
to unref-lock the same schema_object.

This could potentially result in another schema distribution
operation getting a ref to the not-yet-released schema_object held
by the injector thread instead of having to create a new
schema_object as normally expected.

The SLOCK-bitmap, which keeps track of which participants
the coordinator is still waiting for, was set to 'all-1'
when created. However, it will be 'all-0' immediately before
the injector thread is about to release (unref) it.

Thus, if the coordinator managed to 're-get' this schema_object
before it was released (and destructed) by the injector thread,
we got a schema_object->slocks with 'all-0' instead of 'all-1'
as expected. This caused a total breakdown of the schema
distribution protocol.
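
The interleaving described above can be replayed in a small
single-threaded model. This is only a sketch, not the server code:
`SchemaObject`, `Registry`, the 8-bit bitmap width and the key
"test/t1" are all hypothetical stand-ins chosen for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a schema_object (names are illustrative only).
struct SchemaObject {
  std::bitset<8> slocks;  // one bit per participant still to acknowledge
  int use_count = 0;      // number of outstanding refs
  SchemaObject() { slocks.set(); }  // pre-fix behaviour: 'all-1' only at creation
};

// Hypothetical registry with get-or-create / release semantics.
class Registry {
  std::map<std::string, SchemaObject> objects_;

 public:
  // Returns the not-yet-released object for 'key', or creates a new one.
  SchemaObject* get(const std::string& key) {
    SchemaObject& obj = objects_[key];  // default-constructs on first use
    ++obj.use_count;
    return &obj;
  }

  // Drops one ref; the object is destructed when the last ref goes away.
  void release(const std::string& key) {
    auto it = objects_.find(key);
    assert(it != objects_.end() && it->second.use_count > 0);
    if (--it->second.use_count == 0) objects_.erase(it);
  }
};

// Replays the problematic ordering. Returns true if the second operation
// was handed the old object with an 'all-0' bitmap.
inline bool second_operation_sees_stale_bitmap() {
  Registry reg;
  const std::string key = "test/t1";

  SchemaObject* coord = reg.get(key);     // coordinator, operation 1
  SchemaObject* injector = reg.get(key);  // injector thread, same object
  injector->slocks.reset();               // all participants acked: 'all-0'
  reg.release(key);                       // coordinator finishes operation 1
  // The injector thread is late: its ref is still held here.

  SchemaObject* reused = reg.get(key);    // operation 2 're-gets' the object
  const bool stale = (reused == coord) && reused->slocks.none();

  reg.release(key);  // injector finally unrefs
  reg.release(key);  // operation 2 unrefs; object is destructed
  return stale;
}
```

In this model `second_operation_sees_stale_bitmap()` returns true,
whereas a `get()` issued after the last `release()` yields a fresh
object whose bitmap is 'all-1'.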

The fix is to move the schema_object->slock 'all-1' setting
from the creation of new schema_object() to the place where
the schema distribution coordinator initiates waiting for schema
operations to be distributed. This will cover both scenarios where
we either had to create a new schema_object, or we reuse an
existing not-yet-released schema object.
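
A minimal sketch of the idea behind the fix, under the assumption
that the bitmap can be modelled as a fixed 8-bit set. The names
`begin_wait_for_participants`, `participant_ack` and
`distribution_complete` are hypothetical and do not correspond to
functions in the server source.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a schema_object. Creation no longer
// initialises the bitmap to 'all-1'; it starts out as 'all-0'.
struct SchemaObject {
  std::bitset<8> slocks;  // one bit per participant still to acknowledge
};

// Coordinator side: called when it starts waiting for distribution.
// Setting 'all-1' here gives the same starting state whether the
// object was just created or is a reused, not-yet-released one.
inline void begin_wait_for_participants(SchemaObject& obj) {
  obj.slocks.set();
}

// Participant side: clear the bit for a node that has acknowledged.
inline void participant_ack(SchemaObject& obj, std::size_t node) {
  assert(node < obj.slocks.size());
  obj.slocks.reset(node);
}

// The coordinator is done waiting once every bit has been cleared.
inline bool distribution_complete(const SchemaObject& obj) {
  return obj.slocks.none();
}
```

With the reset placed at the start of the wait, a reused object whose
bitmap was left 'all-0' by the previous operation is no longer
mistaken for an operation that every participant has already
acknowledged.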
