    39b16976
    WL#12837: Multiple transporters between nodes in a node group
    Authored by Mikael Ronström
    
    
    Reviewed-by: Mauritz Sundell <mauritz.sundell@oracle.com>
    
    The first step is to use transporter id inside mt.cpp to address a transporter
    rather than node id. This change affects large parts of the transporter
    handling. In the API we still use the node id to address the transporter and
    send buffers. This means that in the Transporter Handle interface we mostly
    send both node id and transporter id. This should
    be fairly efficient since they are both passed as register parameters.
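    
    As a rough illustration of why this is cheap, a send call in such an
    interface could look like the sketch below. The type and function names
    (NodeId, TrpId, prepareSend) are illustrative assumptions and not the
    exact Transporter Handle API.
    
      // Sketch only: illustrative types and signature, not the actual API.
      typedef unsigned int NodeId;  // logical node addressing (used by the API)
      typedef unsigned int TrpId;   // addresses one concrete transporter/socket
      
      struct SendStatus { bool ok; };
      
      // Both ids fit in registers, so passing them together adds essentially
      // no overhead compared to passing only the node id.
      SendStatus prepareSend(NodeId receiver_node,
                             TrpId  transporter_id,
                             const unsigned int *signal_data,
                             unsigned int        length);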
    
    This is a preparatory step towards having multiple transporters to
    communicate with another node. In particular to communicate with other
    data nodes and even more so to communicate with data nodes in the same node
    group. This worklog only introduces multi socket transporters for nodes
    within the same node group. It would be almost trivial to add it for
    communication with any other data node. For API nodes we already have the
    concept of multiple cluster connections, so not as much is needed for
    these nodes. It would however be beneficial to decrease the number of
    API nodes sometime in the future by using multiple sockets here as well.
    
    Step 2 ensures that we can change from a single transporter to a multi
    transporter. This happens in Phase 2 (or Phase 4 for node restarts)
    started by NDBCNTR but the action
    is controlled by QMGR. At first we need to create a multi transporter.
    This transporter will never be destroyed and will be filled with as
    many transporters as we have configured it to use; this is set by the
    configuration variable NodeGroupTransporters.
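    
    A minimal sketch of this container is shown below; MultiTransporterSketch
    and its methods are hypothetical names used only for illustration, not the
    actual multi transporter class.
    
      #include <vector>
      
      class Transporter;  // one socket/connection to the peer node
      
      // Created once per neighbour node and never destroyed; filled with up
      // to NodeGroupTransporters underlying transporters.
      class MultiTransporterSketch {
      public:
        explicit MultiTransporterSketch(unsigned num_trps) { m_trps.reserve(num_trps); }
      
        void add_transporter(Transporter *trp) { m_trps.push_back(trp); }
      
        unsigned num_transporters() const { return (unsigned)m_trps.size(); }
        Transporter *get(unsigned i) const { return m_trps[i]; }
      
      private:
        std::vector<Transporter *> m_trps;
      };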
    
    Before we allow the new transporters to connect we get the number of
    NodeGroupTransporters of the other nodes. We only set up multiple
    transporters for communication inside the same node group for the
    time being.
    
    Next we allow the nodes to connect by inserting the new multi
    transporter into theNodeIdTransporters and the new transporters
    to be used into the allTransporters array. This will trigger
    startServer and start_clients_thread to set up those
    new connections.
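    
    A sketch of this registration step is shown below; the container types and
    surrounding code are illustrative, only the array names follow the text above.
    
      #include <vector>
      
      struct TrpSketch { /* one socket to the peer */ };
      struct MultiTrpSketch { std::vector<TrpSketch *> trps; };
      
      // Illustrative only: once the multi transporter is visible under the
      // node id and its sub-transporters are in allTransporters, the connect
      // machinery (startServer/start_clients_thread) starts connecting them.
      void register_multi_transporter(std::vector<MultiTrpSketch *> &theNodeIdTransporters,
                                      std::vector<TrpSketch *> &allTransporters,
                                      unsigned node_id,
                                      MultiTrpSketch *multi)
      {
        theNodeIdTransporters[node_id] = multi;   // node id now maps to the multi trp
        for (TrpSketch *trp : multi->trps)
          allTransporters.push_back(trp);         // eligible for connection setup
      }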
    
    When the nodes have connected we ask the other node if it is prepared
    to switch to the new set of transporters. When this has been acknowledged
    we start the switchover. The switchover happens asynchronously on
    the two sides, so for a short period we can have one transporter in one
    direction and multiple in the other direction. The important thing here is
    to maintain signal order for communication between two distinct threads in
    different nodes.
    
    To prepare for the switch we ensure that all block threads except the
    main thread have been blocked and that their send buffers have been
    flushed. When this is completed we flush the send buffers of the
    main thread as well, lock the new multi transporter mutex and then
    lock send and receive for the node. Finally we send an
    ACTIVATE_TRP_REQ to the other side. This signal activates the
    recv thread to start reading data from the new set of transporters.
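    
    The sequence on the initiating side can be sketched as below; all helper
    names are hypothetical stand-ins for the real QMGR/mt.cpp functions and the
    sketch only shows the ordering of the steps described above.
    
      #include <cstdio>
      
      typedef unsigned int NodeId;
      
      static void block_threads_except_main()    { std::printf("block threads\n"); }
      static void flush_blocked_thread_buffers() { std::printf("flush blocked\n"); }
      static void flush_main_thread_buffers()    { std::printf("flush main\n"); }
      static void lock_multi_trp_mutex(NodeId)   { std::printf("lock multi mutex\n"); }
      static void lock_send_and_receive(NodeId)  { std::printf("lock send/recv\n"); }
      static void send_activate_trp_req(NodeId)  { std::printf("ACTIVATE_TRP_REQ\n"); }
      
      // Order matters: every other block thread is blocked and flushed before
      // the main thread flushes itself, takes the locks and signals the peer.
      void prepare_switch_to_multi_trp(NodeId node)
      {
        block_threads_except_main();
        flush_blocked_thread_buffers();
        flush_main_thread_buffers();
        lock_multi_trp_mutex(node);
        lock_send_and_receive(node);
        send_activate_trp_req(node);
      }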
    
    The code in QmgrMain.cpp contains a thorough description of the switch
    protocol to multi socket transporter.
    
    After performing this, the node responds with a CONF message. When that
    CONF arrives we disable sending on the old transporter,
    since we now know that all signals have been sent to this node.
    
    When this is done we check whether more nodes are ready to switch over
    (this can happen with 3 and 4 replicas).
    
    When done, and we are in start phase 3, we send a CONF message back to NDBCNTR
    saying that our setup of multi transporters is complete and the start
    can continue.
    
    Added a new hash function for mapping signals to a transporter. The mapping
    takes the receiver instance modulo the number of sockets to get
    the socket instance to use for the communication, so communication
    between two instances always follows the same path.
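    
    As a sketch of the mapping rule (illustrative code, not the actual hash
    function):
    
      typedef unsigned int Uint32;
      
      // The receiver block instance modulo the number of sockets selects the
      // socket, so a given pair of instances always uses the same socket and
      // signal order between them is preserved.
      inline Uint32 select_socket_instance(Uint32 receiver_instance,
                                           Uint32 num_sockets)
      {
        return receiver_instance % num_sockets;  // num_sockets must be >= 1
      }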
    
    Fixed a problem in startChangeNeighbourNode where I failed to initialise
    the m_neighbour_trp variable. This meant that sockets were still treated
    as neighbours in one respect, in that they were not put into the list of
    transporters waiting to send, but since they were no longer listed in the
    list of neighbours they were never scheduled to send. This bug has not
    caused any issues previously since there were never any changes to the
    neighbour list. Now the list does change and thus more is required
    from those methods.
    
    Port number assignment previously occurred in start_clients_thread where we
    queried the dynamic port number from the management server. To ensure that
    we use the same port number also for multi socket transporters we copy the
    port number from the original transporter to the multi socket transporters.
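    
    A minimal sketch of the port propagation, with illustrative types rather
    than the actual transporter classes:
    
      #include <vector>
      
      struct TrpSketch { int server_port; };
      
      // Reuse the dynamically assigned port of the original transporter for
      // every socket in the multi transporter.
      void copy_server_port(const TrpSketch &original,
                            std::vector<TrpSketch> &multi_socket_trps)
      {
        for (TrpSketch &trp : multi_socket_trps)
          trp.server_port = original.server_port;
      }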
    
    A lot of new debug code has been added to make it possible to trace problems
    in this area.
    
    Setup of multi sockets happens in an early phase for Initial start and System
    restarts. For node restarts, both initial and normal, we wait until we have
    received COPY_GCICONF from the master with the new sysfile. This tells us
    the current setup of node groups and is required to know which nodes are
    our neighbour nodes. Thus we set up multi sockets for node restarts right before
    we copy the dictionary data.
    
    There is also a situation when we create a new node group. In this case we
    set up the multi sockets at the time the node group is created. If the
    node group is dropped, we take no special action. This creation of
    multi sockets is expected to happen while the node is already started.
    
    Wrote a thorough description of the connect-disconnect handling, both for
    normal transporters and for multi transporters, in the preamble of the
    method update_connections in TransporterRegistry.cpp. A thorough description
    of the setup protocol for multi transporters is written up in
    the preamble of execSET_UP_MULTI_TRP_REQ in QmgrMain.cpp.
    
    Added new simple test cases in MTR for 3 and 4 replicas, since these required
    some changes in the multi socket setup to avoid ending up in a deadlock between
    nodes trying to set up multi sockets.
    
    Normally it should not be necessary to configure the number of node group
    transporters. The default is 0 and this is interpreted as half the number of
    LDM threads or half the number of TC threads if there are more TC threads in
    the node. This should be scalable enough for most needs. Experiments have
    shown that there is no performance regression from using multiple sockets,
    rather a 1-2% improvement. It is still possible to set a hard-coded value
    between 1 and 32 through configuration.
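    
    The default can be sketched as below; this is an illustration of the rule
    described above, not the actual configuration code, and the clamp to a
    minimum of one socket is an assumption.
    
      typedef unsigned int Uint32;
      
      Uint32 effective_node_group_transporters(Uint32 configured,  // 0 = use default
                                               Uint32 num_ldm_threads,
                                               Uint32 num_tc_threads)
      {
        if (configured != 0)
          return configured;  // explicitly configured value, 1..32
        Uint32 base = (num_tc_threads > num_ldm_threads) ? num_tc_threads
                                                         : num_ldm_threads;
        Uint32 derived = base / 2;
        return derived == 0 ? 1 : derived;  // assumed: at least one socket
      }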