    39b16976
    WL#12837: Multiple transporters between nodes in a node group
    Authored by Mikael Ronström
    
    
    Reviewed-by: Mauritz Sundell <mauritz.sundell@oracle.com>
    
    The first step is to use transporter id inside mt.cpp to address a transporter
    rather than node id. This change affects large parts of the transporter
    handling. In the API we still use the node id to address the transporter and
    send buffers. This means that in the Transporter Handle interface we mostly
    send both node id and transporter id. This should
    be fairly efficient since they are both passed as register parameters.
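    
    As a rough illustration of why this is cheap, a send call in such an
    interface could look like the sketch below. The type and function names
    (NodeId, TrpId, prepareSend) are illustrative assumptions and not the
    exact Transporter Handle API.
    
      // Sketch only: illustrative types and signature, not the actual API.
      typedef unsigned int NodeId;  // logical node addressing (used by the API)
      typedef unsigned int TrpId;   // addresses one concrete transporter/socket
      
      struct SendStatus { bool ok; };
      
      // Both ids fit in registers, so passing them together adds essentially
      // no overhead compared to passing only the node id.
      SendStatus prepareSend(NodeId receiver_node,
                             TrpId  transporter_id,
                             const unsigned int *signal_data,
                             unsigned int        length);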
    
    This is a preparatory step towards having multiple transporters to
    communicate with another node. In particular to communicate with other
    data nodes and even more so to communicate with data nodes in the same node
    group. This worklog only introduces multi socket transporters for nodes
    within the same node group. It would be almost trivial to add it for
    communication with any other data node. For API nodes we already have the
    concept of multiple cluster connections, so not as much is needed for
    these nodes. It would however be beneficial to decrease the number of
    API nodes sometime in the future by using multiple sockets here as well.
    
    Step 2 ensures that we can change from a single transporter to a multi
    transporter. This happens in Phase 2 (or Phase 4 for node restarts)
    started by NDBCNTR but the action
    is controlled by QMGR. At first we need to create a multi transporter.
    This transporter will never be destroyed and will be filled with as
    many transporters as we have configured it to use; this is set by the
    configuration variable NodeGroupTransporters.
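    
    A minimal sketch of this container is shown below; MultiTransporterSketch
    and its methods are hypothetical names used only for illustration, not the
    actual multi transporter class.
    
      #include <vector>
      
      class Transporter;  // one socket/connection to the peer node
      
      // Created once per neighbour node and never destroyed; filled with up
      // to NodeGroupTransporters underlying transporters.
      class MultiTransporterSketch {
      public:
        explicit MultiTransporterSketch(unsigned num_trps) { m_trps.reserve(num_trps); }
      
        void add_transporter(Transporter *trp) { m_trps.push_back(trp); }
      
        unsigned num_transporters() const { return (unsigned)m_trps.size(); }
        Transporter *get(unsigned i) const { return m_trps[i]; }
      
      private:
        std::vector<Transporter *> m_trps;
      };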
    
    Before we allow the new transporters to connect we get the number of
    NodeGroupTransporters of the other nodes. We only set up multiple
    transporters for communication inside the same node group for the
    time being.
    
    Next we allow the nodes to connect by inserting the new multi
    transporter into theNodeIdTransporters and the new transporters
    to be used into the allTransporters array. This will trigger
    startServer and start_clients_thread to set up those
    new connections.
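    
    A sketch of this registration step is shown below; the container types and
    surrounding code are illustrative, only the array names follow the text above.
    
      #include <vector>
      
      struct TrpSketch { /* one socket to the peer */ };
      struct MultiTrpSketch { std::vector<TrpSketch *> trps; };
      
      // Illustrative only: once the multi transporter is visible under the
      // node id and its sub-transporters are in allTransporters, the connect
      // machinery (startServer/start_clients_thread) starts connecting them.
      void register_multi_transporter(std::vector<MultiTrpSketch *> &theNodeIdTransporters,
                                      std::vector<TrpSketch *> &allTransporters,
                                      unsigned node_id,
                                      MultiTrpSketch *multi)
      {
        theNodeIdTransporters[node_id] = multi;   // node id now maps to the multi trp
        for (TrpSketch *trp : multi->trps)
          allTransporters.push_back(trp);         // eligible for connection setup
      }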
    
    When the nodes have connected we ask the other node if it is prepared
    to switch to the new set of transporters. When this has been acknowledged
    we start the switchover. The switchover happens asynchronously on
    the two sides, so for a short period we can have one transporter in one
    direction and multiple in the other direction. The important thing here is
    to maintain signal order for communication between two distinct threads in
    different nodes.
    
    To prepare for the switch we ensure that all block threads except the
    main thread have been blocked and that their send buffers have been
    flushed. When this is completed we flush the send buffers of the
    main thread as well, lock the new multi transporter mutex and then
    lock send and receive for the node. Finally we send an
    ACTIVATE_TRP_REQ to the other side. This signal activates the
    recv thread to start reading data from the new set of transporters.
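    
    The sequence on the initiating side can be sketched as below; all helper
    names are hypothetical stand-ins for the real QMGR/mt.cpp functions and the
    sketch only shows the ordering of the steps described above.
    
      #include <cstdio>
      
      typedef unsigned int NodeId;
      
      static void block_threads_except_main()    { std::printf("block threads\n"); }
      static void flush_blocked_thread_buffers() { std::printf("flush blocked\n"); }
      static void flush_main_thread_buffers()    { std::printf("flush main\n"); }
      static void lock_multi_trp_mutex(NodeId)   { std::printf("lock multi mutex\n"); }
      static void lock_send_and_receive(NodeId)  { std::printf("lock send/recv\n"); }
      static void send_activate_trp_req(NodeId)  { std::printf("ACTIVATE_TRP_REQ\n"); }
      
      // Order matters: every other block thread is blocked and flushed before
      // the main thread flushes itself, takes the locks and signals the peer.
      void prepare_switch_to_multi_trp(NodeId node)
      {
        block_threads_except_main();
        flush_blocked_thread_buffers();
        flush_main_thread_buffers();
        lock_multi_trp_mutex(node);
        lock_send_and_receive(node);
        send_activate_trp_req(node);
      }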
    
    The code in QmgrMain.cpp contains a thorough description of the switch
    protocol to multi socket transporter.
    
    After performing this, the node responds with a CONF message. When that
    CONF arrives we disable sending on the old transporter,
    since we now know that all signals have been sent to this node.
    
    When this is done we check whether more nodes are ready to switch over
    (this can happen with 3 and 4 replicas).
    
    When done, and we are in start phase 3, we send a CONF message back to NDBCNTR
    saying that our setup of multi transporters is complete and the start
    can continue.
    
    Added a new hash function for mapping signals to a transporter. The mapping
    takes the receiver instance modulo the number of sockets to get
    the socket instance to use for the communication, so communication
    between two instances always follows the same path.
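    
    As a sketch of the mapping rule (illustrative code, not the actual hash
    function):
    
      typedef unsigned int Uint32;
      
      // The receiver block instance modulo the number of sockets selects the
      // socket, so a given pair of instances always uses the same socket and
      // signal order between them is preserved.
      inline Uint32 select_socket_instance(Uint32 receiver_instance,
                                           Uint32 num_sockets)
      {
        return receiver_instance % num_sockets;  // num_sockets must be >= 1
      }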
    
    Fixed a problem in startChangeNeighbourNode where I failed to initialise
    the m_neighbour_trp variable. This meant that sockets were still treated
    as neighbours in one respect, in that they were not put into the list of
    transporters waiting to send, but since they were no longer listed in the
    list of neighbours they were never scheduled to send. This bug has not
    caused any issues previously since there were never any changes to the
    neighbour list. Now the list does change and thus more is required
    from those methods.
    
    Port number assignment previously occurred in start_clients_thread where we
    queried the dynamic port number from the management server. To ensure that
    we use the same port number also for multi socket transporters we copy the
    port number from the original transporter to the multi socket transporters.
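    
    A minimal sketch of the port propagation, with illustrative types rather
    than the actual transporter classes:
    
      #include <vector>
      
      struct TrpSketch { int server_port; };
      
      // Reuse the dynamically assigned port of the original transporter for
      // every socket in the multi transporter.
      void copy_server_port(const TrpSketch &original,
                            std::vector<TrpSketch> &multi_socket_trps)
      {
        for (TrpSketch &trp : multi_socket_trps)
          trp.server_port = original.server_port;
      }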
    
    A lot of new debug code has been added to make it possible to trace problems
    in this area.
    
    Setup of multi sockets happens in an early phase for Initial start and System
    restarts. For node restarts, both initial and normal, we wait until we have
    received COPY_GCICONF from the master with the new sysfile. This tells us
    the current setup of node groups and is required to know which nodes are
    our neighbour nodes. Thus we set up multi sockets for node restarts right before
    we copy the dictionary data.
    
    There is also a situation when we create a new node group. In this case we
    set up the multi sockets at the time the node group is created. If the
    node group is dropped, we take no special action. This creation of
    multi sockets is expected to happen while the node is already started.
    
    Wrote a thorough description of the connect-disconnect handling, both for
    normal transporters and for multi transporters, in the preamble of the
    method update_connections in TransporterRegistry.cpp. A thorough description
    of the setup protocol for multi transporters is written up in
    the preamble of execSET_UP_MULTI_TRP_REQ in QmgrMain.cpp.
    
    Added new simple test cases in MTR for 3 and 4 replicas, since these required
    some changes in the multi socket setup to avoid ending up in a deadlock between
    nodes trying to set up multi sockets.
    
    Normally it should not be necessary to configure the number of node group
    transporters. The default is 0 and this is interpreted as half the number of
    LDM threads or half the number of TC threads if there are more TC threads in
    the node. This should be scalable enough for most needs. Experiments have
    shown that there is no performance regression from using multiple sockets,
    rather a 1-2% improvement. It is still possible to set a hard-coded value
    between 1 and 32 through configuration.
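    
    The default can be sketched as below; this is an illustration of the rule
    described above, not the actual configuration code, and the clamp to a
    minimum of one socket is an assumption.
    
      typedef unsigned int Uint32;
      
      Uint32 effective_node_group_transporters(Uint32 configured,  // 0 = use default
                                               Uint32 num_ldm_threads,
                                               Uint32 num_tc_threads)
      {
        if (configured != 0)
          return configured;  // explicitly configured value, 1..32
        Uint32 base = (num_tc_threads > num_ldm_threads) ? num_tc_threads
                                                         : num_ldm_threads;
        Uint32 derived = base / 2;
        return derived == 0 ? 1 : derived;  // assumed: at least one socket
      }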