Mikael Ronström authored
Reviewed-by: Mauritz Sundell <mauritz.sundell@oracle.com>

The first step is to use the transporter id inside mt.cpp to address a transporter rather than the node id. This change affects large parts of the transporter handling. In the API we still use the node id to address the transporter and send buffers. This means that in the Transporter Handle interface we mostly pass both the node id and the transporter id. This should be fairly efficient since both are passed as register parameters.

This is a preparatory step for having multiple transporters to communicate with another node, in particular with other data nodes, and even more so with data nodes in the same node group. This worklog only introduces multi socket transporters for nodes within the same node group. It is almost trivial to extend this to communication with any other data node. For API nodes we already have the concept of multiple cluster connections, so less is needed for those nodes. It would however be beneficial to decrease the number of API nodes sometime in the future by using multi sockets there as well.

Step 2 ensures that we can change from a single transporter to a multi transporter. This happens in Phase 2 (or Phase 4 for node restarts), started by NDBCNTR, but the action is controlled by QMGR. First we create a multi transporter. This transporter is never destroyed and is filled with as many transporters as the configuration variable NodeGroupTransporters specifies. Before we allow the new transporters to connect we fetch the NodeGroupTransporters value of the other nodes. For the time being we only set up multiple transporters for communication inside the same node group.

Next we allow the nodes to connect by inserting the new multi transporter into theNodeIdTransporters and the new transporters into the allTransporters array. This triggers startServer and start_clients_thread to set up those new connections. When the nodes have connected we ask the other node whether it is prepared to switch to the new set of transporters. When this has been acknowledged we start the switchover (a structural sketch of the transporter objects and a sketch of the switchover sequence follow below).

The switchover happens asynchronously on the two sides, so for a short period we can have one transporter in one direction and multiple in the other direction. The important thing is to maintain signal order for communication between two distinct threads in different nodes. To prepare for the switch we ensure that all block threads except the main thread have been blocked and that their send buffers have been flushed. When this is completed we flush the send buffers of the main thread as well, lock the new multi transporter mutex, and then lock the send and receive of the node. Finally we send an ACTIVATE_TRP_REQ to the other side. This signal activates the recv thread to start reading data from the new set of transporters. The code in QmgrMain.cpp contains a thorough description of the switch protocol to multi socket transporters. When the other node has performed the switch it responds with a CONF message. After receiving the CONF we disable sending on the old transporter, since we then know that all signals have been sent over it. We then check whether more nodes are ready to switch over (this can happen with 3 and 4 replicas). When done, and in start phase 3, we send a CONF message back to NDBCNTR to report that our setup of multi transporters is complete and the start can continue.
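As a rough structural sketch of the above (illustrative C++ types and member names only, not the actual NDB classes): a node is still addressed by node id at the API level, each socket is addressed by transporter id inside mt.cpp, and a multi transporter simply groups the sockets used towards one node.

    #include <cstdint>
    #include <vector>

    using NodeId = std::uint32_t;  // addresses a node (API level)
    using TrpId  = std::uint32_t;  // addresses one socket (mt.cpp level)

    // One socket towards a peer node.
    struct Transporter {
      TrpId m_trp_id;
    };

    // Groups the sockets used towards one node. In the real code the
    // multi transporter is created once, never destroyed, and holds as
    // many transporters as NodeGroupTransporters specifies.
    struct MultiTransporter {
      NodeId m_node_id;
      std::vector<Transporter *> m_sub_transporters;
    };

Since NodeId and TrpId are both small integers, interfaces that carry both can pass them in registers, which is why the extra parameter is cheap.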
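The switchover sequence on one side can be summarised with the following sketch. The helper names are hypothetical stand-ins for the real signals and locking calls; this is a linearised illustration, not the actual implementation.

    #include <mutex>

    // Hypothetical stand-ins for the real mechanisms.
    static void block_and_flush_other_block_threads() {}
    static void flush_main_thread_send_buffer() {}
    static void send_ACTIVATE_TRP_REQ() {}
    static void wait_for_ACTIVATE_TRP_CONF() {}

    struct NodeSwitchState {
      std::mutex multi_trp_mutex;  // the new multi transporter mutex
      std::mutex send_mutex;       // send side of the node
      std::mutex recv_mutex;       // receive side of the node
      bool old_trp_send_enabled = true;
    };

    void switch_to_multi_transporter(NodeSwitchState &st)
    {
      // All block threads except main are blocked and flushed first.
      block_and_flush_other_block_threads();
      // Then the main thread flushes its own send buffers.
      flush_main_thread_send_buffer();
      {
        // Lock the multi transporter mutex, then send and receive.
        std::lock_guard<std::mutex> g1(st.multi_trp_mutex);
        std::lock_guard<std::mutex> g2(st.send_mutex);
        std::lock_guard<std::mutex> g3(st.recv_mutex);
        // Tell the peer's recv thread to read the new transporters.
        send_ACTIVATE_TRP_REQ();
      }
      // The peer's CONF means all signals on the old transporter have
      // been sent, so sending on it can now be disabled.
      wait_for_ACTIVATE_TRP_CONF();
      st.old_trp_send_enabled = false;
    }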
Added a new hash function for mapping to a transporter. The mapping takes the receiver instance modulo the number of sockets to get the socket instance to use for the communication, so communication between two given instances always follows the same path (a sketch of this mapping follows at the end of this note).

Fixed a problem in startChangeNeighbourNode where I failed to initialise the m_neighbour_trp variable. This led to sockets being treated as neighbours in one respect, in that they were not put into the other list of transporters waiting to send, while at the same time not being listed in the list of neighbours, so they were never scheduled to send. This bug caused no issues previously, since the neighbour list never changed. Now the list does change, and thus more is required from those methods.

Port number assignment previously occurred in start_clients_thread, where we inquired about the dynamic port number from the management server. To ensure that the multi socket transporters use the same port number, we copy the port number from the original transporter to the multi socket transporters. A lot of new debug code has been added to make it possible to trace problems in this area.

Setup of multi sockets happens in an early phase for initial starts and system restarts. For node restarts, initial and normal, we wait until we have received COPY_GCICONF from the master with the new sysfile. This tells us the current setup of node groups, which is required to know which nodes are our neighbour nodes. Thus, for node restarts we set up multi sockets right before we copy the dictionary data. There is also the situation where a new node group is created. In this case we set up the multi sockets at the time the node group is created; if the node group is dropped, we take no special action. This creation of multi sockets is expected to happen while the node has already started.

Wrote a thorough description of the connect-disconnect handling, both for normal transporters and for multi transporters, in the introductory comment of the method update_connections in TransporterRegistry.cpp. A thorough description of the setup protocol for multi transporters is written in the introductory comment of execSET_UP_MULTI_TRP_REQ in QmgrMain.cpp.

Added new simple MTR test cases for 3 and 4 replicas, since these required some changes in the multi socket setup to avoid a deadlock between nodes trying to set up multi sockets.

Normally it should not be necessary to configure the number of node group transporters. The default is 0, which is interpreted as half the number of LDM threads, or half the number of TC threads if there are more TC threads in the node (a sketch of this derivation follows below). This should be scalable enough for most needs. Experiments have shown that there is no performance regression from using multiple sockets, rather a 1-2% improvement. It is still possible to hard-code a value between 1 and 32 through the configuration.
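A minimal sketch of the receiver-instance-to-socket mapping described at the start of this note (illustrative code, not the actual NDB implementation):

    #include <cassert>
    #include <cstdint>

    // The receiver block instance taken modulo the number of sockets
    // picks the socket, so a given pair of communicating instances
    // always uses the same socket and signal order is preserved.
    static std::uint32_t
    get_socket_instance(std::uint32_t receiver_instance,
                        std::uint32_t num_sockets)
    {
      assert(num_sockets > 0);
      return receiver_instance % num_sockets;
    }

For example, with 4 sockets, receiver instances 1, 5 and 9 all map to socket instance 1.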
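A sketch of the stated default for the number of node group transporters (illustrative only; the floor of 1 is an assumption, not taken from the description above):

    #include <algorithm>
    #include <cstdint>

    static std::uint32_t
    derive_node_group_transporters(std::uint32_t configured,  // 0 = default
                                   std::uint32_t num_ldm_threads,
                                   std::uint32_t num_tc_threads)
    {
      if (configured != 0)
        return configured;  // explicitly set; valid range is 1..32
      // Half the LDM threads, or half the TC threads if there are
      // more TC threads in the node.
      const std::uint32_t threads =
          std::max(num_ldm_threads, num_tc_threads);
      return std::max<std::uint32_t>(1, threads / 2);  // assumed floor of 1
    }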