Sanjana DS authored
Bug#29211078 DANGLING REFERENCE TO BUFFER OUT OF SCOPE IN QMGR::FAILREPORTLAB
Bug#29443052 : WL#12680: CLUSTER CRASHED DURING NODE RESTART WITH AN ERROR
Bug#29458648 : WL#12680: NODES CRASHED DURING SHUTDOWN OF CLUSTER WITH DIFFERENT ERRORS ..
Bug#29814551 WL#12680: "INITIAL START, WAITING 15 FOR " LIST UNUSED NODE ID

This patch changes the way a NodeBitMask object is sent in signals. The node bitmask should now be sent in a signal section. In addition, only the words up to the last non-zero word, counting from the left, need to be sent; this is made possible by adding a function, getPackedLengthInWords(), to struct BitmaskPOD. This avoids sending unnecessary data and makes the signals shorter.

Also add a function in ndb_version.h.in to check whether the cluster version supports sending and receiving node bitmasks in a signal section.

Reviewed-by: Mauritz Sundell <mauritz.sundell@oracle.com>
Reviewed-by: Mikael Ronström <mikael.ronstrom@oracle.com>
Reviewed-by: Sanjana DS <sanjana.ds@oracle.com>

Squashed comments
=================

WL #12564 : Adapt CM_REGREQ to deal with the node bitmask changes.
Send a dummy node bitmask with all zeroes in GSN_CM_REGREQ since it's not used at the receiving end. This is for backward compatibility.

WL #12564 : Modify CM_REGCONF protocol to deal with the node bitmask changes
Send node bitmask info in a long section of the signal, since it won't fit in a normal signal if we increase the node bitmask size. Also add relevant upgrade/downgrade code to cope with these changes.

WL #12564 : Modify CM_REGREF to allow more nodes
Adapt CM_REGREF to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL #12564 : Modify FAIL_REP to allow more nodes
Adapt FAIL_REP to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL #12564 : Modify MASTER_GCPCONF to allow more nodes
Adapt MASTER_GCPCONF to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL #12564 : Modify PREP_FAILREQ and PREP_FAILREF to allow more nodes
Adapt PREP_FAILREQ and PREP_FAILREF to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL #12564 : Modify START_LCP_REQ to allow more nodes
Adapt START_LCP_REQ to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL #12564 : Modify CNTR_START_CONF to allow more nodes
Adapt CNTR_START_CONF to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL#12564: Remove HOT_SPAREREP signal entirely

WL#12564: Remove CM_INIT completely

WL#12564: Remove CNTR_MASTER* signals entirely

WL#12564: BACKUP_CONF, BACKUP_COMPLETE_REP
Removed Ndb node bitmasks from both BACKUP_CONF and BACKUP_COMPLETE_REP signals. They were sent but not used at reception of the signal.

WL#12564: CHECKNODEGROUPSREQ/CONF
CHECKNODEGROUPSREQ and CHECKNODEGROUPSCONF have an Ndb node bitmask in their signal. The signal is only sent locally in a node, so no upgrade logic is required.
In most cases the signal is sent as EXECUTE_DIRECT; in those cases no special treatment is required. SUMA sends the signal and receives it. SUMA only requires the node bitmask to be received, but to ensure that new uses of this signal will be possible without changing DIH we add support to receive the node bitmask in a section. In this case we send it as cleared from SUMA.

WL#12564: CLOSE_COMREQ/CONF
CLOSE_COMREQ and CLOSE_COMCONF use a bitmask to communicate to TRPMAN from QMGR. Actually a bitmask is only needed in CLOSE_COMREQ, since the bitmask returned in CLOSE_COMCONF is always the same bitmask as sent in CLOSE_COMREQ. For some reason we even use this to assign the same variable as we got the bitmask from. So the signal was changed to carry a single node id for cases where no bitmask was needed. In the case where a bitmask was needed it was only a bitmask for data nodes, and this is transported as usual in a section. Some extra code was required in TrpmanProxy to ensure that sections could be carried in the CLOSE_COMREQ signal.

WL#12564: ENABLE_COMREQ/CONF
ENABLE_COMREQ sends a bitmask with nodes to enable communication with. In most cases only a single node is enabled at a time; only in one case is a bitmask needed. This is solved in the same fashion as for CLOSE_COMREQ/CONF. ENABLE_COMCONF needs only the single node sent back, so there is no need to send the bitmask back in ENABLE_COMCONF. ENABLE_COMREQ/CONF is only sent locally in a node, so there is no need to handle upgrade situations.

WL#12564: DIH_RESTARTREF/CONF
A bitmask is sent in response to DIH_RESTARTREQ. DIH_RESTART* are local signals, so there is no need to handle upgrade situations. DIH_RESTARTREQ can send a bitmask and an array of GCIs, but this is only done in EXECUTE_DIRECT. NDBCNTR receives DIH_RESTARTREF/CONF as well, but doesn't bother with bitmasks, so it is enough to release the section.

WL#12564: READ_NODESCONF
READ_NODESCONF contains 5 data node bitmasks. These are put into a section in unpacked format.
All these signals are only sent in startup and are local to the node, so there is no reason for upgrade code.

WL #12564 : Modify STOP_REQ to allow more nodes
Adapt STOP_REQ to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL#12564: Remove EMPTY_LCP protocol support
Requires 7.4.3 to support upgrades now.

WL#12564: DIH_RESTARTCONF
Sent bitmask in a section from DIH as it was expected to be received by both NDBCNTR and QMGR.

WL#12564: DIH_RESTARTCONF
On reception of DIH_RESTARTCONF in QMGR we copied the bitmask into the wrong signal. This overwrote the GCI value sent from DIH, which then became 0, and that led to the President in QMGR not being the same as the Master in NDBCNTR.
Added more documentation of QMGR behaviour.
Added a few more printouts to help debug problems with allocation of node ids, which sometimes cause issues in NDB.

WL#12564: ISOLATE_ORD
Ensure that ISOLATE_ORD can handle longer data node bitmasks.

WL #12564 : Modify DEFINE_BACKUP_REQ to allow more nodes
Adapt DEFINE_BACKUP_REQ to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL #12564 : Modify START_RECREQ to allow more nodes
Adapt START_RECREQ to deal with the node bitmask changes and add relevant upgrade/downgrade code.

WL#12564: Support for EVENT_REP
Handled StartReport with 5 bitmasks and ConnectCheckStarted with 2 bitmasks. InfoEvent extended to support lengths up to 4091 bytes, and the same for WarningEvent. Limit of data in EVENT_REP signal section set to 1024 words. Limited code to handle an MGM server of a lower version than the data node. Long info events must have the type repeated in the section.
In SavedEventBuffer::scan()
- assert(data_len <= 25);
+ require(data_len <= MAX_EVENT_REP_SIZE_WORDS);

WL #12564 : Modify NODE_FAILREP to allow more nodes
Adapt NODE_FAILREP to deal with the node bitmask changes and add relevant upgrade/downgrade code.
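
The "upgrade/downgrade code" that recurs in the entries above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in this patch: NodeMask, Signal, packMask(), unpackMask(), peerTakesMaskInSection(), the word counts and kFirstSectionVersion are all hypothetical stand-ins for BitmaskPOD, the real signal classes and the check added to ndb_version.h.in. It shows a sender choosing between a packed section (new peer) and the fixed two-word inline layout (old peer), and a receiver that zero-fills the words that were not sent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes: the legacy inline mask is 2 words, the wide mask 8.
constexpr unsigned kLegacyWords = 2;
constexpr unsigned kMaxWords = 8;
// Hypothetical first version that understands bitmasks in a section.
constexpr uint32_t kFirstSectionVersion = 2;

struct NodeMask {
  uint32_t data[kMaxWords] = {};
  void set(unsigned node) { data[node / 32] |= (1u << (node % 32)); }
  bool get(unsigned node) const { return (data[node / 32] >> (node % 32)) & 1u; }
  // Words up to and including the last non-zero word; 0 for an empty mask.
  unsigned packedLengthInWords() const {
    unsigned len = 0;
    for (unsigned i = 0; i < kMaxWords; i++)
      if (data[i] != 0) len = i + 1;
    return len;
  }
};

struct Signal {
  std::vector<uint32_t> inlineWords;  // fixed part of the signal
  std::vector<uint32_t> section;      // long section, empty if unused
};

// Stand-in for the version check; the real function name is not shown here.
bool peerTakesMaskInSection(uint32_t peerVersion) {
  return peerVersion >= kFirstSectionVersion;
}

Signal packMask(const NodeMask& mask, uint32_t peerVersion) {
  Signal sig;
  if (peerTakesMaskInSection(peerVersion)) {
    // New peer: send only the packed prefix, in a section.
    sig.section.assign(mask.data, mask.data + mask.packedLengthInWords());
  } else {
    // Old peer: keep the fixed two-word inline layout it expects.
    sig.inlineWords.assign(mask.data, mask.data + kLegacyWords);
  }
  return sig;
}

NodeMask unpackMask(const Signal& sig) {
  NodeMask mask;  // zero-initialised, so unsent trailing words stay zero
  const std::vector<uint32_t>& src =
      sig.section.empty() ? sig.inlineWords : sig.section;
  assert(src.size() <= kMaxWords);  // reject a section longer than the mask
  for (std::size_t i = 0; i < src.size(); i++) mask.data[i] = src[i];
  return mask;
}
```

In this sketch a mask with nodes 1 and 40 set packs to two words, and setting node 70 grows it to three; an old peer always gets exactly two inline words, so nodes beyond the legacy range are simply not representable on that path.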

WL# 12564: FAIL_REP fine tuning

WL #12564 : Fix MASTER_GCPCONF

WL#12564: ISOLATE_ORD

WL# 12564: FAIL_REP fine tuning

WL# 12564: START_LCP_REQ bitmask transmission improvements

WL# 12564: PREP_FAILREQ and PREP_FAILREF cosmetic changes

WL#12564: EVENT_REP problem in warningEvent

WL# 12564: STOP_REQ improvements

WL# 12564: CM_REGREQ, CM_REGCONF, CM_REGREF, DEFINE_BACKUP_REQ code improvements
Use NdbNodeBitmask48::Size instead of 2.

WL# 12564: START_RECREQ improvements

WL# 12564: NODE_FAILREP cosmetic changes

WL#12564: Support for EVENT_REP
Handled StartReport with 5 bitmasks and ConnectCheckStarted with 2 bitmasks. InfoEvent extended to support lengths up to 4091 bytes, and the same for WarningEvent. Limit of data in EVENT_REP signal section set to 1024 words. Limited code to handle an MGM server of a lower version than the data node.

WL#12564 Pass node bitmask in section for CNTR_WAITREP:ZWAITPOINT_4_2
To old nodes, continue to send it in the signal.

Bug#29211078 DANGLING REFERENCE TO BUFFER OUT OF SCOPE IN QMGR::FAILREPORTLAB
Remove the dangling reference to a buffer that had gone out of scope by moving the definition of the buffer extra to the same scope as the pointer msg.

WL#12564 Use TextLength constant for bitmask text buffer sizes.
General replacement of the size expression for character buffers used in calls to getText() to produce a hexdump of node bitmasks. Typically changing
  char buf[100];
to
  char buf[NdbNodeBitmask::TextLength + 1];

WL# 12564: Send and receive bitmask in NDB_STARTCONF through signal section

WL# 12564: Node bitmask in CONTINUEB in DBSPJ block through signal section

WL# 12564: Zero the node bitmask in GSN_EVENT_REP for ignorance
The node bitmask sent by the BACKUP block is not used at the receiver (mgm client). Hence, zero those bits so that they can be ignored.
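
To illustrate why the TextLength change matters once bitmasks grow, here is a small sketch. HexMask and toText() are hypothetical stand-ins, not the real NdbNodeBitmask; the sketch assumes a hexdump of 8 digits per 32-bit word, most significant word first, which is an assumption of this example rather than something stated above. The point is only that a buffer sized from the type stays correct for any mask width, while a fixed char buf[100] would overflow for a 16-word mask (128 digits).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical fixed-size bitmask with a hexdump helper.
template <unsigned Words>
struct HexMask {
  static constexpr unsigned Size = Words;
  static constexpr unsigned TextLength = Words * 8;  // 8 hex digits per word
  uint32_t data[Words] = {};

  // Writes the mask as hex, most significant word first.
  // buf must have room for TextLength + 1 bytes (digits plus NUL).
  char* getText(char* buf) const {
    assert(buf != nullptr);
    char* p = buf;
    for (unsigned i = Words; i-- > 0;) {
      // Writes exactly 8 digits plus a NUL that the next word overwrites.
      std::snprintf(p, 9, "%08x", static_cast<unsigned>(data[i]));
      p += 8;
    }
    *p = '\0';
    return buf;
  }
};

template <unsigned Words>
std::string toText(const HexMask<Words>& mask) {
  // Sized from the type instead of a guessed constant such as 100.
  char buf[HexMask<Words>::TextLength + 1];
  return std::string(mask.getText(buf));
}
```

With this sketch, a two-word mask yields 16 characters and a 16-word mask yields 128, and the caller never has to revisit the buffer declaration when the mask size changes.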