Venkatesh Duggirala authored
CREATED IN 5.5.

Problem:
In RBR, running a long transaction that contains a large number of rows log events causes an Out of Memory (OOM) error on the slave when the slave's table structure is not compatible with the master's table structure. Although the issue was reported for 5.5->5.6 replication with a DATETIME column, it can also be seen in 5.6->5.6 replication and with a column of any data type, i.e., whenever the slave server finds that the master and slave tables are not compatible with each other and decides to create a virtual temporary table to do the row conversion at the replication layer. This situation happens when someone alters the table on the slave by connecting directly to the slave server.

Analysis:
In the scenario explained above, when the slave creates the virtual temporary table, the code uses thd's mem_root to create the field and table structures. This temporary table is required only until the end of the Rows_log_event scope, but the slave SQL thread's mem_root is freed only at the end of the transaction. There are some exceptions where mem_root is freed in other places in the code. E.g., after every Query_log_event::do_apply_event(), free_root is called on the thread's mem_root.

Sample events:
  master-bin.000001 536 Query 1 608 BEGIN
  master-bin.000001 608 Table_map 1 696 table_id: 81 (test.t1)
  master-bin.000001 696 Write_rows 1 900 table_id: 81 flags: STMT_END_F
  master-bin.000001 900 Table_map 1 988 table_id: 81 (test.t1)
  master-bin.000001 988 Write_rows 1 1192 table_id: 81 flags: STMT_END_F
  ....
  master-bin.000001 29808 Xid 1 29839 COMMIT /* xid=167 */

Let's say the table is 't1' and its structure differs between master and slave. For every Rows_log_event, the slave creates a temporary table and uses some amount 'X' of memory from thd's mem_root. If a transaction involves 'Y' Rows_log_events, then by the end of the transaction the server will have used 'X*Y' memory from thd's mem_root. This can cause OOM as 'Y' grows.

The root cause of the problem is that the server holds on to the temporary table's memory even though its scope has ended after applying the Rows_log_event, so mem_root usage keeps growing. Unlike Query_log_event::do_apply_event(), Rows_log_event::do_apply_event() can be called from a regular thread (not only the SQL thread), hence free_root(thd->mem_root,...) cannot be called at the end of Rows_log_event::do_apply_event().

Fix:
m_event_mem_root (a special mem root) is added to the Log_event class. It is initialized in the constructor and freed in the destructor. Any memory that is needed while applying an event of any type, and whose scope should last only until the event has been applied, can be allocated from this special mem root. In the situation above, the memory needed for creating the temporary table is allocated from this special mem root, which is freed in ~Log_event().
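
The effect of tying the allocation to the event's lifetime can be illustrated with a small, self-contained C++ sketch. This is not MySQL's MEM_ROOT or Log_event code: Arena, RowsEvent and peak_bytes are hypothetical names invented for illustration, and the arena is a toy that only tracks how many bytes it currently holds. It compares allocating each event's conversion table from a thread-lifetime arena (the old behaviour) with allocating it from an arena owned by the event (the idea behind m_event_mem_root).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a mem root (hypothetical, not MySQL's MEM_ROOT):
// hands out blocks and releases all of them at once.
class Arena {
 public:
  void* alloc(std::size_t n) {
    blocks_.push_back(std::make_unique<char[]>(n));
    bytes_ += n;
    return blocks_.back().get();
  }
  void release() {
    blocks_.clear();
    bytes_ = 0;
  }
  std::size_t bytes_in_use() const { return bytes_; }

 private:
  std::vector<std::unique_ptr<char[]>> blocks_;
  std::size_t bytes_ = 0;
};

// Hypothetical event that owns an arena living exactly as long as the event,
// so its memory goes away when the event object is destroyed.
class RowsEvent {
 public:
  // table_bytes models 'X', the memory needed for the conversion table.
  // thread_arena models thd's mem_root; it is used only in the old scheme.
  void apply(Arena& thread_arena, std::size_t table_bytes, bool use_event_arena) {
    assert(table_bytes > 0);
    Arena& target = use_event_arena ? event_arena_ : thread_arena;
    target.alloc(table_bytes);
  }
  std::size_t event_bytes() const { return event_arena_.bytes_in_use(); }

 private:
  Arena event_arena_;
};

// Applies `events` events ('Y') in one simulated transaction and returns the
// peak number of bytes held at any point.
std::size_t peak_bytes(std::size_t events, std::size_t table_bytes,
                       bool use_event_arena) {
  Arena thread_arena;  // released only at the end of the transaction
  std::size_t peak = 0;
  for (std::size_t i = 0; i < events; ++i) {
    RowsEvent ev;  // destroyed at the end of each iteration
    ev.apply(thread_arena, table_bytes, use_event_arena);
    peak = std::max(peak, thread_arena.bytes_in_use() + ev.event_bytes());
  }
  thread_arena.release();  // end of transaction
  return peak;
}
```

Under these assumptions, peak_bytes(100, 1024, false) evaluates to 102400 (X*Y), while peak_bytes(100, 1024, true) stays at 1024 (X), because each event's arena is released before the next event is applied.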