    c9b3c559 · Venkatesh Duggirala authored
    Bug #18770469 OUT OF MEMORY ON 5.6 SLAVE USING RBR WITH DATETIME WHEN TABLE
    CREATED IN 5.5.
    
    Problem: In RBR, running a long transaction that contains a large number of
    rows log events causes an Out of Memory (OOM) error on the slave when the
    slave's table structure is not compatible with the master's table structure.
    
    Even though the reported issue is between 5.5->5.6 replication
    and with a DATETIME column, the problem can be seen in 5.6->5.6 replication
    and with a column of any data type whenever the slave's table structure is
    not compatible with the master's, i.e., when the slave server finds that the
    master and slave tables are not compatible with each other and decides to
    create a virtual temporary table to do the row conversion at the
    replication layer. This situation happens when someone alters the table on
    the slave by connecting directly to the slave server.
    
    Analysis:
    
    In the scenario explained above, when the slave creates the virtual
    temporary table, the logic uses thd's mem_root to create the field
    and table structures. This temporary table is required only until
    the end of the Rows_log_event scope, but the slave SQL thread's mem_root
    is freed only at the end of the transaction. There are a few
    exceptions where mem_root is freed elsewhere in the code, e.g. after
    every Query_log_event::do_apply_event(), free_root is called on the
    thread's mem_root.
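    To make the lifetime issue concrete, here is a minimal C++ sketch of arena semantics. `SimpleMemRoot` and `apply_query_event_model()` are hypothetical names, not the server's MEM_ROOT API; the sketch only assumes that, like a mem_root, the arena can release its memory all at once and never per allocation.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <memory>
    #include <vector>

    // Hypothetical, simplified stand-in for MySQL's MEM_ROOT (not the real API):
    // an arena whose allocations are never freed one by one, only all together.
    class SimpleMemRoot {
     public:
      // Hand out `size` bytes; individual blocks cannot be returned.
      void* alloc(std::size_t size) {
        blocks_.push_back(std::make_unique<unsigned char[]>(size));
        used_ += size;
        return blocks_.back().get();
      }

      // Drop every block at once, in the spirit of free_root().
      void free_all() {
        blocks_.clear();
        used_ = 0;
      }

      std::size_t used() const { return used_; }

     private:
      std::vector<std::unique_ptr<unsigned char[]>> blocks_;
      std::size_t used_ = 0;
    };

    // Models the exception described above: after a Query_log_event is applied,
    // the thread's mem_root is freed, so anything allocated earlier goes with it.
    void apply_query_event_model(SimpleMemRoot& thd_root, std::size_t bytes) {
      thd_root.alloc(bytes);  // scratch memory used while applying the query
      thd_root.free_all();    // free_root() after do_apply_event()
    }
    ```

    Until something calls `free_all()`, every allocation stays charged to the root, which is exactly why memory taken from thd's mem_root for a per-event temporary table lingers until the end of the transaction.
    
    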
    
    Sample events:
      master-bin.000001 536  Query 1 608 BEGIN
      master-bin.000001 608 Table_map 1 696 table_id: 81 (test.t1)
      master-bin.000001 696 Write_rows  1 900 table_id: 81 flags: STMT_END_F
      master-bin.000001 900 Table_map 1 988 table_id: 81 (test.t1)
      master-bin.000001 988 Write_rows  1 1192  table_id: 81 flags: STMT_END_F
      ....
      master-bin.000001 29808 Xid 1 29839 COMMIT /* xid=167 */
    
    Let's say the table is 't1' and its structure differs between master
    and slave. For every Rows_log_event, the slave creates a temporary table
    and uses some amount 'X' of memory from thd's mem_root. If a transaction
    involves 'Y' Rows_log_events, then by the end of the transaction the
    server will have used 'X*Y' memory from thd's mem_root. This can cause
    OOM as 'Y' grows. The root cause of the problem is that the server holds
    on to the temporary table's memory even though its scope ended once the
    Rows_log_event was applied, so mem_root usage keeps growing.
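    The X*Y growth can be checked with a small simulation. This is a sketch under stated assumptions: each Rows_log_event is assumed to need the same fixed number of bytes for its conversion table, and `simulate_thd_root()` and `simulate_event_root()` are hypothetical models, not server code. The first holds everything until COMMIT; the second releases each event's memory when the event is done.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>

    // Peak and end-of-transaction bytes held for conversion tables.
    struct Usage {
      std::size_t peak;
      std::size_t at_commit;
    };

    // Pre-fix model: each of the `y` Rows_log_events allocates `x` bytes on a
    // transaction-scoped root, and nothing is released before COMMIT.
    Usage simulate_thd_root(std::size_t y, std::size_t x) {
      std::size_t held = 0, peak = 0;
      for (std::size_t i = 0; i < y; ++i) {
        held += x;                    // temporary table built on thd's mem_root
        peak = std::max(peak, held);  // still held: grows by x per event
      }
      return {peak, held};
    }

    // Per-event model: the same allocation is released when the event finishes.
    Usage simulate_event_root(std::size_t y, std::size_t x) {
      std::size_t held = 0, peak = 0;
      for (std::size_t i = 0; i < y; ++i) {
        held += x;                    // temporary table built on the event's root
        peak = std::max(peak, held);
        held -= x;                    // released when the event is destroyed
      }
      return {peak, held};
    }
    ```

    With illustrative values of Y = 1000 events and X = 4096 bytes, the first model peaks at 4,096,000 bytes while the second never holds more than 4096.
    
    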
    
    Unlike Query_log_event::do_apply_event(), Rows_log_event::do_apply_event()
    can be called from a regular thread (not only the SQL thread), whose
    mem_root may still hold memory the caller needs; hence
    free_root(thd->mem_root,...) cannot be called at the end of
    Rows_log_event::do_apply_event().
    
    Fix: m_event_mem_root (a special mem root) is added to the Log_event
    class; it is initialized in the constructor and freed in the
    destructor. Any memory needed while applying an event, whose lifetime
    should end once the event has been applied, can be allocated from this
    special mem root. In the situation above, the memory needed to create
    the temporary table is allocated from this special mem root and is
    freed in ~Log_event().
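    A minimal C++ sketch of the shape of the fix, assuming a simplified arena in place of the real MEM_ROOT. `MemRoot`, `LogEvent`, `RowsLogEvent` and `thd_root_after_transaction()` are hypothetical, illustrative names that only mirror the classes named above; the real server code differs. The point is ownership: the conversion-table memory lives on the event's own root and is released by the event's destructor, while the caller's root is never freed by the event.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <memory>
    #include <vector>

    // Hypothetical simplified arena standing in for MySQL's MEM_ROOT.
    class MemRoot {
     public:
      void* alloc(std::size_t size) {
        blocks_.push_back(std::make_unique<unsigned char[]>(size));
        used_ += size;
        return blocks_.back().get();
      }
      void free_all() {
        blocks_.clear();
        used_ = 0;
      }
      std::size_t used() const { return used_; }

     private:
      std::vector<std::unique_ptr<unsigned char[]>> blocks_;
      std::size_t used_ = 0;
    };

    // Sketch of the fix: every event owns a mem root whose lifetime matches
    // the event. Names mirror the commit message; bodies are illustrative.
    class LogEvent {
     public:
      LogEvent() = default;  // m_event_mem_root is set up with the event
      virtual ~LogEvent() { m_event_mem_root.free_all(); }  // freed in ~Log_event()
      LogEvent(const LogEvent&) = delete;
      LogEvent& operator=(const LogEvent&) = delete;

      std::size_t event_mem_used() const { return m_event_mem_root.used(); }

     protected:
      MemRoot m_event_mem_root;
    };

    class RowsLogEvent : public LogEvent {
     public:
      explicit RowsLogEvent(std::size_t conv_table_bytes)
          : conv_table_bytes_(conv_table_bytes) {}

      // `thd_root` plays the role of thd->mem_root, which may belong to the SQL
      // thread or to a regular thread, so this method never frees it.
      void do_apply_event(MemRoot& thd_root) {
        (void)thd_root;  // the caller's root is left untouched
        // Build the hypothetical conversion table on the event's own root.
        m_event_mem_root.alloc(conv_table_bytes_);
      }

     private:
      std::size_t conv_table_bytes_;
    };

    // Apply `y` rows events; return bytes still held by the thread's root.
    std::size_t thd_root_after_transaction(std::size_t y, std::size_t x) {
      MemRoot thd_root;
      for (std::size_t i = 0; i < y; ++i) {
        RowsLogEvent ev(x);
        ev.do_apply_event(thd_root);
      }  // each `ev` is destroyed here, releasing its conversion table
      return thd_root.used();
    }
    ```

    Because the event's root is released by the destructor, the temporary table's memory is reclaimed no matter which thread applied the event, which avoids calling free_root() on thd->mem_root inside do_apply_event().
    
    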