Skip to content
  • Abhishek Ranjan's avatar
    05c09f61
    Bug#13901905 - CRASH: INSERT ON DUPLICATE KEY UPDATE READS FREED · 05c09f61
    Abhishek Ranjan authored
                   MEMORY
    
    Description:
    ------------
    
    In the INSERT... UPDATE ... statement, if the statement finds
    duplicate key, update part is executed.
    If the data of blob ( or geometry ) field is updated and their
    value from the INSERT part of the statement is tried to access
    through VALUES() function then statement gives either valgrind
    error or wrong updated data or both.
    
    Analysis:
    ---------
    Let's consider the  "INSERT ... UPDATE ..." statement. When we
    try to execute this statement and "INSERT" fails due to UNIQUE KEY
    constraint violation, 'UPDATE' clause is executed.
    
    There are 3 buffers involved in execution of "INSERT...UPDATE" statement,
    
    - record[0]      : Holds the new data from 'INSERT' clause
                       OR after 'UPDATE' clause is processed.
    
    - record[1]      : Copy of previous row found in table,
                       when INSERT fails due to UNIQUE KEY violation.
    
    - insert_values  : Copy of data provided in the 'INSERT' clause,
                       when INSERT fails due to UNIQUE KEY violation.
    
    We know that in a "INSERT...UPDATE" statement, "UPDATE" clause can refer to the
    data provided in the "INSERT" clause through VALUES() function.
    
    Now, lets consider table created with statement:
    "CREATE TABLE t1 (a INT, b BLOB, UNIQUE KEY(a));"
    and INSERT...UPDATE statement,
    "INSERT INTO t1 SET b = '11',a=0
      ON DUPLICATE KEY UPDATE b = VALUES(a),a = values(b);"
    
    We have "a= VALUES(b)" expression in "UPDATE" clause. Let us call the Field* referred
    by 'a' in LHS of expression as LHS_FIELD, and Field* referring 'b' by "VALUES(b)"
    expression in RHS as RHS_FIELD.
    
    Point to be noted here is that separate RHS_FIELD objects are made for each
    VALUES() function. These RHS_FIELD objects are clones of LHS_FIELD. The buffer's
    they point to is,
    - LHS_FIELD => where its Field->ptr member points to record[0]
    - RHS_FIELD => where its Field->ptr member points to insert_values.
    
    Root cause of problem:
    -----------------------
    In general, field data is stored at Field->ptr itself. In case of BLOB Field types
    the Field->ptr points to the length and address where the BLOB DATA is located.
    
    It can be visualized graphically as:
    
    - The values supplied by "INSERT" clause of the statement are being stored as,
    
      LHS_FIELD's:
                          a                             b
                          ||                            ||
                          \/                            \/
                    ............................................
      record [0] :  |  a (int ) |     b (blob address )        |
                    ............................................
                                          ||
                                          \/
                                .........................
                                |      BLOB DATA        |
                                .........................
    
    - Now if insert statement fails, and "UPDATE" clause is executed, then
      record[0] is copied to insert_values buffer for later access.
      We can visualize it as,
    
      LHS_FIELD's:
                          a                             b
                          ||                            ||
                          \/                            \/
                    ............................................
      record [0] :  |  a (int ) |     b (blob address )        |
                    ............................................
                                          ||
                                          \/
                                .........................
                                |      BLOB DATA        |
                                .........................
                                          /\
                                          ||
                     ............................................
      insert_values: |  a (int ) |     b (blob address )        |
                     ............................................
                          /\                            /\
                          ||                            ||
      RHS_FIELD's:     VALUES(a)                     VALUES(b)
    
     Above diagram is true for even multiple references of
     'a', 'b', 'values(a)', 'values(b)'.
    
     As we see, the same BLOB DATA is pointed by both LHS_FIELD and RHS_FIELD.
     If the "UPDATE clause" changes LHS_FIELD first and then tries to access
     RHS_FIELD, then we get wrong data or valgrind error or both.
     Eg:
     - Valgrind error occurs if the memory pointed by LHS_FIELD->ptr
       has been freed and reallocated to accomodate new data.
     - Wrong value is updated, if the memory pointed by LHS_FIELD->ptr
       was not freed but reused, and new data can fit in the same memory location.
     - Both of above could happens, if the new memory is allocated at the same
       location as previous one.
    
    Fix:
    ----
     Create a separate copy of the BLOB DATA for RHS_FIELD field, when
     BLOB DATA is modified as well as accessed through VALUES() function in the
     "UPDATE" clause of the statement.
    
    - Hold LHS_FIELD and RHS_FIELD in a 'std::map'. This is used later to make
      separate copy for RHS_FIELD, if required.
    
    - Separate copy of BLOB DATA for RHS_FIELD is made, if the same field is
      being updated in "UPDATE" clause.
    
      LHS_FIELD:                  a                             b
                                  ||                            ||
                                  \/                            \/
                            ...........................................
      record [0] :          |  a (int ) |     b (blob address )       |
                           ...........................................
                                              ||
                                              \/
                                    ..................
                                    |   BLOB DATA    |
                                    ..................
    
                                    .................
                                    |   BLOB DATA    |
                                    ..................
                                              /\
                                              ||
                         ..........................................
      insert_values :    |  a (int ) |     b (blob address )      |
                         ..........................................
                                 /\                            /\
                                 ||                            ||
      RHS_FIELD:              VALUES(a)                     VALUES(b)
    
    After the fix, both LHS_FIELD and RHS_FIELD points to different
    BLOB DATA copies and hence solves the issue.
    05c09f61
    Bug#13901905 - CRASH: INSERT ON DUPLICATE KEY UPDATE READS FREED
    Abhishek Ranjan authored
                   MEMORY
    
    Description:
    ------------
    
    In the INSERT... UPDATE ... statement, if the statement finds
    duplicate key, update part is executed.
    If the data of blob ( or geometry ) field is updated and their
    value from the INSERT part of the statement is tried to access
    through VALUES() function then statement gives either valgrind
    error or wrong updated data or both.
    
    Analysis:
    ---------
    Let's consider the  "INSERT ... UPDATE ..." statement. When we
    try to execute this statement and "INSERT" fails due to UNIQUE KEY
    constraint violation, 'UPDATE' clause is executed.
    
    There are 3 buffers involved in execution of "INSERT...UPDATE" statement,
    
    - record[0]      : Holds the new data from 'INSERT' clause
                       OR after 'UPDATE' clause is processed.
    
    - record[1]      : Copy of previous row found in table,
                       when INSERT fails due to UNIQUE KEY violation.
    
    - insert_values  : Copy of data provided in the 'INSERT' clause,
                       when INSERT fails due to UNIQUE KEY violation.
    
    We know that in a "INSERT...UPDATE" statement, "UPDATE" clause can refer to the
    data provided in the "INSERT" clause through VALUES() function.
    
    Now, lets consider table created with statement:
    "CREATE TABLE t1 (a INT, b BLOB, UNIQUE KEY(a));"
    and INSERT...UPDATE statement,
    "INSERT INTO t1 SET b = '11',a=0
      ON DUPLICATE KEY UPDATE b = VALUES(a),a = values(b);"
    
    We have "a= VALUES(b)" expression in "UPDATE" clause. Let us call the Field* referred
    by 'a' in LHS of expression as LHS_FIELD, and Field* referring 'b' by "VALUES(b)"
    expression in RHS as RHS_FIELD.
    
    Point to be noted here is that separate RHS_FIELD objects are made for each
    VALUES() function. These RHS_FIELD objects are clones of LHS_FIELD. The buffer's
    they point to is,
    - LHS_FIELD => where its Field->ptr member points to record[0]
    - RHS_FIELD => where its Field->ptr member points to insert_values.
    
    Root cause of problem:
    -----------------------
    In general, field data is stored at Field->ptr itself. In case of BLOB Field types
    the Field->ptr points to the length and address where the BLOB DATA is located.
    
    It can be visualized graphically as:
    
    - The values supplied by "INSERT" clause of the statement are being stored as,
    
      LHS_FIELD's:
                          a                             b
                          ||                            ||
                          \/                            \/
                    ............................................
      record [0] :  |  a (int ) |     b (blob address )        |
                    ............................................
                                          ||
                                          \/
                                .........................
                                |      BLOB DATA        |
                                .........................
    
    - Now if insert statement fails, and "UPDATE" clause is executed, then
      record[0] is copied to insert_values buffer for later access.
      We can visualize it as,
    
      LHS_FIELD's:
                          a                             b
                          ||                            ||
                          \/                            \/
                    ............................................
      record [0] :  |  a (int ) |     b (blob address )        |
                    ............................................
                                          ||
                                          \/
                                .........................
                                |      BLOB DATA        |
                                .........................
                                          /\
                                          ||
                     ............................................
      insert_values: |  a (int ) |     b (blob address )        |
                     ............................................
                          /\                            /\
                          ||                            ||
      RHS_FIELD's:     VALUES(a)                     VALUES(b)
    
     Above diagram is true for even multiple references of
     'a', 'b', 'values(a)', 'values(b)'.
    
     As we see, the same BLOB DATA is pointed by both LHS_FIELD and RHS_FIELD.
     If the "UPDATE clause" changes LHS_FIELD first and then tries to access
     RHS_FIELD, then we get wrong data or valgrind error or both.
     Eg:
     - Valgrind error occurs if the memory pointed by LHS_FIELD->ptr
       has been freed and reallocated to accomodate new data.
     - Wrong value is updated, if the memory pointed by LHS_FIELD->ptr
       was not freed but reused, and new data can fit in the same memory location.
     - Both of above could happens, if the new memory is allocated at the same
       location as previous one.
    
    Fix:
    ----
     Create a separate copy of the BLOB DATA for RHS_FIELD field, when
     BLOB DATA is modified as well as accessed through VALUES() function in the
     "UPDATE" clause of the statement.
    
    - Hold LHS_FIELD and RHS_FIELD in a 'std::map'. This is used later to make
      separate copy for RHS_FIELD, if required.
    
    - Separate copy of BLOB DATA for RHS_FIELD is made, if the same field is
      being updated in "UPDATE" clause.
    
      LHS_FIELD:                  a                             b
                                  ||                            ||
                                  \/                            \/
                            ...........................................
      record [0] :          |  a (int ) |     b (blob address )       |
                           ...........................................
                                              ||
                                              \/
                                    ..................
                                    |   BLOB DATA    |
                                    ..................
    
                                    .................
                                    |   BLOB DATA    |
                                    ..................
                                              /\
                                              ||
                         ..........................................
      insert_values :    |  a (int ) |     b (blob address )      |
                         ..........................................
                                 /\                            /\
                                 ||                            ||
      RHS_FIELD:              VALUES(a)                     VALUES(b)
    
    After the fix, both LHS_FIELD and RHS_FIELD points to different
    BLOB DATA copies and hence solves the issue.
Loading