-
Abhishek Ranjan authored
MEMORY Description: ------------ In the INSERT... UPDATE ... statement, if the statement finds duplicate key, update part is executed. If the data of blob ( or geometry ) field is updated and their value from the INSERT part of the statement is tried to access through VALUES() function then statement gives either valgrind error or wrong updated data or both. Analysis: --------- Let's consider the "INSERT ... UPDATE ..." statement. When we try to execute this statement and "INSERT" fails due to UNIQUE KEY constraint violation, 'UPDATE' clause is executed. There are 3 buffers involved in execution of "INSERT...UPDATE" statement, - record[0] : Holds the new data from 'INSERT' clause OR after 'UPDATE' clause is processed. - record[1] : Copy of previous row found in table, when INSERT fails due to UNIQUE KEY violation. - insert_values : Copy of data provided in the 'INSERT' clause, when INSERT fails due to UNIQUE KEY violation. We know that in a "INSERT...UPDATE" statement, "UPDATE" clause can refer to the data provided in the "INSERT" clause through VALUES() function. Now, lets consider table created with statement: "CREATE TABLE t1 (a INT, b BLOB, UNIQUE KEY(a));" and INSERT...UPDATE statement, "INSERT INTO t1 SET b = '11',a=0 ON DUPLICATE KEY UPDATE b = VALUES(a),a = values(b);" We have "a= VALUES(b)" expression in "UPDATE" clause. Let us call the Field* referred by 'a' in LHS of expression as LHS_FIELD, and Field* referring 'b' by "VALUES(b)" expression in RHS as RHS_FIELD. Point to be noted here is that separate RHS_FIELD objects are made for each VALUES() function. These RHS_FIELD objects are clones of LHS_FIELD. The buffer's they point to is, - LHS_FIELD => where its Field->ptr member points to record[0] - RHS_FIELD => where its Field->ptr member points to insert_values. Root cause of problem: ----------------------- In general, field data is stored at Field->ptr itself. In case of BLOB Field types the Field->ptr points to the length and address where the BLOB DATA is located. It can be visualized graphically as: - The values supplied by "INSERT" clause of the statement are being stored as, LHS_FIELD's: a b || || \/ \/ ............................................ record [0] : | a (int ) | b (blob address ) | ............................................ || \/ ......................... | BLOB DATA | ......................... - Now if insert statement fails, and "UPDATE" clause is executed, then record[0] is copied to insert_values buffer for later access. We can visualize it as, LHS_FIELD's: a b || || \/ \/ ............................................ record [0] : | a (int ) | b (blob address ) | ............................................ || \/ ......................... | BLOB DATA | ......................... /\ || ............................................ insert_values: | a (int ) | b (blob address ) | ............................................ /\ /\ || || RHS_FIELD's: VALUES(a) VALUES(b) Above diagram is true for even multiple references of 'a', 'b', 'values(a)', 'values(b)'. As we see, the same BLOB DATA is pointed by both LHS_FIELD and RHS_FIELD. If the "UPDATE clause" changes LHS_FIELD first and then tries to access RHS_FIELD, then we get wrong data or valgrind error or both. Eg: - Valgrind error occurs if the memory pointed by LHS_FIELD->ptr has been freed and reallocated to accomodate new data. - Wrong value is updated, if the memory pointed by LHS_FIELD->ptr was not freed but reused, and new data can fit in the same memory location. - Both of above could happens, if the new memory is allocated at the same location as previous one. Fix: ---- Create a separate copy of the BLOB DATA for RHS_FIELD field, when BLOB DATA is modified as well as accessed through VALUES() function in the "UPDATE" clause of the statement. - Hold LHS_FIELD and RHS_FIELD in a 'std::map'. This is used later to make separate copy for RHS_FIELD, if required. - Separate copy of BLOB DATA for RHS_FIELD is made, if the same field is being updated in "UPDATE" clause. LHS_FIELD: a b || || \/ \/ ........................................... record [0] : | a (int ) | b (blob address ) | ........................................... || \/ .................. | BLOB DATA | .................. ................. | BLOB DATA | .................. /\ || .......................................... insert_values : | a (int ) | b (blob address ) | .......................................... /\ /\ || || RHS_FIELD: VALUES(a) VALUES(b) After the fix, both LHS_FIELD and RHS_FIELD points to different BLOB DATA copies and hence solves the issue.
Abhishek Ranjan authoredMEMORY Description: ------------ In the INSERT... UPDATE ... statement, if the statement finds duplicate key, update part is executed. If the data of blob ( or geometry ) field is updated and their value from the INSERT part of the statement is tried to access through VALUES() function then statement gives either valgrind error or wrong updated data or both. Analysis: --------- Let's consider the "INSERT ... UPDATE ..." statement. When we try to execute this statement and "INSERT" fails due to UNIQUE KEY constraint violation, 'UPDATE' clause is executed. There are 3 buffers involved in execution of "INSERT...UPDATE" statement, - record[0] : Holds the new data from 'INSERT' clause OR after 'UPDATE' clause is processed. - record[1] : Copy of previous row found in table, when INSERT fails due to UNIQUE KEY violation. - insert_values : Copy of data provided in the 'INSERT' clause, when INSERT fails due to UNIQUE KEY violation. We know that in a "INSERT...UPDATE" statement, "UPDATE" clause can refer to the data provided in the "INSERT" clause through VALUES() function. Now, lets consider table created with statement: "CREATE TABLE t1 (a INT, b BLOB, UNIQUE KEY(a));" and INSERT...UPDATE statement, "INSERT INTO t1 SET b = '11',a=0 ON DUPLICATE KEY UPDATE b = VALUES(a),a = values(b);" We have "a= VALUES(b)" expression in "UPDATE" clause. Let us call the Field* referred by 'a' in LHS of expression as LHS_FIELD, and Field* referring 'b' by "VALUES(b)" expression in RHS as RHS_FIELD. Point to be noted here is that separate RHS_FIELD objects are made for each VALUES() function. These RHS_FIELD objects are clones of LHS_FIELD. The buffer's they point to is, - LHS_FIELD => where its Field->ptr member points to record[0] - RHS_FIELD => where its Field->ptr member points to insert_values. Root cause of problem: ----------------------- In general, field data is stored at Field->ptr itself. In case of BLOB Field types the Field->ptr points to the length and address where the BLOB DATA is located. It can be visualized graphically as: - The values supplied by "INSERT" clause of the statement are being stored as, LHS_FIELD's: a b || || \/ \/ ............................................ record [0] : | a (int ) | b (blob address ) | ............................................ || \/ ......................... | BLOB DATA | ......................... - Now if insert statement fails, and "UPDATE" clause is executed, then record[0] is copied to insert_values buffer for later access. We can visualize it as, LHS_FIELD's: a b || || \/ \/ ............................................ record [0] : | a (int ) | b (blob address ) | ............................................ || \/ ......................... | BLOB DATA | ......................... /\ || ............................................ insert_values: | a (int ) | b (blob address ) | ............................................ /\ /\ || || RHS_FIELD's: VALUES(a) VALUES(b) Above diagram is true for even multiple references of 'a', 'b', 'values(a)', 'values(b)'. As we see, the same BLOB DATA is pointed by both LHS_FIELD and RHS_FIELD. If the "UPDATE clause" changes LHS_FIELD first and then tries to access RHS_FIELD, then we get wrong data or valgrind error or both. Eg: - Valgrind error occurs if the memory pointed by LHS_FIELD->ptr has been freed and reallocated to accomodate new data. - Wrong value is updated, if the memory pointed by LHS_FIELD->ptr was not freed but reused, and new data can fit in the same memory location. - Both of above could happens, if the new memory is allocated at the same location as previous one. Fix: ---- Create a separate copy of the BLOB DATA for RHS_FIELD field, when BLOB DATA is modified as well as accessed through VALUES() function in the "UPDATE" clause of the statement. - Hold LHS_FIELD and RHS_FIELD in a 'std::map'. This is used later to make separate copy for RHS_FIELD, if required. - Separate copy of BLOB DATA for RHS_FIELD is made, if the same field is being updated in "UPDATE" clause. LHS_FIELD: a b || || \/ \/ ........................................... record [0] : | a (int ) | b (blob address ) | ........................................... || \/ .................. | BLOB DATA | .................. ................. | BLOB DATA | .................. /\ || .......................................... insert_values : | a (int ) | b (blob address ) | .......................................... /\ /\ || || RHS_FIELD: VALUES(a) VALUES(b) After the fix, both LHS_FIELD and RHS_FIELD points to different BLOB DATA copies and hence solves the issue.
Loading