Sven Sandberg authored
This worklog enables the replication of small updates of big JSON documents more space-efficiently. More precisely, when using RBR, we write only the modified parts of JSON documents instead of the whole JSON document. The patch includes the following major components:

- Implement the new option binlog_row_value_options.

- Implement logic to generate JSON diffs only when needed. Before, JSON diffs were generated unconditionally by the optimizer. We changed this so that JSON diffs are only generated when the option is enabled (unless inhibited by other options).

- Implement a new event type and use it when the option is enabled.

- Refactor: make max_row_length a private member of Row_data_memory. This function was only used internally in class Row_data_memory, but was defined as a global function in table.cc. Moved it to a private member of Row_data_memory.

- Refactor: simplify pack_row and unpack_row. Made several refactorings in these functions, including new utility classes for handling null bits: when reading and writing a row in a row event, the logic for iterating over fields was interleaved with low-level bit operations to maintain a bitmap of null fields. This made the code error-prone and hard to understand and edit. This refactoring encapsulates the bitmap handling in utility classes and simplifies pack_row / unpack_row accordingly.

- Refactor: add const to integer decoder functions in pack.cc. Functions in mysys/pack.cc that read from a buffer did not declare the buffer as const. This patch makes net_field_length_size use a const parameter and makes other functions use const internally. Since these functions are part of the ABI, we also have to update include/mysql.h.pp. (We do not const-ify pointers-to-pointers in function declarations, since that would break compilation in other places that call the functions using non-const arguments.)
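As an illustration, the null-bit utility classes described above could look roughly like this. This is a minimal sketch of the pattern (a bit writer/reader pair over a byte buffer), not the server's actual classes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch: append single bits to a byte buffer, least-significant bit
// first, so callers can iterate over fields without doing the bit
// arithmetic themselves.
class Bit_writer {
 public:
  explicit Bit_writer(uint8_t *buf) : m_buf(buf), m_bit(0) {}
  void set(bool value) {
    if (m_bit % 8 == 0) m_buf[m_bit / 8] = 0;  // clear each new byte
    if (value) m_buf[m_bit / 8] |= 1 << (m_bit % 8);
    ++m_bit;
  }
  size_t byte_count() const { return (m_bit + 7) / 8; }  // bytes used

 private:
  uint8_t *m_buf;
  size_t m_bit;  // number of bits written so far
};

// Sketch: consume bits in the same order they were written.
class Bit_reader {
 public:
  explicit Bit_reader(const uint8_t *buf) : m_buf(buf), m_bit(0) {}
  bool get() {
    bool ret = (m_buf[m_bit / 8] >> (m_bit % 8)) & 1;
    ++m_bit;
    return ret;
  }

 private:
  const uint8_t *m_buf;
  size_t m_bit;  // number of bits read so far
};
```

With such classes, pack_row/unpack_row can write or read one null bit per field in the main loop without any inline bitmap arithmetic.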
- Refactor: change Json_diff_vector from a type alias to a class. This was needed because we extend Json_diff_vector with more member functions. It also simplifies some forward declarations.

- Refactor: do not overload the global identifier TABLE in rpl_tblmap.h. Class table_mapping in rpl_tblmap.h is used both in mysqlbinlog and in the server. In the server, it maps numbers to TABLE objects; in mysqlbinlog, it maps numbers to Table_map_log_event objects. This was implemented by using the type name TABLE, with a typedef in mysqlbinlog that makes TABLE an alias for Table_map_log_event. This patch changes rpl_tblmap.h so that it does not use the identifier TABLE. Instead, it uses the new typedef Mapped_table, which maps to TABLE in the server and to Table_map_log_event in mysqlbinlog.

- Refactor: remove the unused variable Rows_log_event::m_master_reclength. This member variable was set to a (strange) value which was never read. Removed it.

- Refactor: simplify Rows_log_event::read_write_bitmaps_cmp. This member function was implemented only in the base class, but had a switch that made it execute differently depending on the instance's subclass. Changed it to a pure virtual function in the base class, with the different logic implemented in each subclass.

- Implement the encoder of the new event format. Outline of the pipeline:

  1. In binlog.cc:Row_data_memory, take a new argument in the constructor that has two 'data' pointers (this constructor is used for Update_rows_log_event and is invoked in binlog.cc:THD::binlog_update_row). This is the value of the new server option binlog_row_value_options. Based on this variable, determine if JSON diffs may be used, estimate how much memory will be used (using the new function json_diff.cc:Json_diff_vector::binary_length), decide if full format or partial format will be used, and adjust the allocated memory accordingly.

  2.
In binlog.cc:THD::binlog_update_row, pass two new arguments to pack_row:
     - row_image_type, which specifies if this is a Write/Update/Delete, and if it is a before-image or after-image.
     - value_options, which contains the value of binlog_row_value_options for update after-images.

  3. In rpl_record.cc:pack_row, accept the two new arguments. If this is an update after-image and the bit in value_options is set, determine if any column will use the partial format. If any column will use the partial format, write the value_options field, followed by the partial_bits, to the output. Otherwise, just write value_options=0 to the output and skip the partial_bits.

  4. From rpl_record.cc:pack_row, invoke the new function rpl_record.cc:pack_field to write a single field. If the column is JSON, this is the after-image of an Update, and the bit in value_options is set, invoke the new function field.cc:Field_json::pack_diff. Otherwise, or if field.cc:Field_json::pack_diff returned NULL, fall back to the usual non-diff writer.

  5. In Field_json::pack_diff, determine again if this field will be smaller in full format or in partial format. If full format is smaller, just return NULL so that rpl_record.cc:pack_field will write the full format. Otherwise, invoke the new function json_diff.cc:Json_diff_vector::write_binary.

  6. In json_diff.cc:Json_diff_vector::write_binary, write the length using 4 bytes, followed by all the diffs. Write each diff using the new function json_diff.cc:Json_diff::write_binary.

  7. In json_diff.cc:Json_diff::write_binary, write a single diff to the output.

- Implement the decoder of the new format. The pipeline is now:

  1. Add a parameter to log_event.cc:Rows_log_event::unpack_current_row which says if this is an after-image or not. Set the parameter from all the callers in log_event.cc.

  2.
Move Rows_log_event::unpack_current_row from log_event.h to log_event.cc and make it pass two new arguments to rpl_record.cc:unpack_row: row_image_type, which indicates if this is Write/Update/Delete and before-image or after-image, and has_value_options, which is true for Update events when binlog_row_value_options=PARTIAL_JSON.

  3. Make rpl_record.cc:unpack_row accept the two new parameters. First make a few small refactorings in rpl_record.cc:unpack_row:
     - Clarify some variable names and improve the comment for the function.
     - Remove comments about unpack_row being used by backup and having rli==NULL. This may have been an intention at some point in time, perhaps in 5.1, but probably never was true. And rli is unconditionally dereferenced in the main loop, so it cannot be NULL. Instead, assert that it is not NULL. Also assert that other parameters are not NULL, as well as other preconditions.
     - Improve some debug trace printouts.
     - Return bool instead of int, since the caller does not need to distinguish more than two different return statuses.

     Then implement the new logic:
     - When partial format is enabled, read partial_bits before the after-image (from within the main loop, as well as from the loop that consumes unused fields), and also read partial_bits after the before-image (after the main loop). For the before-image, leave the read position before the partial_bits. Use the new auxiliary function start_partial_bits_reader to read the value_options and initialize the Bit_reader accordingly, in the two places (after the before-image and before the after-image).
     - In order to read the correct number of bits before the after-image, start_partial_bits_reader needs to know the number of JSON columns on the master. This is known from the table_map_log_event via the table_def class. For convenience (and reuse in the mysqlbinlog patch), we add a member function rpl_utility.cc:table_def::json_column_count.
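A hypothetical simplification of such a cached JSON column count follows. The class name and the numeric value of MYSQL_TYPE_JSON mirror the server's conventions, but this is a sketch, not the actual table_def implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: the server encodes the master's column types as an array of
// type bytes in the table_map_log_event; MYSQL_TYPE_JSON is 245 in the
// server's enum_field_types.
enum { MYSQL_TYPE_JSON = 245 };

class table_def_sketch {
 public:
  explicit table_def_sketch(std::vector<uint8_t> types)
      : m_types(std::move(types)), m_json_column_count(-1) {}

  // Count JSON columns on the first call, then return the cached
  // value, so one call per row of a many-row update stays cheap.
  int json_column_count() {
    if (m_json_column_count == -1) {  // not yet computed
      int count = 0;
      for (uint8_t t : m_types)
        if (t == MYSQL_TYPE_JSON) ++count;
      m_json_column_count = count;  // cache the result
    }
    return m_json_column_count;
  }

 private:
  std::vector<uint8_t> m_types;
  int m_json_column_count;  // -1 means "not yet computed"
};
```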
This function also caches the computed column count, to speed up successive calls (e.g. for many-row updates).
     - For the before-image, set the corresponding bit in the table's read_set for any column having a 1 in the partial_bits. This tells the engine to fetch the blob from storage (later, when the engine is invoked). The blob will be needed since we have to apply the diff on it.
     - Call an auxiliary function rpl_record.cc:unpack_field to read each field; also move some special-case handling for blobs into this function.

  4. In rpl_record.cc:unpack_field, call field.cc:Field_json::unpack_field for partial JSON fields.

  5. Add a new function field.cc:Field_json::unpack_field, which invokes the new function json_diff.cc:Json_diff_vector::read_binary to read the Json_diff_vector, and the pre-existing (since WL#10570) function apply_json_diffs to apply the diffs. The Json_diff_vector uses a new MEM_ROOT rather than that of the current_thd, because that allows memory to be freed for each value, which saves resources e.g. in case of many-row updates. Before apply_json_diffs can be invoked, we need to call table->mark_column_for_partial_update and table->setup_partial_update, in order to enable the *slave* server to generate JSON diffs in the *slave's* binary log.

  6. Add the new function json_diff.cc:Json_diff_vector::read_binary. This function reads the length of the field, then iterates over the diffs, reading each diff in turn, constructing Json_path and Json_wrapper/Json_dom objects, and appending them to the Json_diff_vector. We implement the auxiliary function net_field_length_checked, which reads an integer in packed format (see mysys/pack.cc), checking for out-of-bounds conditions.

- Implement decoding and pretty-formatting of JSON diffs in mysqlbinlog. mysqlbinlog outputs row events in two forms:
  - BINLOG statements that a server can apply. Nothing needs to change to make this work for the new event type.
  - "Pseudo-SQL" that humans can read, in case mysqlbinlog is invoked with the -v flag. This is what the present patch implements.

  The pipeline in mysqlbinlog is:

  1. log_event.cc:Rows_log_event::print_verbose invokes log_event.cc:Rows_log_event::print_verbose_one_row with the new argument row_image_type, which indicates if this is a Write/Update/Delete and whether it is a before-image or after-image.

  2. In log_event.cc:Rows_log_event::print_verbose_one_row we do two things:
     - Refactorings:
       - Use a Bit_reader to read the null bits, instead of using bit arithmetic.
       - Use safer boundary checks. The code has a pointer to row data and a pointer to the end of the row data. In C/C++, a pointer may point to the next byte after an allocated block of memory, but incrementing it further has an undefined result. After reading the length of a field, the correct way to check that this length is not corrupt is to compare it with the end pointer minus the pointer to the read position. (Before, it added the length to the read position and compared with the end pointer, but the read position plus the length is undefined.)
     - Implement the feature:
       - Read the value_options, if this is the after-image of a PARTIAL_UPDATE_ROWS_EVENT.
       - If value_options has the PARTIAL_JSON bit set, read the partial_bits.
       - Pass the partialness of the column as a parameter to log_event.cc:log_event_print_value.

  3. In log_event_print_value, accept the new parameter, and in case the value is partial, call the new function log_event.cc:print_json_diff to parse and print the JSON diffs.

  4. In the new function log_event.cc:print_json_diff, read, parse, and print all the diffs. The output has the form:

       JSON_<func>(
       JSON_<func>(
       ...
       JSON_<func>(@column, path[, value][, path [,value][, ...]]),
       ...
       path[, value][, path [,value][, ...]]),
       path[, value][, path [,value][, ...]])

     In this output format, the JSON_<func> functions appear in *reversed* order, whereas all the (path, value) pairs appear in order of appearance. Therefore, we make two passes over the sequence of diffs:
     1. Read just the operations and store them in a vector. Then print the operations in reverse order. Operations are printed using the new function log_event.cc:json_wrapper_to_string.
     2. Read the full diffs and output them in order of appearance.

  5. Add a new function log_event.cc:json_wrapper_to_string to print a Json_wrapper. This ensures that JSON values are printed with the correct type. JSON_<func> functions will convert SQL types to their JSON equivalents: for instance, JSON_SET('[1, 2]', '$[0]', '[]') will set the 0th element of the JSON array to a string containing an opening and a closing square bracket, and not to an empty JSON array. To account for this, different data types need different quoting, and to insert a JSON object or JSON array we need to cast the string to JSON first.

  6. To output JSON values with correct quoting for SQL strings, we use the existing my_b_write_quoted, but change it so that:
     - it uses a lookup table (computed only once) for simplicity and performance;
     - it prints common escapes such as \n, \\ in a more human-readable way.

- BUG#26018522: MYSQLBINLOG -V PRINTS JSON IN ROW EVENTS WRONG

  mysqlbinlog -v had two problems:
  P1. It only read the length of JSON objects from two bytes. But the length of JSON data in row events is encoded (in little endian) using four bytes. Therefore, it printed the wrong data for JSON objects bigger than 64K. This also caused subsequent errors.
  P2. It only dumped the raw bytes of the buffer (quoted). But row events contain a binary format for JSON, so the output was not useful.

  We fix these two problems as follows:
  F1. Read the length from four bytes.
  F2.
Link mysqlbinlog with the parts of the server that can parse binary JSON and format it in human-readable form. This involves three files:
       - json_binary.cc can parse the binary JSON format.
       - json_dom.cc can format human-readable JSON.
       - sql_time.cc is used by json_dom.cc to format times and dates.
  All these files contain code that mysqlbinlog does not need and which would need to link with more parts of the server (e.g. THD). To avoid link problems, we put such code inside #ifdef MYSQL_SERVER.

- Created a new test suite for tests that should not be parallelized by MTR because they require many mysqlds. The new suite contains test cases requiring many (6 or more) mysqlds in a replication topology. Running those test cases with "--parallel" > 1 may exhaust the test host's I/O resources, so this new suite should run only with "--parallel=1".
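The safer boundary check described for print_verbose_one_row above can be illustrated with a small sketch (a hypothetical helper, not server code): after decoding a field length, compare it against the remaining bytes (end - pos) rather than advancing the read position first, since computing pos + len can step past the one-past-the-end pointer, which is undefined behavior in C/C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch: consume `len` bytes from [pos, end), returning false if the
// decoded length is corrupt (bigger than what remains in the buffer).
bool read_field(const uint8_t *&pos, const uint8_t *end, size_t len) {
  // Safe check: end - pos is well-defined as long as pos <= end.
  // The unsafe variant would be `if (pos + len > end)`, where
  // pos + len may already be undefined for a corrupt length.
  if (len > static_cast<size_t>(end - pos)) return false;
  // ... a real decoder would copy or print the field bytes here ...
  pos += len;  // advance only after the length has been validated
  return true;
}
```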