    WL#2955: RBR replication of partial JSON updates
    Commit 6aee4693, authored by Sven Sandberg
    This worklog makes the replication of small updates to big JSON
    documents more space-efficient.  More precisely, when using RBR, we
    write only the modified parts of a JSON document, instead of the
    whole document.
    
    The patch includes the following major components:
    
    - Implement the new option binlog_row_value_options
    
    - Implement logic to generate JSON diffs only when needed
    
      Before, JSON diffs were generated unconditionally by the optimizer.
      We changed this so that JSON diffs are generated only when the
      option is enabled (unless inhibited by other options).
    
    - Implement new event type and use it when the option is enabled
    
    - Refactor: make max_row_length a private member of Row_data_memory
    
      This function was only used internally in class Row_data_memory, but
      was defined as a global function in table.cc.  Moved it to a private
      member of Row_data_memory.
    
    - Refactor: simplify pack_row and unpack_row
    
      Made several refactorings in these functions, including:
    
      New utility classes for handling null bits: When reading and writing
      a row in a row event, the logic for iterating over fields was
      interleaved with low-level bit operations to maintain a bitmap of
      null fields.  This made the code error-prone and hard to understand
      and edit.  This refactoring encapsulates the bitmap handling in
      utility classes, and simplifies pack_row / unpack_row accordingly.
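      The idea can be sketched with a pair of toy bitmap helpers, in the
      spirit of the utility classes described above (hypothetical code,
      not the actual server classes):

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Toy null-bitmap helpers: the writer appends one bit per field,
      // the reader consumes them in the same order.  Encapsulating the
      // byte/bit arithmetic here keeps the row-packing loop free of it.
      class Bit_writer {
       public:
        void set(bool bit) {
          if (m_pos % 8 == 0) m_bytes.push_back(0);
          if (bit) m_bytes.back() |= 1 << (m_pos % 8);
          ++m_pos;
        }
        const std::vector<uint8_t> &bytes() const { return m_bytes; }
       private:
        std::vector<uint8_t> m_bytes;
        size_t m_pos = 0;
      };

      class Bit_reader {
       public:
        explicit Bit_reader(const uint8_t *buf) : m_buf(buf) {}
        bool get() {
          bool bit = (m_buf[m_pos / 8] >> (m_pos % 8)) & 1;
          ++m_pos;
          return bit;
        }
       private:
        const uint8_t *m_buf;
        size_t m_pos = 0;
      };

      int main() {
        // Null bitmap for 10 fields where fields 1 and 9 are NULL.
        Bit_writer writer;
        for (int i = 0; i < 10; ++i) writer.set(i == 1 || i == 9);
        Bit_reader reader(writer.bytes().data());
        for (int i = 0; i < 10; ++i)
          assert(reader.get() == (i == 1 || i == 9));
        return 0;
      }
      ```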
    
    - Refactor: add const to integer decoder functions in pack.cc
    
      Functions in mysys/pack.cc that read from a buffer did not declare
      the buffer as const.  This patch makes net_field_length_size use a
      const parameter and makes other functions use const internally.
      Since these functions are part of the ABI, we also have to update
      include/mysql.h.pp.  (We do not const-ify pointers-to-pointers in
      function declarations, since that breaks compilation on other places
      that call the functions using non-const arguments.)
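      The pointer-to-pointer caveat can be illustrated in isolation
      (hypothetical decoder, not the real pack.cc signatures):

      ```cpp
      #include <cassert>
      #include <cstdint>

      // A decoder in the style of mysys/pack.cc advances the caller's
      // read position, so it takes a pointer to the buffer pointer.
      // Hypothetical name and signature, for illustration only.
      uint64_t decode_byte(const uint8_t **pos) {
        return *(*pos)++;
      }

      int main() {
        uint8_t buf[] = {42};
        uint8_t *p = buf;        // a typical non-const caller
        (void)p;
        // decode_byte(&p);      // ERROR: uint8_t** does not convert to
                                 // const uint8_t** (C++ forbids it, as it
                                 // could launder away const).
        const uint8_t *cp = buf; // every caller would need this change,
        uint64_t v = decode_byte(&cp);  // hence the declarations keep
                                        // non-const pointer-to-pointer.
        assert(v == 42 && cp == buf + 1);
        return 0;
      }
      ```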
    
    - Refactor: change Json_diff_vector from a type alias to a class
    
      This was needed because we extend Json_diff_vector with more
      member functions.  It also simplifies some forward declarations.
    
    - Refactor: do not overload global identifier TABLE in rpl_tblmap.h
    
      Class table_mapping in rpl_tblmap.h is used both in mysqlbinlog and
      in the server.  In the server, it maps numbers to TABLE objects.  In
      mysqlbinlog, it maps numbers to Table_map_log_event objects.  This
      was implemented by using the type name TABLE in rpl_tblmap.h, and
      having mysqlbinlog use a typedef that makes TABLE an alias for
      Table_map_log_event.
    
      This patch changed rpl_tblmap.h so that it does not use the
      identifier TABLE.  Instead, it uses the new typedef Mapped_table
      that maps to TABLE in the server and to Table_map_log_event in
      mysqlbinlog.
    
    - Refactor: remove unused variable Rows_log_event::m_master_reclength
    
      There was a member variable Rows_log_event::m_master_reclength that
      was set to a (strange) value which was never read.  Removed this.
    
    - Refactor: simplify Rows_log_event::read_write_bitmaps_cmp
    
      This member function was implemented only in the base class, but had
      a switch that made it execute differently depending on the
      instance's subclass.  Changed to use a pure virtual function in the
      base class and implement the different logic in each subclass.
    
    - Implement encoder of new event format
    
      Outline of the pipeline:
    
       1. In binlog.cc:Row_data_memory, add a new argument to the
          constructor that takes two 'data' pointers (this constructor is
          used for Update_rows_log_event and is invoked in
          binlog.cc:THD::binlog_update_row).  This argument is the value
          of the new server option binlog_row_value_options.  Based on
          this value, determine if Json diffs may be used, estimate how
          much memory
          will be used (using the new function
          json_diff.cc:Json_diff_vector::binary_length), decide if full
          format or partial format will be used, and adjust the allocated
          memory accordingly.
    
       2. In binlog.cc:THD::binlog_update_row, pass two new arguments to
          pack_row:
    
          - row_image_type, which specifies if this is a
            Write/Update/Delete, and if it is a before-image or
            after-image.
    
          - value_options, which contains the value of
            binlog_row_value_options for update after-images.
    
       3. In rpl_record.cc:pack_row, accept the two new arguments.  If
          this is an update after-image and the bit in value_options is
          set, then determine if any column will use partial format.  If
          any column will use partial format, write the value_options
          field, followed by the partial_bits, to the output.  Otherwise,
          just write value_options=0 to the output and skip the
          partial_bits.
    
       4. From rpl_record.cc:pack_row, invoke the new function
          rpl_record.cc:pack_field to write a single field.  If the column
          is JSON and this is the after-image of an Update and the bit in
          value_options is set, invoke the new function
          field.cc:Field_json::pack_diff.  Otherwise, or if
          field.cc:Field_json::pack_diff returned NULL, fall back to the
          usual non-diff writer.
    
       5. In Field_json::pack_diff, determine again if this field will be
          smaller in full format or in partial format.  If full format is
          smaller, just return NULL so that rpl_record.cc:pack_field will
          write the full format.  Otherwise, invoke the new function
          json_diff.cc:Json_diff_vector::write_binary.
    
       6. In json_diff.cc:Json_diff_vector::write_binary, write the length
          using 4 bytes, followed by all the diffs.  Write each diff using
          the new function json_diff.cc:Json_diff::write_binary.
    
       7. In json_diff.cc:Json_diff::write_binary, write a single diff to
          the output.
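      The full-vs-partial decision in steps 3-5 can be modeled with a toy
      serializer.  The names and one-byte encodings below are illustrative
      only (the real format packs partial_bits as bits and uses the event
      layout described above):

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Toy model: for each JSON column of an update after-image, use the
      // diff encoding only when it is smaller than the full value.
      struct Column { std::string full; std::string diff; };  // serialized

      std::vector<uint8_t> pack_after_image(const std::vector<Column> &cols,
                                            bool partial_json_enabled) {
        std::vector<uint8_t> out;
        bool any_partial = false;
        std::vector<bool> partial_bits(cols.size());
        if (partial_json_enabled) {
          for (size_t i = 0; i < cols.size(); ++i) {
            partial_bits[i] = cols[i].diff.size() < cols[i].full.size();
            any_partial = any_partial || partial_bits[i];
          }
        }
        // value_options: bit 0 = PARTIAL_JSON (toy one-byte encoding).
        out.push_back(any_partial ? 1 : 0);
        if (any_partial)  // partial_bits follow only if some column is partial
          for (bool b : partial_bits) out.push_back(b ? 1 : 0);
        for (size_t i = 0; i < cols.size(); ++i) {
          const std::string &v = partial_bits[i] ? cols[i].diff : cols[i].full;
          out.insert(out.end(), v.begin(), v.end());
        }
        return out;
      }

      int main() {
        std::vector<Column> cols = {{"FULLDOC1", "d1"}, {"x", "longdiff"}};
        auto packed = pack_after_image(cols, true);
        assert(packed[0] == 1);   // PARTIAL_JSON set: column 0 uses a diff
        assert(packed[1] == 1 && packed[2] == 0);  // per-column partial_bits
        auto off = pack_after_image(cols, false);
        assert(off[0] == 0);      // option disabled: only value_options=0
        return 0;
      }
      ```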
    
    - Implement decoder of the new format
    
      The pipeline is now:
    
       1. Add a parameter to
          log_event.cc:Rows_log_event::unpack_current_row, which says if
          this is an after-image or not.  Set the parameter from all the
          callers in log_event.cc.
    
       2. Move Rows_log_event::unpack_current_row from log_event.h to
          log_event.cc and make it pass two new arguments to
          rpl_record.cc:unpack_row: row_image_type, which indicates if
          this is Write/Update/Delete and before-image or after-image, and
          has_value_options, which is true for Update events when
          binlog_row_value_options=PARTIAL_JSON.
    
       3. Make rpl_record.cc:unpack_row accept the two new parameters.
    
          First make a few small refactorings in rpl_record.cc:unpack_row:
    
          - Clarify some variable names and improve the comment for the
            function.
    
          - Remove comments about unpack_row being used by backup, with
            rli==NULL.  This may have been the intention at some point in
            time, perhaps in 5.1, but it probably never was true.  And rli
            is unconditionally dereferenced in the main loop, so it cannot
            be NULL.  Instead, assert that it is not NULL, and also assert
            that other parameters are not NULL, as well as other
            preconditions.
    
          - Improve some debug trace printouts.
    
          - Return bool instead of int since the caller does not need to
            distinguish more than two different return statuses.
    
          Then implement the new logic:
    
          - When partial format is enabled, read partial_bits before the
            after-image (from within the main loop, as well as from the
            loop that consumes unused fields), and also read partial_bits
            after the before-image (after the main loop).  For the
            before-image, leave the read-position before the partial_bits.
            Use the new auxiliary function start_partial_bits_reader to
            read the value_options and initialize the Bit_reader
            accordingly, in the two places (after before-image and before
            after-image).
    
          - In order to read the correct number of bits before the
            after-image, start_partial_bits_reader needs to know the
            number of JSON columns on the master.  This is known from the
            table_map_log_event via the table_def class.  For convenience
            (and reuse in the mysqlbinlog patch), we add a member function
            rpl_utility.cc:table_def::json_column_count.  This function
            also caches the computed column count, to speed up successive
            calls (e.g. for many-row updates).
    
          - For the before-image, set the corresponding bit in the table's
            read_set, for any column having a 1 in the partial_bits.  This
            tells the engine to fetch the blob from storage (later, when
            the engine is invoked).  The blob will be needed since we have
            to apply the diff on it.
    
          - Call an auxiliary function rpl_record.cc:unpack_field to read
            each field, and move some special-case handling for blobs into
            this function too.
    
       4. In rpl_record.cc:unpack_field, call
          field.cc:Field_json::unpack_field for partial Json fields.
    
       5. Add new function field.cc:Field_json::unpack_field, which
          invokes the new function
          json_diff.cc:Json_diff_vector::read_binary to read the
          Json_diff_vector, and the pre-existing (since WL#10570) function
          apply_json_diffs to apply the diff.
    
          The Json_diff_vector uses a new MEM_ROOT rather than the one of
          the current_thd, because that allows memory to be freed for each
          value, which saves resources e.g. in case of many-row updates.
    
          Before apply_json_diffs can be invoked, we need to call
          table->mark_column_for_partial_update and
          table->setup_partial_update, in order to enable the *slave*
          server to generate JSON diffs in the *slave's* binary log.
    
       6. Add the new function json_diff.cc:Json_diff_vector::read_binary.
          This function reads the length of the field, then iterates over
          the diffs, reads each diff in turn, constructs Json_path and
          Json_wrapper/Json_dom objects, and appends them to the
          Json_diff_vector.
    
          We implement the auxiliary function net_field_length_checked,
          which reads an integer in packed format (see mysys/pack.cc),
          checking for out-of-bounds conditions.
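      The packed-integer format stores small values in one byte and larger
      ones behind a marker byte.  A bounds-checked reader along these lines
      can sketch the idea (this is an illustration, not the server's
      net_field_length_checked):

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Bounds-checked reader for a length-encoded integer in the style of
      // mysys/pack.cc: values < 251 take one byte; markers 252/253/254
      // introduce 2-, 3-, and 8-byte little-endian integers.  Sketch only.
      static bool read_packed_length(const uint8_t **pos, const uint8_t *end,
                                     uint64_t *out) {
        if (end - *pos < 1) return false;  // nothing left to read
        uint8_t first = *(*pos)++;
        size_t extra;
        if (first < 251) { *out = first; return true; }
        else if (first == 252) extra = 2;
        else if (first == 253) extra = 3;
        else if (first == 254) extra = 8;
        else return false;                 // not a valid length marker here
        if ((size_t)(end - *pos) < extra) return false;  // truncated input
        uint64_t v = 0;
        for (size_t i = 0; i < extra; ++i)
          v |= (uint64_t)(*(*pos)++) << (8 * i);
        *out = v;
        return true;
      }

      int main() {
        const uint8_t one[] = {200};
        const uint8_t two[] = {252, 0x34, 0x12};
        const uint8_t bad[] = {252, 0x34};  // marker promises 2 bytes, has 1
        const uint8_t *p;
        uint64_t v;
        p = one; assert(read_packed_length(&p, one + 1, &v) && v == 200);
        p = two; assert(read_packed_length(&p, two + 3, &v) && v == 0x1234);
        p = bad; assert(!read_packed_length(&p, bad + 2, &v));
        return 0;
      }
      ```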
    
    - Implement decoding and pretty-formatting of JSON diffs in mysqlbinlog
    
      mysqlbinlog outputs row events in two forms:
    
      - BINLOG statements that a server can apply.  There is nothing to
        change to make this work for the new event type.
      - "Pseudo-SQL" that humans can read, in case mysqlbinlog is invoked
        with the -v flag.  This is what the present patch implements.
    
      The pipeline in mysqlbinlog is:
    
       1. log_event.cc:Rows_log_event::print_verbose invokes
          log_event.cc:Rows_log_event::print_verbose_one_row with the new
          argument row_image_type, which indicates if this is a
          Write/Update/Delete and whether it is a before-image or
          after-image.
    
       2. In log_event.cc:Rows_log_event::print_verbose_one_row
          we do two things:
    
          - Refactorings:
    
            - Use a Bit_reader to read the null bits, instead of using bit
              arithmetic.
    
            - Use safer boundary checks.  The code has a pointer to row
              data and a pointer to the end of the row data.  In C/C++, a
              pointer may point to the next byte after an allocated block
              of memory, but incrementing it further has an undefined
              result.  After reading the length of a field, the correct
              way to check that this length is not corrupt is to compare
              it with the end pointer minus the pointer to the read
              position.  (Before, it added the length to the read position
              and compared with the end pointer, but the read position
              plus the length is undefined.)
    
          - Implement the feature:
    
            - Read the value_options, if this is the after-image of a
              PARTIAL_UPDATE_ROWS_EVENT.
    
            - If value_options has the PARTIAL_JSON bit set, read the
              partial_bits.
    
            - Pass the partialness of the column as a parameter to
              log_event.cc:log_event_print_value.
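          The boundary-check refactoring boils down to the following
          standalone pattern:

          ```cpp
          #include <cassert>
          #include <cstddef>
          #include <cstdint>

          // Safe length validation while walking a row buffer.  'pos' may
          // legally point one past the end of the buffer, but computing
          // pos + length beyond that is undefined behavior in C/C++.
          static bool length_fits(const uint8_t *pos, const uint8_t *end,
                                  size_t length) {
            // Correct: compare the (possibly corrupt) length against the
            // bytes actually remaining.  Both operands are well-defined.
            return length <= (size_t)(end - pos);
            // Unsafe alternative: 'pos + length <= end' first evaluates
            // pos + length, which is UB when length overshoots the buffer.
          }

          int main() {
            uint8_t row[16];
            const uint8_t *pos = row + 10;
            assert(length_fits(pos, row + sizeof(row), 6));   // exact fit
            assert(!length_fits(pos, row + sizeof(row), 7));  // corrupt
            return 0;
          }
          ```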
    
       3. In log_event.cc:log_event_print_value, accept the new
          parameter, and in case the value is partial, call the new
          function log_event.cc:print_json_diff to parse and print the
          Json diffs.
    
       4. In the new function log_event.cc:print_json_diff, read, parse,
          and print all the diffs.
    
          The output has the form:
            JSON_<func>(
            JSON_<func>(
            ...
            JSON_<func>(@column, path[, value][,
                        path [,value][,
                        ...]]),
            ...
                        path[, value][,
                        path [,value][,
                        ...]]),
                        path[, value][,
                        path [,value][,
                        ...]])
    
          In this output format, the JSON_<func> functions appear in
          *reversed* order, whereas all the (path, value) pairs appear in
          order of appearance.  Therefore, we make two passes over the
          sequence of diffs:
    
           1. Read just the operations and store them in a vector.  Then
              print the operations in reverse order. Operations are
              printed using the new function
              log_event.cc:json_wrapper_to_string.
    
           2. Read the full diffs and output in the order of appearance.
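          The two passes can be modeled in a standalone sketch (toy types,
          not the log_event.cc code):

          ```cpp
          #include <cassert>
          #include <string>
          #include <vector>

          // Toy model: nested JSON_<func> calls appear in reverse diff
          // order, while the (path, value) arguments stay in diff order.
          struct Diff { std::string func, path, value; };

          std::string pseudo_sql(const std::vector<Diff> &diffs,
                                 const std::string &column) {
            std::string out;
            // Pass 1: operations in reverse order open the nesting, so
            // the first diff becomes the innermost call.
            for (auto it = diffs.rbegin(); it != diffs.rend(); ++it)
              out += "JSON_" + it->func + "(\n";
            // Pass 2: (path, value) pairs in order of appearance; each
            // diff closes one level of nesting.
            out += "@" + column;
            for (const Diff &d : diffs)
              out += ", " + d.path + ", " + d.value + ")";
            return out;
          }

          int main() {
            std::string s = pseudo_sql({{"SET", "'$.a'", "1"},
                                        {"REPLACE", "'$.b'", "2"}}, "1");
            // The later diff's function wraps the earlier one's call.
            assert(s.find("JSON_REPLACE(") < s.find("JSON_SET("));
            assert(s.find("'$.a'") < s.find("'$.b'"));
            return 0;
          }
          ```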
    
       5. Add a new function log_event.cc:json_wrapper_to_string to print
          a Json_wrapper.  This ensures that Json values are printed
          with the correct type.  JSON_<func> functions will convert SQL
          types to their JSON equivalents: for instance, the JSON function
          JSON_SET('[1, 2]', '$[0]', '[]') will set the 0th element of the
          JSON array to a string containing an open and closing square
          bracket, and not to an empty JSON array.  To account for this,
          different data types need different quoting, and to insert a
          JSON object or JSON array we need to cast the string to JSON
          first.
    
       6. To output JSON values with correct quoting for SQL strings, we use
          the existing my_b_write_quoted, but change it so that:
    
          - it uses a lookup table (computed only once) for simplicity and
            performance;
    
          - it prints common escapes such as \n, \\ in a more
            human-readable way.
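      The lookup-table idea can be sketched as follows (hypothetical code,
      not the actual my_b_write_quoted):

      ```cpp
      #include <cassert>
      #include <string>

      // Build the 256-entry escape table once; each output byte is then a
      // single lookup.  Printable bytes pass through, common escapes are
      // kept human-readable, everything else becomes \xNN.
      static const std::string *escape_table() {
        static std::string table[256];
        static bool init = [] {
          const char hex[] = "0123456789abcdef";
          for (int c = 0; c < 256; ++c)
            table[c] = {'\\', 'x', hex[c >> 4], hex[c & 0xf]};
          for (int c = 0x20; c < 0x7f; ++c) table[c] = std::string(1, (char)c);
          table['\n'] = "\\n"; table['\t'] = "\\t";
          table['\\'] = "\\\\"; table['\''] = "\\'";
          return true;
        }();
        (void)init;
        return table;
      }

      static std::string write_quoted(const std::string &s) {
        const std::string *table = escape_table();
        std::string out = "'";
        for (unsigned char c : s) out += table[c];
        return out + "'";
      }

      int main() {
        assert(write_quoted("a\nb") == "'a\\nb'");
        assert(write_quoted("it's") == "'it\\'s'");
        assert(write_quoted("\x01") == "'\\x01'");
        return 0;
      }
      ```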
    
    - BUG#26018522: MYSQLBINLOG -V PRINTS JSON IN ROW EVENTS WRONG
      mysqlbinlog -v had two problems:
    
      P1. It only read the length of JSON objects from two bytes. But the
          length of JSON data in row events is encoded (in little endian)
          using four bytes.  Therefore, it printed the wrong data for JSON
          objects bigger than 64K.  This also caused subsequent errors.
    
      P2. It only dumped the raw bytes of the buffer (quoted).  But row
          events contain a binary format for JSON, so the output was not
          useful.
    
      We fix these two problems as follows:
    
      F1. Read the length from four bytes.
    
      F2. Link mysqlbinlog with the parts of the server that can parse
          binary JSON and format it in human-readable form.  This includes
          three files:
          - json_binary.cc can parse the binary JSON format.
          - json_dom.cc can format human-readable JSON.
          - sql_time.cc is used by json_dom.cc to format time and date.
          All these files contain code that mysqlbinlog does not need and
          which needs to link with more parts of the server (e.g. THD).  To
          avoid link problems we put such code inside #ifdef MYSQL_SERVER.
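      The effect of F1 can be seen in a tiny sketch of little-endian
      length reading:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // The F1 fix in a nutshell: the length of JSON column data in a row
      // event occupies four little-endian bytes, not two.  Sketch only.
      static uint32_t read_le(const uint8_t *p, int bytes) {
        uint32_t v = 0;
        for (int i = 0; i < bytes; ++i) v |= (uint32_t)p[i] << (8 * i);
        return v;
      }

      int main() {
        // A JSON value of 70000 bytes (> 64K) has this length prefix:
        const uint8_t prefix[] = {0x70, 0x11, 0x01, 0x00};
        assert(read_le(prefix, 2) == 0x1170);  // old 2-byte read: wrong
        assert(read_le(prefix, 4) == 70000);   // 4-byte read: correct
        return 0;
      }
      ```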
    
    - Created a new test suite for tests that should not be
      parallelized by MTR because they require many mysqlds.
    
      The new suite contains test cases requiring many (6 or more) mysqlds
      in a replication topology. Running those test cases with
      "--parallel" > 1 may exhaust the test host I/O resources. So, this
      new suite should run only with "--parallel=1".