Skip to content
  • Evgeny Potemkin's avatar
    03c27d2a
    Bug#29752056: COLLECTION.FIND() FAILS FOR DATE TYPE WHEN IN IS USED · 03c27d2a
    Evgeny Potemkin authored
      When a user inserts JSON text that contains date/time values into a JSON
    field, e.g:
      '{"dateField":"2001-01-01"}'
    the date/time values are stored as JSON strings. If the user later uses e.g.
    JSON_CONTAINS function to compare such data to a date string:
      JSON_CONTAINS(j->'$.dateField', JSON_QUOTE('2001-01-01'))
    then those date/time values are compared as strings too, simply because
    there are no cast to date/time type.
      When user creates a multi-valued index over such a date/time field in JSON
    data, he actually specifies type and an appropriate temporal cast is inserted,
    as InnoDB can't index arbitrary JSON data.
      Due to this, the cast j->'$.dateField' in the example above doesn't return
    string anymore, rather it returns DATE/TIME, according to the index definition.
    This makes the comparison fail and JSON treats temporal types as being
    incomparable with STRING type, temporal types are always greater that any
    string. The initial multi-valued index implementation expected that
    users will be using cast to the index's type to make comparisons, e.g.:
      JSON_CONTAINS(j->'$.dateField', CAST(CAST('2001-01-01' AS DATE) AS JSON))
    However this turned out to be not quite usable, as the comparison will fail if
    the index won't be used due to a  string (in the document) vs a date (in the
    query) comparison. This isn't consistent with current non-json temporal
    fields behavior either: fields are perfectly well comparable to strings and to
    other temporal fields despite different storage format.
    
      The fix for this bug consists of two logical parts: the one which fixes
    JSON_OVERLAPS and JSON_CONTAINS functions, and another which fixes MEMBER OF
    funciton. The first one uses the fact that in order to check whether the MV
    index can be used, optimizer casts values to lookup to the index's type. If
    there are no errors during cast, then the MV's hidden generated column replaces
    the path expression in JSON_CONTAINS/JSON_OVERLAPS. In addition, a list of
    coerced values now replaces the ones provided by the user. The effect that
    values always match the index's type, no matter whether the index is actually
    used or not.  To do that, gc_subst_overlaps_contains() now always uses the
    coerced value returned by Field_typed_array::coerce_json_value(). Collected
    values are stored as a constant JSON in Item_json. The latter now can print
    its value in form of JSON literal: json'<json value>'.
      The second part of patch coerces value to lookup to the MV index's type in
    substitute_gc_expression(). Then, the result value being wrapped into Item_json
    replaces the original expression's value. In order to properly generate lookup
    key from the coerced value, a new store_key class is added, it's named
    store_key_json_item. It extracts data from JSON and stores it into the ref
    access' key field in a manner that takes into account data type.
    
      Also in order to bring consistency between multi-valued indexes and regular
    GC indexes, this patch changes behavior of get_time() and get_date() methods
    in classes related to JSON handling. Before get_time() was doing the actual
    extraction of DATE/TIME/DATETIME/TIMESTAMP data from JSON and get_date() was
    acting like a proxy to it. Now they are separate. get_date() function
    extracts only DATE/DATETIME/TIMESTAMP and get_time() extracts only TIME.
    The reason is that results of these two aren't coercible to each other and
    depending on the context we need strictly either one or another. In
    addition, each method now can convert string data into required temporal
    type. This approach makes indexes based on JSON data to behave like regular
    indexes, when temporal values can be inserted in a string format. This might
    change query result, depending on the query and user's data.
    
    Change-Id: Ib1946ba4d2c7f5d92ddb9e98783e3e4d999475f4
    03c27d2a
    Bug#29752056: COLLECTION.FIND() FAILS FOR DATE TYPE WHEN IN IS USED
    Evgeny Potemkin authored
      When a user inserts JSON text that contains date/time values into a JSON
    field, e.g:
      '{"dateField":"2001-01-01"}'
    the date/time values are stored as JSON strings. If the user later uses e.g.
    JSON_CONTAINS function to compare such data to a date string:
      JSON_CONTAINS(j->'$.dateField', JSON_QUOTE('2001-01-01'))
    then those date/time values are compared as strings too, simply because
    there are no cast to date/time type.
      When user creates a multi-valued index over such a date/time field in JSON
    data, he actually specifies type and an appropriate temporal cast is inserted,
    as InnoDB can't index arbitrary JSON data.
      Due to this, the cast j->'$.dateField' in the example above doesn't return
    string anymore, rather it returns DATE/TIME, according to the index definition.
    This makes the comparison fail and JSON treats temporal types as being
    incomparable with STRING type, temporal types are always greater that any
    string. The initial multi-valued index implementation expected that
    users will be using cast to the index's type to make comparisons, e.g.:
      JSON_CONTAINS(j->'$.dateField', CAST(CAST('2001-01-01' AS DATE) AS JSON))
    However this turned out to be not quite usable, as the comparison will fail if
    the index won't be used due to a  string (in the document) vs a date (in the
    query) comparison. This isn't consistent with current non-json temporal
    fields behavior either: fields are perfectly well comparable to strings and to
    other temporal fields despite different storage format.
    
      The fix for this bug consists of two logical parts: the one which fixes
    JSON_OVERLAPS and JSON_CONTAINS functions, and another which fixes MEMBER OF
    funciton. The first one uses the fact that in order to check whether the MV
    index can be used, optimizer casts values to lookup to the index's type. If
    there are no errors during cast, then the MV's hidden generated column replaces
    the path expression in JSON_CONTAINS/JSON_OVERLAPS. In addition, a list of
    coerced values now replaces the ones provided by the user. The effect that
    values always match the index's type, no matter whether the index is actually
    used or not.  To do that, gc_subst_overlaps_contains() now always uses the
    coerced value returned by Field_typed_array::coerce_json_value(). Collected
    values are stored as a constant JSON in Item_json. The latter now can print
    its value in form of JSON literal: json'<json value>'.
      The second part of patch coerces value to lookup to the MV index's type in
    substitute_gc_expression(). Then, the result value being wrapped into Item_json
    replaces the original expression's value. In order to properly generate lookup
    key from the coerced value, a new store_key class is added, it's named
    store_key_json_item. It extracts data from JSON and stores it into the ref
    access' key field in a manner that takes into account data type.
    
      Also in order to bring consistency between multi-valued indexes and regular
    GC indexes, this patch changes behavior of get_time() and get_date() methods
    in classes related to JSON handling. Before get_time() was doing the actual
    extraction of DATE/TIME/DATETIME/TIMESTAMP data from JSON and get_date() was
    acting like a proxy to it. Now they are separate. get_date() function
    extracts only DATE/DATETIME/TIMESTAMP and get_time() extracts only TIME.
    The reason is that results of these two aren't coercible to each other and
    depending on the context we need strictly either one or another. In
    addition, each method now can convert string data into required temporal
    type. This approach makes indexes based on JSON data to behave like regular
    indexes, when temporal values can be inserted in a string format. This might
    change query result, depending on the query and user's data.
    
    Change-Id: Ib1946ba4d2c7f5d92ddb9e98783e3e4d999475f4
Loading