-
Evgeny Potemkin authored
When a user inserts JSON text that contains date/time values into a JSON field, e.g: '{"dateField":"2001-01-01"}' the date/time values are stored as JSON strings. If the user later uses e.g. JSON_CONTAINS function to compare such data to a date string: JSON_CONTAINS(j->'$.dateField', JSON_QUOTE('2001-01-01')) then those date/time values are compared as strings too, simply because there are no cast to date/time type. When user creates a multi-valued index over such a date/time field in JSON data, he actually specifies type and an appropriate temporal cast is inserted, as InnoDB can't index arbitrary JSON data. Due to this, the cast j->'$.dateField' in the example above doesn't return string anymore, rather it returns DATE/TIME, according to the index definition. This makes the comparison fail and JSON treats temporal types as being incomparable with STRING type, temporal types are always greater that any string. The initial multi-valued index implementation expected that users will be using cast to the index's type to make comparisons, e.g.: JSON_CONTAINS(j->'$.dateField', CAST(CAST('2001-01-01' AS DATE) AS JSON)) However this turned out to be not quite usable, as the comparison will fail if the index won't be used due to a string (in the document) vs a date (in the query) comparison. This isn't consistent with current non-json temporal fields behavior either: fields are perfectly well comparable to strings and to other temporal fields despite different storage format. The fix for this bug consists of two logical parts: the one which fixes JSON_OVERLAPS and JSON_CONTAINS functions, and another which fixes MEMBER OF funciton. The first one uses the fact that in order to check whether the MV index can be used, optimizer casts values to lookup to the index's type. If there are no errors during cast, then the MV's hidden generated column replaces the path expression in JSON_CONTAINS/JSON_OVERLAPS. In addition, a list of coerced values now replaces the ones provided by the user. The effect that values always match the index's type, no matter whether the index is actually used or not. To do that, gc_subst_overlaps_contains() now always uses the coerced value returned by Field_typed_array::coerce_json_value(). Collected values are stored as a constant JSON in Item_json. The latter now can print its value in form of JSON literal: json'<json value>'. The second part of patch coerces value to lookup to the MV index's type in substitute_gc_expression(). Then, the result value being wrapped into Item_json replaces the original expression's value. In order to properly generate lookup key from the coerced value, a new store_key class is added, it's named store_key_json_item. It extracts data from JSON and stores it into the ref access' key field in a manner that takes into account data type. Also in order to bring consistency between multi-valued indexes and regular GC indexes, this patch changes behavior of get_time() and get_date() methods in classes related to JSON handling. Before get_time() was doing the actual extraction of DATE/TIME/DATETIME/TIMESTAMP data from JSON and get_date() was acting like a proxy to it. Now they are separate. get_date() function extracts only DATE/DATETIME/TIMESTAMP and get_time() extracts only TIME. The reason is that results of these two aren't coercible to each other and depending on the context we need strictly either one or another. In addition, each method now can convert string data into required temporal type. This approach makes indexes based on JSON data to behave like regular indexes, when temporal values can be inserted in a string format. This might change query result, depending on the query and user's data. Change-Id: Ib1946ba4d2c7f5d92ddb9e98783e3e4d999475f4
Evgeny Potemkin authoredWhen a user inserts JSON text that contains date/time values into a JSON field, e.g: '{"dateField":"2001-01-01"}' the date/time values are stored as JSON strings. If the user later uses e.g. JSON_CONTAINS function to compare such data to a date string: JSON_CONTAINS(j->'$.dateField', JSON_QUOTE('2001-01-01')) then those date/time values are compared as strings too, simply because there are no cast to date/time type. When user creates a multi-valued index over such a date/time field in JSON data, he actually specifies type and an appropriate temporal cast is inserted, as InnoDB can't index arbitrary JSON data. Due to this, the cast j->'$.dateField' in the example above doesn't return string anymore, rather it returns DATE/TIME, according to the index definition. This makes the comparison fail and JSON treats temporal types as being incomparable with STRING type, temporal types are always greater that any string. The initial multi-valued index implementation expected that users will be using cast to the index's type to make comparisons, e.g.: JSON_CONTAINS(j->'$.dateField', CAST(CAST('2001-01-01' AS DATE) AS JSON)) However this turned out to be not quite usable, as the comparison will fail if the index won't be used due to a string (in the document) vs a date (in the query) comparison. This isn't consistent with current non-json temporal fields behavior either: fields are perfectly well comparable to strings and to other temporal fields despite different storage format. The fix for this bug consists of two logical parts: the one which fixes JSON_OVERLAPS and JSON_CONTAINS functions, and another which fixes MEMBER OF funciton. The first one uses the fact that in order to check whether the MV index can be used, optimizer casts values to lookup to the index's type. If there are no errors during cast, then the MV's hidden generated column replaces the path expression in JSON_CONTAINS/JSON_OVERLAPS. In addition, a list of coerced values now replaces the ones provided by the user. The effect that values always match the index's type, no matter whether the index is actually used or not. To do that, gc_subst_overlaps_contains() now always uses the coerced value returned by Field_typed_array::coerce_json_value(). Collected values are stored as a constant JSON in Item_json. The latter now can print its value in form of JSON literal: json'<json value>'. The second part of patch coerces value to lookup to the MV index's type in substitute_gc_expression(). Then, the result value being wrapped into Item_json replaces the original expression's value. In order to properly generate lookup key from the coerced value, a new store_key class is added, it's named store_key_json_item. It extracts data from JSON and stores it into the ref access' key field in a manner that takes into account data type. Also in order to bring consistency between multi-valued indexes and regular GC indexes, this patch changes behavior of get_time() and get_date() methods in classes related to JSON handling. Before get_time() was doing the actual extraction of DATE/TIME/DATETIME/TIMESTAMP data from JSON and get_date() was acting like a proxy to it. Now they are separate. get_date() function extracts only DATE/DATETIME/TIMESTAMP and get_time() extracts only TIME. The reason is that results of these two aren't coercible to each other and depending on the context we need strictly either one or another. In addition, each method now can convert string data into required temporal type. This approach makes indexes based on JSON data to behave like regular indexes, when temporal values can be inserted in a string format. This might change query result, depending on the query and user's data. Change-Id: Ib1946ba4d2c7f5d92ddb9e98783e3e4d999475f4
Loading