Olav Sandstaa authored
INDEX STATISTICS IS MISSING

The cost of a loose index scan is calculated in cost_group_min_max().
To estimate the size of the groups to read, index statistics from
rec_per_key are used. The code works correctly when index statistics
are available, but when they are not, a bug in the code will in some
cases cause wrong cost estimates to be produced.

For a query like this:

  SELECT DISTINCT a FROM t1 WHERE b=4;

with an index on (a,b,c), cost_group_min_max() estimates the "keys per
group" both for the index prefix (a) and for the index prefix (a,b).
The first of these is estimated by looking up rec_per_key[0] and using
the following code to adjust the estimate if there is no rec_per_key
information:

  /* Compute the number of keys in a group. */
  keys_per_group= index_info->rec_per_key[group_key_parts - 1];
  if (keys_per_group == 0) /* If there is no statistics try to guess */
    /* each group contains 10% of all records */
    keys_per_group= (uint)(table_records / 10) + 1;

For estimating the second "keys per group", rec_per_key[1] is used and
the code looks like this:

  keys_per_subgroup= index_info->rec_per_key[used_key_parts - 1];

Here there is no code that handles the case where index statistics are
missing, so keys_per_subgroup gets the value 0. Using 0 as an estimate
in the code that follows causes wrong cost estimates to be produced.

To ensure that keys_per_subgroup always has a sensible value, this
patch adds code that handles missing index statistics. A new function,
guess_rec_per_key(), is added. It is based on existing code in
find_best_ref() that assumes, for an index with multiple key parts,
that the first key part will match 1% of the records and that the
whole key will match 10 records for a non-unique index and 1 record
for a unique index. For other numbers of used key parts, a formula
computes an estimated rec_per_key that lies between the rec_per_key
value for the first key part and the rec_per_key value for the whole
key. This new function is used in both cases in cost_group_min_max().

Replacing the current handling of missing index statistics changes the
explain output in multiple test cases. The main reason for these
changes is that the current code assumes the index has ten distinct
values, while the new guess_rec_per_key() function assumes that each
value in a non-unique index has ten entries. For small tables, this
results in different rec_per_key estimates.