    0670d0b9 · Olav Sandstaa authored

    Fix for Bug#18497308 WRONG COST ESTIMATE FOR LOOSE INDEX SCAN WHEN
                         INDEX STATISTICS IS MISSING
    
    The cost of doing a loose index scan is calculated in
    cost_group_min_max(). To estimate the size of the groups to read,
    index statistics from rec_per_key are used. The code works correctly
    when index statistics are available, but when they are missing, a
    bug in the code will in some cases produce wrong cost estimates.
    
    For a query like this:
    
      SELECT DISTINCT a FROM t1 WHERE b=4;
    
    with an index on (a,b,c), cost_group_min_max() estimates the "keys
    per group" both for the index prefix (a) and for the index prefix
    (a,b). The first of these is estimated by looking up rec_per_key[0]
    and using the following code to adjust the estimate if there is no
    rec_per_key information:
    
      /* Compute the number of keys in a group. */
      keys_per_group= index_info->rec_per_key[group_key_parts - 1];
      if (keys_per_group == 0) /* If there is no statistics try to guess */
        /* each group contains 10% of all records */
        keys_per_group= (uint)(table_records / 10) + 1;
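
    Read on its own, the fallback above simply makes each group an
    estimated 10% of the table. A minimal standalone restatement of the
    quoted logic (an illustration, not the server code):

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Standalone restatement of the quoted fallback: with no statistics,
    // each group is assumed to hold 10% of the table's records, plus one
    // so that the estimate can never be zero.
    static uint64_t fallback_keys_per_group(uint64_t table_records) {
      return table_records / 10 + 1;
    }
    ```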
    
    For estimating the second "keys per group", rec_per_key[1] is used and
    the code looks like this:
    
      keys_per_subgroup= index_info->rec_per_key[used_key_parts - 1];
    
    Here there is no code handling the case where index statistics are
    missing, so keys_per_subgroup gets the value 0. Using 0 as an
    estimate in the code that follows produces wrong cost estimates.
    
    To ensure that keys_per_subgroup always has a sensible value, this
    patch adds code that handles the case where index statistics are
    missing. A new function guess_rec_per_key() is added. This function
    is based on existing code in find_best_ref() that assumes that, for
    an index with multiple key parts, the first key part matches 1% of
    the records and the whole key matches 10 records for a non-unique
    key and 1 record for a unique key. For other numbers of used key
    parts, a formula computes an estimated rec_per_key that lies between
    the rec_per_key value for the first key part and the rec_per_key
    value for the whole key.
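
    The heuristic can be sketched as follows. The endpoints (1% of the
    records for the first key part; 10 records, or 1 for a unique key,
    for the whole key) come from the description above, but the linear
    interpolation and the small-table clamp are illustrative
    assumptions, not the server's exact formula:

    ```cpp
    #include <algorithm>
    #include <cassert>

    // Illustrative sketch of a guess_rec_per_key()-style heuristic.
    static double guess_rec_per_key_sketch(double table_records,
                                           unsigned key_parts,
                                           unsigned used_key_parts,
                                           bool unique_key) {
      // Whole key assumed to match 10 records (1 if the key is unique).
      const double rec_per_key_all = unique_key ? 1.0 : 10.0;
      // First key part assumed to match 1% of the records; never let it
      // drop below the whole-key estimate (matters for small tables).
      const double rec_per_key_first =
          std::max(table_records / 100.0, rec_per_key_all);
      if (used_key_parts <= 1) return rec_per_key_first;
      if (used_key_parts >= key_parts) return rec_per_key_all;
      // For intermediate prefixes, interpolate between the endpoints
      // (assumed linear here).
      const double step =
          (rec_per_key_first - rec_per_key_all) / (key_parts - 1);
      return rec_per_key_first - step * (used_key_parts - 1);
    }
    ```

    For a 10000-row table and a three-part non-unique index, this gives
    100 matching rows for the first key part, 10 for the whole key, and
    an interpolated 55 for the two-part prefix.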
    
    This new function is used in both cases in cost_group_min_max().
    Replacing the current handling of missing index statistics causes
    changes to the explain output in multiple test cases. The main
    reason for these changes is that the current code assumes the index
    has ten unique values, while the new guess_rec_per_key() function
    assumes that each value in a non-unique index has ten entries. For
    small tables, this results in different rec_per_key estimates.