Vasil Dimov authored
The code in InnoDB that estimates the number of records in a given range (btr_estimate_n_rows_in_range()) is agnostic to off-by-one errors because it returns an estimate. Indeed, +/- 1 is irrelevant if 4587 or 4586 is returned when the actual value is 4600, for example. This is fine - it is expected that the returned value is an approximation.

The problem arises from the fact that different codepaths handle the range boundaries differently (include the boundary or not). One of those codepaths handles the case when the range fits entirely in one page and another handles the case when the range spans multiple pages. So, for the same dataset we could get different results (+/- 1) depending on the page size - if for a big page size the range fits in one page and for a smaller page size it spans multiple pages.

The solution is to tune ha_innobase::records_in_range() and btr_estimate_n_rows_in_range() to honor the type of the boundaries of the range (open, closed or unbounded on the left/right) and to always return the exact number of rows in the range if all the pages in the range were sampled (see N_PAGES_READ_LIMIT).

Reviewed-by: Annamalai Gurusami <annamalai.gurusami@oracle.com>
Reviewed-by: Guilhem Bichot <guilhem.bichot@oracle.com> (only */subquery.*)
RB: 8034

(cherry picked from commit 4452052ea9c8ef14239ab64ceb8b2b5da2c4ea5a and commit f60a0b0b5c6d9e2608ba4caa8fc2076cfad09438 and an adjustment to opt_hints.result)
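
The invariant described in the message can be illustrated with a minimal sketch. This is not the InnoDB code: `Bound`, `Range`, `in_range()` and `count_rows_paged()` are hypothetical names, keys are plain integers, and a "page" is just a fixed-size slice of a sorted vector. It only shows that once each boundary type (open, closed, unbounded) is honored per key, the exact count no longer depends on how the keys are split into pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical boundary kinds for illustration; not InnoDB's actual types.
enum class Bound { Open, Closed, Unbounded };

// A key range; lo/hi are ignored when the matching kind is Unbounded.
struct Range {
    Bound lo_kind;
    int lo;
    Bound hi_kind;
    int hi;
};

// Decide membership for a single key, honoring the type of each boundary.
bool in_range(int key, const Range& r) {
    const bool above_lo =
        r.lo_kind == Bound::Unbounded ||
        (r.lo_kind == Bound::Closed ? key >= r.lo : key > r.lo);
    const bool below_hi =
        r.hi_kind == Bound::Unbounded ||
        (r.hi_kind == Bound::Closed ? key <= r.hi : key < r.hi);
    return above_lo && below_hi;
}

// Count matching keys by visiting every "page" of page_size keys.
// Because every page is visited and the same boundary test is applied
// on each of them, the result is exact for any page_size.
std::size_t count_rows_paged(const std::vector<int>& sorted_keys,
                             std::size_t page_size, const Range& r) {
    assert(page_size > 0);
    std::size_t total = 0;
    for (std::size_t start = 0; start < sorted_keys.size();
         start += page_size) {
        const std::size_t end =
            std::min(start + page_size, sorted_keys.size());
        for (std::size_t i = start; i < end; ++i) {
            if (in_range(sorted_keys[i], r)) {
                ++total;
            }
        }
    }
    return total;
}
```

With keys 1 through 10 and the half-open range (3, 7], the sketch counts the same four rows whether all keys sit in one page or are split into pages of three or four keys. The real fix additionally has to fall back to an estimate when not all pages in the range are sampled, which this sketch does not model.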