Skip to content
  • Guilhem Bichot's avatar
    a24950ac
    Fix for Bug#19297190 NOT IN DOESN'T RETURN EXPECTED RESULT · a24950ac
    Guilhem Bichot authored
    We have a query with WHERE ... NOT IN (SELECT ...).
    The inner field is 1503 bytes (500 chars * 3, because it has utf8 charset).
    subquery_allows_materialization() considers subq-mat:
    per the fix for Bug 17566396: MATERIALIZATION IS NOT CHOSEN FOR LONG
    UTF8 VARCHAR FIELD, this field is small enough to _not_ be changed to a BLOB
    when stored into the tmp table (Item::is_blob_field() sees that 500 <
    CONVERT_IF_BIGGER_TO_BLOB), so subq-mat is chosen. Before that fix,
    the left operand of the inequality comparison would be the length in
    _bytes_ and subq-mat would not be chosen.
    In hash_sj_engine::setup():
    create_tmp_table() uses MEMORY table with a regular unique index,
    without "unique constraint"; right after creation, setup() verifies
    that the table has no "unique constraint", otherwise subq-mat would
    fail - see comment in setup() and see how
    indexsubquery_engine::setup() does a regular index lookup of a key
    made with copy_ref_key(): to look up into "unique constraint" it would
    need to compute a checksum in copy_ref_key().
    So far so good.
    At some point during materialization, MEMORY tmp table becomes full, we go
    into create_myisam_from_heap() then create_myisam_tmp_table():
    
        if (keyinfo->key_length >= table->file->max_key_length() ||
    	keyinfo->user_defined_key_parts > table->file->max_key_parts() ||
    	share->uniques)
        {
          /* Can't create a key; Make a unique constraint instead of a key */
          share->keys=    0;
          share->uniques= 1;
          using_unique_constraint=1;
    
    alas the first condition is 1503 > 1000 (engine is MyISAM) so we
    create "unique constraint" on the MyISAM table. Subq-mat is not ready
    for this, as we know, so we get a bad result: in fact, all look ups
    then return "not found", so "NOT IN" always returns TRUE.
    
    In 5.7, subq-mat has been made able to use "unique constraint" (see
    Bug 18227171 which was part of wl 6711); however this same testcase
    still crashes but for different reasons, a separate 5.7 patch will be
    made.
    
    For fixing 5.6:
    - we cannot simply undo the fix for Bug 17566396 as there was an
    associated customer
    - on the other hand we cannot modify the logic in
    create_myisam_tmp_table() to _not_ create a "unique constraint" in
    this case (it's 5.6, we must be conservative)
    - proposal: take a middle path: make subquery_allows_materialization()
    test the length (by extending is_blob_field()): if length is >=1000
    bytes, then don't use subq-mat (because there is risk that MyISAM
    table is used and creates "unique constraint"). 1000 bytes is 333 utf8
    characters. So we would have, for the limit of # of chars:
    - before the fix for Bug 17566396: 512/3 = 170 chars,
    - after it: 512 chars
    - after the proposed fix: 332 chars.
    The testcase in Bug 17566396 used 250 chars so still uses
    subq-mat, which is good.
    a24950ac
    Fix for Bug#19297190 NOT IN DOESN'T RETURN EXPECTED RESULT
    Guilhem Bichot authored
    We have a query with WHERE ... NOT IN (SELECT ...).
    The inner field is 1503 bytes (500 chars * 3, because it has utf8 charset).
    subquery_allows_materialization() considers subq-mat:
    per the fix for Bug 17566396: MATERIALIZATION IS NOT CHOSEN FOR LONG
    UTF8 VARCHAR FIELD, this field is small enough to _not_ be changed to a BLOB
    when stored into the tmp table (Item::is_blob_field() sees that 500 <
    CONVERT_IF_BIGGER_TO_BLOB), so subq-mat is chosen. Before that fix,
    the left operand of the inequality comparison would be the length in
    _bytes_ and subq-mat would not be chosen.
    In hash_sj_engine::setup():
    create_tmp_table() uses MEMORY table with a regular unique index,
    without "unique constraint"; right after creation, setup() verifies
    that the table has no "unique constraint", otherwise subq-mat would
    fail - see comment in setup() and see how
    indexsubquery_engine::setup() does a regular index lookup of a key
    made with copy_ref_key(): to look up into "unique constraint" it would
    need to compute a checksum in copy_ref_key().
    So far so good.
    At some point during materialization, MEMORY tmp table becomes full, we go
    into create_myisam_from_heap() then create_myisam_tmp_table():
    
        if (keyinfo->key_length >= table->file->max_key_length() ||
    	keyinfo->user_defined_key_parts > table->file->max_key_parts() ||
    	share->uniques)
        {
          /* Can't create a key; Make a unique constraint instead of a key */
          share->keys=    0;
          share->uniques= 1;
          using_unique_constraint=1;
    
    alas the first condition is 1503 > 1000 (engine is MyISAM) so we
    create "unique constraint" on the MyISAM table. Subq-mat is not ready
    for this, as we know, so we get a bad result: in fact, all look ups
    then return "not found", so "NOT IN" always returns TRUE.
    
    In 5.7, subq-mat has been made able to use "unique constraint" (see
    Bug 18227171 which was part of wl 6711); however this same testcase
    still crashes but for different reasons, a separate 5.7 patch will be
    made.
    
    For fixing 5.6:
    - we cannot simply undo the fix for Bug 17566396 as there was an
    associated customer
    - on the other hand we cannot modify the logic in
    create_myisam_tmp_table() to _not_ create a "unique constraint" in
    this case (it's 5.6, we must be conservative)
    - proposal: take a middle path: make subquery_allows_materialization()
    test the length (by extending is_blob_field()): if length is >=1000
    bytes, then don't use subq-mat (because there is risk that MyISAM
    table is used and creates "unique constraint"). 1000 bytes is 333 utf8
    characters. So we would have, for the limit of # of chars:
    - before the fix for Bug 17566396: 512/3 = 170 chars,
    - after it: 512 chars
    - after the proposed fix: 332 chars.
    The testcase in Bug 17566396 used 250 chars so still uses
    subq-mat, which is good.
Loading