Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6797

intcol=NUM is not used to remove intcol from ORDER BY if NUM is quoted

    Details

    • Sprint:
      10.1.6-2

      Description

      ORDER BY optimizer removes constant items from ORDER BY list. That is, if there is a query

      SELECT ... WHERE col1=const  ... ORDER BY col1, ...
      

      then col1 can be removed from ORDER BY list (except for some charsets).

      The problem is, this feature doesn't work for some cases where it should.
      Test dataset:

      CREATE TABLE tb_bug2 (
        pk1 int(11) NOT NULL,
        pk2 varchar(64) NOT NULL,
        col1 varchar(16) DEFAULT NULL,
        PRIMARY KEY (pk1,pk2),
        KEY key1 (pk1,col1,pk2)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
      
      INSERT INTO `tb_bug2` VALUES 
      (12321321,'a8f5f167f44f4964e6c998dee827110c','video'),
      (12321321,'d77a17a3659ffa60c54e0ea17b6c6d16','video'),
      (12321321,'wwafdsafdsafads','video'),
      (12321321,'696aa249f0738e8181957dd57c2d7d0b','video-2014-09-23'),
      (12321321,'802f9f29584b486f356693e3aa4ef0af','video=sdsd'),
      (12321321,'2f94543ff74aab82e9a058b4e8316d75','video=sdsdsds'),
      (12321321,'c1316b9df0d203fd1b9035308de52a0a','video=sdsdsdsdsd');
      

      Now, lets try two queries, one with pk1=const and the other with pk1='const':

      explain 
      SELECT pk2 FROM tb_bug2 USE INDEX(key1) 
      WHERE pk1 = 123 AND col1 = 'video' 
      ORDER BY pk1 DESC, col1 DESC, pk2 DESC LIMIT 21;
      id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
      1       SIMPLE  tb_bug2 ref     key1    key1    55      const,const     1       Using where; Using index
      
      explain 
      SELECT pk2 FROM tb_bug2 USE INDEX(key1) 
      WHERE pk1 = '123' AND col1 = 'video' 
      ORDER BY pk1 DESC, col1 DESC, pk2 DESC LIMIT 21;
      id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
      1       SIMPLE  tb_bug2 ref     key1    key1    55      const,const     1       Using where; Using index; Using filesort
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment - - edited
            • The problem is not related to not-yet-pushed order by optimization fixes, MDEV-6657 or MDEV-6402 or MDEV-6796
            • The problem is repeatable on mysql-5.6
            Show
            psergey Sergei Petrunia added a comment - - edited The problem is not related to not-yet-pushed order by optimization fixes, MDEV-6657 or MDEV-6402 or MDEV-6796 The problem is repeatable on mysql-5.6
            Hide
            psergey Sergei Petrunia added a comment -

            Note that the ref optimizer is able to make use of "pk1='123'". It's only ORDER BY optimizer that can't.

            Show
            psergey Sergei Petrunia added a comment - Note that the ref optimizer is able to make use of "pk1='123'". It's only ORDER BY optimizer that can't.
            Hide
            psergey Sergei Petrunia added a comment -

            Narrowed it down to this code:

            test_if_equality_guarantees_uniqueness(Item *l, Item *r)
            {
              return (r->const_item() || !(r->used_tables() & ~OUTER_REF_TABLE_BIT)) &&
                item_cmp_type(l->cmp_type(), r->cmp_type()) == l->cmp_type() &&
                (l->cmp_type() != STRING_RESULT ||
                 l->collation.collation == r->collation.collation);
            }
            

            ORDER BY optimizer uses it to check whether rows that satisfy "item=const" are all equal from the point of view of "ORDER BY item".

            Here,

            • l is "pk1" (integer column).
            • r is either integer constant or a string constant.
            • item_cmp_type(l, r) = DOUBLE_RESULT.

            Apparently, the restriction in this function is too strict.

            Show
            psergey Sergei Petrunia added a comment - Narrowed it down to this code: test_if_equality_guarantees_uniqueness(Item *l, Item *r) { return (r->const_item() || !(r->used_tables() & ~OUTER_REF_TABLE_BIT)) && item_cmp_type(l->cmp_type(), r->cmp_type()) == l->cmp_type() && (l->cmp_type() != STRING_RESULT || l->collation.collation == r->collation.collation); } ORDER BY optimizer uses it to check whether rows that satisfy "item=const" are all equal from the point of view of "ORDER BY item". Here, l is "pk1" (integer column). r is either integer constant or a string constant. item_cmp_type(l, r) = DOUBLE_RESULT. Apparently, the restriction in this function is too strict.
            Hide
            psergey Sergei Petrunia added a comment -

            Discussed the problem with Alexander Barkov. Came to conclusion that whenever the column's result_type is in

            { REAL_RESULT, INT_RESULT, DECIMAL_RESULT , TIME_RESULT}

            then the comparison guarantees uniqueness.

            Show
            psergey Sergei Petrunia added a comment - Discussed the problem with Alexander Barkov . Came to conclusion that whenever the column's result_type is in { REAL_RESULT, INT_RESULT, DECIMAL_RESULT , TIME_RESULT} then the comparison guarantees uniqueness.
            Hide
            psergey Sergei Petrunia added a comment - - edited

            Consider a counter-example:

            (gdb) p 0xFFFF FFFF FFFF FFFF
            $1 = 18446744073709551615

            create table t1 (a bigint unsigned);
            insert into t1 values 
            (18446744073709551000),
            (18446744073709551100),
            (18446744073709551200);
            
            select a, 
                   a='18446744073709551200',
                   a=18446744073709551200
            from t1;
            +----------------------+--------------------------+------------------------+
            | a                    | a='18446744073709551200' | a=18446744073709551200 |
            +----------------------+--------------------------+------------------------+
            | 18446744073709551000 |                        0 |                      0 |
            | 18446744073709551100 |                        0 |                      0 |
            | 18446744073709551200 |                        1 |                      1 |
            +----------------------+--------------------------+------------------------+
            

            Hmm this was supposed to be a counter example, but it's not. If the comparison a='...' has been done by comparing double values, then all values in the table would have been equal to the constant, because of insufficient precision. This didn't happen, because of this code:

             void Item_bool_func2::fix_length_and_dec()
            ...
                Make a special case of compare with fields to get nicer comparisons
                of bigint numbers with constant string.
                This directly contradicts the manual (number and a string should
                be compared as doubles), but seems to provide more
                "intuitive" behavior in some cases (but less intuitive in others).
            ...
            

            and the code here sets args[i]->cmp_context to show which type will be used for comparison.

            Show
            psergey Sergei Petrunia added a comment - - edited Consider a counter-example: (gdb) p 0xFFFF FFFF FFFF FFFF $1 = 18446744073709551615 create table t1 (a bigint unsigned); insert into t1 values (18446744073709551000), (18446744073709551100), (18446744073709551200); select a, a='18446744073709551200', a=18446744073709551200 from t1; +----------------------+--------------------------+------------------------+ | a | a='18446744073709551200' | a=18446744073709551200 | +----------------------+--------------------------+------------------------+ | 18446744073709551000 | 0 | 0 | | 18446744073709551100 | 0 | 0 | | 18446744073709551200 | 1 | 1 | +----------------------+--------------------------+------------------------+ Hmm this was supposed to be a counter example, but it's not. If the comparison a='...' has been done by comparing double values, then all values in the table would have been equal to the constant, because of insufficient precision. This didn't happen, because of this code: void Item_bool_func2::fix_length_and_dec() ... Make a special case of compare with fields to get nicer comparisons of bigint numbers with constant string. This directly contradicts the manual (number and a string should be compared as doubles), but seems to provide more "intuitive" behavior in some cases (but less intuitive in others). ... and the code here sets args [i] ->cmp_context to show which type will be used for comparison.
            Hide
            bar Alexander Barkov added a comment - - edited

            Hi Sergei,

            Discussed the problem with Alexander Barkov. Came to conclusion that whenever the column's result_type is in

            Unknown macro: { REAL_RESULT, INT_RESULT, DECIMAL_RESULT , TIME_RESULT}

            then the comparison guarantees uniqueness.

            It's cmp_type(), not result_type()!!!
            Note: temporal types (DATE, TIME, DATETIME) have resul_type()==STRING_RESULT,
            but cmp_type()==TIME_RESULT.

            Show
            bar Alexander Barkov added a comment - - edited Hi Sergei, Discussed the problem with Alexander Barkov. Came to conclusion that whenever the column's result_type is in Unknown macro: { REAL_RESULT, INT_RESULT, DECIMAL_RESULT , TIME_RESULT} then the comparison guarantees uniqueness. It's cmp_type(), not result_type()!!! Note: temporal types (DATE, TIME, DATETIME) have resul_type()==STRING_RESULT, but cmp_type()==TIME_RESULT.
            Hide
            serg Sergei Golubchik added a comment -

            Apparently, the restriction in this function is too strict.

            the restriction is formally correct. But it doesn't take into account that hack in Item_bool_func2::fix_length_and_dec().

            Perhaps, instead of having that hack and — on top of that — introducing another hack in test_if_equality_guarantees_uniqueness() it'd be cleaner to replace the string literal with a numeric literal (under the same conditions that Item_bool_func2::fix_length_and_dec() checks now).

            Show
            serg Sergei Golubchik added a comment - Apparently, the restriction in this function is too strict. the restriction is formally correct. But it doesn't take into account that hack in Item_bool_func2::fix_length_and_dec() . Perhaps, instead of having that hack and — on top of that — introducing another hack in test_if_equality_guarantees_uniqueness() it'd be cleaner to replace the string literal with a numeric literal (under the same conditions that Item_bool_func2::fix_length_and_dec() checks now).

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Agile