Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6299

Query optimizer (?) fails - full table scan for Integer PK select

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 5.5.36
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      Fedora linux

      Description

      When sending query - simple select by unique integer id that is primary key results in full table scan.

      Table structure:

      CREATE TABLE `targeting_data_testing` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `_hash` varchar(30) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
        `__last_used` date NOT NULL,
        PRIMARY KEY (`id`,`__last_used`),
        UNIQUE KEY `_hash` (`_hash`,`__last_used`) USING BTREE
      ) ENGINE=Aria AUTO_INCREMENT=116938292 DEFAULT CHARSET=utf8 PAGE_CHECKSUM=1 TRANSACTIONAL=0;
      

      Sample table attached. Below is query plan for following queries

      Full table scan:

      SELECT * FROM `targeting_data_testing` WHERE id=('6E76XdQUK+es/GqzbfEPm4dbHpA') OR _hash=('6E76XdQUK+es/GqzbfEPm4dbHpA')
      SELECT * FROM `targeting_data_testing` WHERE id=('6E76') OR _hash= ('6E76XdQUK+es/GqzbfEPm4dbHpA')
      

      Using index:

      SELECT * FROM `targeting_data_testing` WHERE id=('6E10'+0) OR _hash=('6E76XdQUK+es/GqzbfEPm4dbHpA')
      SELECT * FROM `targeting_data_testing` WHERE id=6E10 OR _hash=('6E76XdQUK+es/GqzbfEPm4dbHpA')
      

      So the server needs to scan the whole table if we SELECT by large xEy formatted number and we pass that number with quotes. Very serious and very hard to prevent bug (as it's very counter-intuitive that just adding quotes will make such a big difference, and only some base64 encoded strings will trigger the bug).

      This lead to scanning 2 billion rows table on some occasions, on our server.

      When querying only by id it was like 5 million rows scan VS "Impossible WHERE noticed after reading const tables" on that 2 billion row table.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            pslawek83 Slawomir Pryczek added a comment - - edited

            Interesting thing
            "6E10" is having same effect (full table scan)
            While "10000000" , 10000000 , 6E10 is scaning only 1 row / no rows...

            So to optimizer "6E10" is neither 6E10 number nor "10000000", 10000000?

            Show
            pslawek83 Slawomir Pryczek added a comment - - edited Interesting thing "6E10" is having same effect (full table scan) While "10000000" , 10000000 , 6E10 is scaning only 1 row / no rows... So to optimizer "6E10" is neither 6E10 number nor "10000000", 10000000?
            Hide
            serg Sergei Golubchik added a comment -

            This is, in fact, intentional and not a bug. When you compare an integer to a string, they're both being casted and compared as floating-point numbers. That is, your query

            SELECT * FROM targeting_data_testing WHERE id='6E76Xd...'
            

            is internally executed as

            SELECT * FROM targeting_data_testing WHERE CAST(id AS DOUBLE)=CAST('6E76Xd...' AS DOUBLE)
            

            And if you write it like that, it becomes obvious that no index can help, if an indexed column is used as an argument for a function.

            To be sure than an index will be considered, cast the value to the data type of your indexed column, for example, write

            SELECT * FROM targeting_data_testing WHERE id=CAST('6E76Xd...' AS SIGNED)
            
            Show
            serg Sergei Golubchik added a comment - This is, in fact, intentional and not a bug. When you compare an integer to a string, they're both being casted and compared as floating-point numbers. That is, your query SELECT * FROM targeting_data_testing WHERE id='6E76Xd...' is internally executed as SELECT * FROM targeting_data_testing WHERE CAST(id AS DOUBLE)=CAST('6E76Xd...' AS DOUBLE) And if you write it like that, it becomes obvious that no index can help, if an indexed column is used as an argument for a function. To be sure than an index will be considered, cast the value to the data type of your indexed column, for example, write SELECT * FROM targeting_data_testing WHERE id=CAST('6E76Xd...' AS SIGNED)

              People

              • Assignee:
                Unassigned
                Reporter:
                pslawek83 Slawomir Pryczek
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: