Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7789

Optimizer should consider reversing NOT conditions with enums

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Optimizer
    • Labels:
      None

      Description

      I believe this is essentially the same issue as MySQL bug #41064, or at least related to it.

      The MariaDB optimizer should, at times, reverse a condition to make better use of an index, when all possible values are known, as is the case with an enum.

      Consider the following:

      CREATE TABLE foobars (
         id_foobar int(10) unsigned NOT NULL auto_increment,
         status enum('red', 'green', 'blue', 'white') NOT NULL,
         PRIMARY KEY (id_foobar),
         KEY (status)
      ) ENGINE=InnodB;
      
      delimiter $$
      CREATE PROCEDURE populate_foobars (
         c int(10)
      )
      BEGIN
         DECLARE chance smallint(3) unsigned;
         DECLARE weighted smallint(3) unsigned;
      
         populate: LOOP
            SET chance = FLOOR(RAND() * 100);
      
            IF chance < 80 THEN
               SET weighted = 1;
            ELSEIF chance < 90 THEN
               SET weighted = 2;
            ELSEIF chance < 97 THEN
               SET weighted = 3;
            ELSE
               SET weighted = 4;
            END IF;
            
            INSERT INTO foobars
               (status)
            VALUES (weighted);
      
            SET c = c - 1;
            IF c = 0 THEN
               LEAVE populate;
            END IF;
         END LOOP populate;
      END $$
      delimiter ;
      
      CALL populate_foobars(100000);
      
      OPTIMIZE TABLE foobars;
      
      
      EXPLAIN SELECT *
      FROM foobars
      WHERE status IN ('blue', 'white');
      
      EXPLAIN SELECT *
      FROM foobars
      WHERE status NOT IN ('red', 'green');
      
      
      EXPLAIN SELECT *
      FROM foobars
      WHERE status != 'red';
      
      EXPLAIN SELECT *
      FROM foobars
      WHERE status = 'green' OR status = 'blue' OR status = 'white';
      

      In the above, the table's status enum has multiple values with an uneven distribution. 80% are red, while only 3% are white.

      Based on the results of the EXPLAIN, MariaDB can detect that IN ('blue', 'white') will examine significantly fewer rows (of the index, presumably) than NOT IN ('red', 'green'), although both options use the same index.

      I've seen cases however where it does not choose to use the index at all (probably when there's even greater weight, such as an enum with "active", "inactive", and "preview".) This is more similar to the MySQL bug report noted above. This can cause significant performance concern in larger databases.

      On my test machine, the SELECT queries in the above example take approximately 4-5 times as long when issued in their NOT IN variant, compared to the equivalent IN version. This is easiest to test with these:

      SELECT COUNT(*)
      FROM foobars
      WHERE status IN ('blue', 'white');
      
      SELECT COUNT(*)
      FROM foobars
      WHERE status NOT IN ('red', 'green');
      

      Obviously, it is easy for the user to replace NOT IN with IN, but there are semantic reasons for using NOT IN. Aside from the code possibly being more straight-forward to read, it's also more future proof - suppose I might add "yellow" to the list of colors later. Correctly using IN and NOT IN in SQL code means that this future change would have a lower risk of introducing new bugs.

      -[Unknown]

        Gliffy Diagrams

          Attachments

            Activity

            There are no comments yet on this issue.

              People

              • Assignee:
                Unassigned
                Reporter:
                unknownbrackets Unknown W. Brackets
              • Votes:
                1 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: