Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4595

GROUP BY in subqueries removed too aggressively

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.0.2, 5.5.31, 5.3.12
    • Fix Version/s: 10.0, 5.5
    • Component/s: None
    • Labels:
      None

      Description

      Since 5.3 MariaDB supports this optimization:

      This optimization, although correct in strict SQL standard sense, does not always work when the subquery uses non-aggregated columns in the SELECT clause. This is non-standard MariaDB extension, but as long as it is supported, the optimizer should take it into account.

      MariaDB [test]> create table t1 (a int, b int);
      Query OK, 0 rows affected (0.00 sec)
      
      MariaDB [test]> insert t1 values (1,1),(2,1);
      Query OK, 2 rows affected (0.00 sec)
      Records: 2  Duplicates: 0  Warnings: 0
      
      MariaDB [test]> select a from t1 group by b;
      +------+
      | a    |
      +------+
      |    1 |
      +------+
      1 row in set (0.01 sec)
      
      MariaDB [test]> select count(*) from t1 where a in (select a from t1);
      +----------+
      | count(*) |
      +----------+
      |        2 |
      +----------+
      1 row in set (0.00 sec)
      
      MariaDB [test]> select count(*) from t1 where a in (select a from t1 group by b);
      +----------+
      | count(*) |
      +----------+
      |        2 |   <-- this is wrong!
      +----------+
      1 row in set (0.00 sec)
      
      MariaDB [test]> select count(*) from t1 where a in (select min(a) from t1 group by b);
      +----------+
      | count(*) |
      +----------+
      |        1 |
      +----------+
      1 row in set (0.00 sec)
      

      As a possible fix, the optimizer could disable this optimization when non-aggregated columns are selected. It already performs all necessary checks for the ONLY_FULL_GROUP_BY sql mode.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            The manual says http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html "The server is free to choose any value from each group".

            Let's remember that and look at the query and dataset from this bugreport:

            select a from t1 group by b

            a=1 is a valid result. a=2 is also a valid result.

            Then, lets consider a query:

            select count from t1 where a in (select a from t1);

            t1 has two records:
            R1. record with a=1
            R2. record with a=2

            for record R1, "a in (select a from t1)" can be satisfied.
            for record R2, "a in (select a from t1)" can be satisfied.

            hence, "select count ... " may produce "2". It may also produce "1", or "0" (If this seems odd to you, consider a query which has "LIMIT 1" but no ORDER BY or GROUP BY. It can return different results, and all of them will be valid)

            Show
            psergey Sergei Petrunia added a comment - The manual says http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html "The server is free to choose any value from each group". Let's remember that and look at the query and dataset from this bugreport: select a from t1 group by b a=1 is a valid result. a=2 is also a valid result. Then, lets consider a query: select count from t1 where a in (select a from t1); t1 has two records: R1. record with a=1 R2. record with a=2 for record R1, "a in (select a from t1)" can be satisfied. for record R2, "a in (select a from t1)" can be satisfied. hence, "select count ... " may produce "2". It may also produce "1", or "0" (If this seems odd to you, consider a query which has "LIMIT 1" but no ORDER BY or GROUP BY. It can return different results, and all of them will be valid)
            Hide
            psergey Sergei Petrunia added a comment -

            Maybe, you've meant to say that

            "The server is free to choose any value from each group".

            should be read as

            "The server is free to choose any value from each group. The choice must be made once and then remain fixed for the duration of the query"

            ?

            Show
            psergey Sergei Petrunia added a comment - Maybe, you've meant to say that "The server is free to choose any value from each group". should be read as "The server is free to choose any value from each group. The choice must be made once and then remain fixed for the duration of the query" ?
            Hide
            serg Sergei Golubchik added a comment - - edited

            In a sense, yes. This query

            select a from t1 group by b

            can return either 1 or 2.

            Thus,

            select a from t1 where a in (select a from t1 group by b)

            will be one of

            select a from t1 where a in (1)
            select a from t1 where a in (2)

            how many times the optimizer will reevaluate the subquery is its internal implementation detail. It can do that as many times as it wants, but without breaking the semantics of the query.

            Show
            serg Sergei Golubchik added a comment - - edited In a sense, yes. This query select a from t1 group by b can return either 1 or 2. Thus, select a from t1 where a in (select a from t1 group by b) will be one of select a from t1 where a in (1) select a from t1 where a in (2) how many times the optimizer will reevaluate the subquery is its internal implementation detail. It can do that as many times as it wants, but without breaking the semantics of the query.

              People

              • Assignee:
                Unassigned
                Reporter:
                serg Sergei Golubchik
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: