Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8566

Dependend_subquery with in doesn't use (primary) index

    Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.3.12, 10.1, 10.0, 5.5
    • Fix Version/s: N/A
    • Component/s: Optimizer
    • Labels:

      Description

      Having the following query causes the database to make a table scan:

      SELECT *,(SELECT GROUP_CONCAT(Name) FROM user WHERE ID IN (a,b,c)) FROM `test` WHERE a!=0
      

      Explain returns:

      +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
      | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
      +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
      |    1 | PRIMARY            | test  | ALL  | a             | NULL | NULL    | NULL | 1335 | Using where |
      |    2 | DEPENDENT SUBQUERY | user  | ALL  | NULL          | NULL | NULL    | NULL | 4545 | Using where |
      +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
      

      Table definitions:

      CREATE TABLE `test` (
        `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `a` int(10) unsigned NOT NULL,
        `b` int(10) unsigned NOT NULL,
        `c` int(10) unsigned NOT NULL,
        PRIMARY KEY (`ID`),
        KEY `a` (`a`)
      ) ENGINE=Aria
      
      CREATE TABLE `user` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `name` varchar(10) COLLATE latin1_german1_ci NOT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=Aria
      

      As you might guess the original expression is more complex. As you can see, the result of the subquery can only have 3 entries which is grouped. The Subquery is dependend so this has to be performed on each resulting row - correct. But why does the subquery don't use the primary index which would only use 3 result-entries, not 4545

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            > I thought the algorithm executes each subquery on the base of the result set, if the index/merge doesn't work.
            > If these subqueries get executed after the main query, it would result in less database questions, can use indices and lead to the same result.

            The subquery will be invoked multiple times during execution of the top-level select.

            Each execution will have use the same query plan. And the problem is that there is no query plan that is "dynamic" (depends on the top-level query) and fast to execute.

            Show
            psergey Sergei Petrunia added a comment - > I thought the algorithm executes each subquery on the base of the result set, if the index/merge doesn't work. > If these subqueries get executed after the main query, it would result in less database questions, can use indices and lead to the same result. The subquery will be invoked multiple times during execution of the top-level select. Each execution will have use the same query plan. And the problem is that there is no query plan that is "dynamic" (depends on the top-level query) and fast to execute.
            Hide
            psergey Sergei Petrunia added a comment -

            Do you think this is sth. which would be an improvement, or do you just think it should be left as is?

            I think fixing this would require a lot of effort, and is probably not worth it (unless this turns out to be a very common query pattern (but I believe it is not)).

            Thanks for taking time to report this, anyway.

            Show
            psergey Sergei Petrunia added a comment - Do you think this is sth. which would be an improvement, or do you just think it should be left as is? I think fixing this would require a lot of effort, and is probably not worth it (unless this turns out to be a very common query pattern (but I believe it is not)). Thanks for taking time to report this, anyway.
            Hide
            mokraemer Marc added a comment -

            I'm not familiar with the source code of mariadb, to argue against or for it.

            From application point of view, I try to get a most complete result which is displayed. Looping on the result on application level has overhead. But currently the loop with multiple queries will be faster than the query with a subquery in mariadb.

            I see, I thought each of the subqueries, if not merged, would be as if it is executed by the application - but since on the same memory, machine, ... faster than through network.

            Show
            mokraemer Marc added a comment - I'm not familiar with the source code of mariadb, to argue against or for it. From application point of view, I try to get a most complete result which is displayed. Looping on the result on application level has overhead. But currently the loop with multiple queries will be faster than the query with a subquery in mariadb. I see, I thought each of the subqueries, if not merged, would be as if it is executed by the application - but since on the same memory, machine, ... faster than through network.
            Hide
            psergey Sergei Petrunia added a comment -

            Yes. This is a reasonable assumption. However, certain limitations in the optimizer make it a bit difficult.

            I think I'll reopen and discuss with other optimizer devs.

            Show
            psergey Sergei Petrunia added a comment - Yes. This is a reasonable assumption. However, certain limitations in the optimizer make it a bit difficult. I think I'll reopen and discuss with other optimizer devs.
            Hide
            psergey Sergei Petrunia added a comment -

            For the record

            test=# explain select * ,(select sum(length(name)) from user1 where user1.ID  IN (test.a,test.b)) from test;
                                                  QUERY PLAN                                       
            ---------------------------------------------------------------------------------------
             Seq Scan on test  (cost=0.00..16920.99 rows=1000 width=16)
               SubPlan 1
                 ->  Aggregate  (cost=16.89..16.90 rows=1 width=6)
                       ->  Index Scan using user1_pkey on user1  (cost=0.42..16.88 rows=2 width=6)
                             Index Cond: (id = ANY (ARRAY[test.a, test.b]))
             Planning time: 0.342 ms
            
            Show
            psergey Sergei Petrunia added a comment - For the record test=# explain select * ,(select sum(length(name)) from user1 where user1.ID IN (test.a,test.b)) from test; QUERY PLAN --------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..16920.99 rows=1000 width=16) SubPlan 1 -> Aggregate (cost=16.89..16.90 rows=1 width=6) -> Index Scan using user1_pkey on user1 (cost=0.42..16.88 rows=2 width=6) Index Cond: (id = ANY (ARRAY[test.a, test.b])) Planning time: 0.342 ms

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                mokraemer Marc
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: