Details

      Description

      Some ERP generate that type of queries

      explain 
      select count(0) AS `COUNT(*)` 
      from 
         E_relance
         left join E_action on ((E_action.id_demande = E_relance.id_demande)) 
      where 
        ((E_relance.id_demande = 88224) or (E_action.id_demande = 88224))
      \G
      
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: E_relance
               type: index
      possible_keys: fk_E_relance_E_demande1
                key: fk_E_relance_E_demande1
            key_len: 5
                ref: NULL
               rows: 205655
              Extra: Using index
      *************************** 2. row ***************************
                 id: 1
        select_type: SIMPLE
              table: E_action
               type: ref
      possible_keys: FI__4381
                key: FI__4381
            key_len: 4
                ref: siam2.E_relance.id_demande
               rows: 2
              Extra: Using where; Using index
      

      The full index scan on primary table is not necessary if const on joined table is not NULL.

      Rewriting the query is hard to be done in the application because it may happen that the application is doing a lookup for NULL on left joined table

      indeed query rewriting change the lookup to const or range in case the second condition is not null and propagated to the upper table

      MAD_WEB_DEV (madweb@localhost) [siam2]> 
      explain select count(0) AS `COUNT(*)` 
      from
        E_relance left join E_action on ((E_action.id_demande = E_relance.id_demande)) 
      where 
        ((E_relance.id_demande = 88224) or (E_relance.id_demande = 88224))\G
      
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: E_relance
               type: ref
      possible_keys: fk_E_relance_E_demande1
                key: fk_E_relance_E_demande1
            key_len: 5
                ref: const
               rows: 1
              Extra: Using index
      *************************** 2. row ***************************
                 id: 1
        select_type: SIMPLE
              table: E_action
               type: ref
      possible_keys: FI__4381
                key: FI__4381
            key_len: 4
                ref: siam2.E_relance.id_demande
               rows: 2
              Extra: Using where; Using index
      

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            psergey Sergei Petrunia added a comment -

            So, the second query has

            where 
              ((E_relance.id_demande = 88224) or (E_relance.id_demande = 88224))
            

            which allows the optimizer to read only rows with E_relance.id_demande=88224, i.e use ref access.

            The first query is:

               E_relance
               left join E_action on ((E_action.id_demande = E_relance.id_demande)) 
            where 
              ((E_relance.id_demande = 88224) or (E_action.id_demande = 88224))
            

            The question is, can we satisfy the first query by just looking at rows with E_relance.id_demande=88224 ?

            Show
            psergey Sergei Petrunia added a comment - So, the second query has where ((E_relance.id_demande = 88224) or (E_relance.id_demande = 88224)) which allows the optimizer to read only rows with E_relance.id_demande=88224, i.e use ref access. The first query is: E_relance left join E_action on ((E_action.id_demande = E_relance.id_demande)) where ((E_relance.id_demande = 88224) or (E_action.id_demande = 88224)) The question is, can we satisfy the first query by just looking at rows with E_relance.id_demande=88224 ?
            Hide
            psergey Sergei Petrunia added a comment -

            The answer is "YES", one can infer that by following this logic:

            Suppose we read a row from E_relance, such that (E_relance.id_demande=88224)= FALSE
            Then, there are two options
            1. E_action has no matches, E_action.id_demande IS NULL. WHERE condition will evaluate to false.

            2. E_action has a match. ON expression then guarantees that (E_action.id_demande = 88224)= FALSE, which means that the WHERE condition will evaluate to false.

            Now, the question is, how can one generalize this logic to a rule that could be
            used by the optimizer..

            Show
            psergey Sergei Petrunia added a comment - The answer is "YES", one can infer that by following this logic: Suppose we read a row from E_relance, such that (E_relance.id_demande=88224) = FALSE Then, there are two options 1. E_action has no matches, E_action.id_demande IS NULL . WHERE condition will evaluate to false. 2. E_action has a match. ON expression then guarantees that (E_action.id_demande = 88224) = FALSE, which means that the WHERE condition will evaluate to false. Now, the question is, how can one generalize this logic to a rule that could be used by the optimizer..
            Hide
            stephane@skysql.com VAROQUI Stephane added a comment -

            The condition can be push up if left join and const is not null and fully push up for inner join

            Show
            stephane@skysql.com VAROQUI Stephane added a comment - The condition can be push up if left join and const is not null and fully push up for inner join
            Hide
            psergey Sergei Petrunia added a comment -

            The above logic can only be applied in a limited number of cases:

            First, the OR must be a two-way OR. If the WHERE expression was

              ((E_relance.id_demande = 88224) or (E_action.id_demande = 88224) or unknown_cond)
            

            then it would be possible that

            * {{(E_relance.id_demande = 88224)}} = FALSE
            * {{(E_relance.id_demande = 88224)}} is either FALSE (there was a match) or NULL (no match)
            * but, unknown_cond= TRUE.
            

            we would still need to scan the whole table E_relance to check unknown_cond for all rows.

            What if the constants in the WHERE were not the same? Consider

            where
              ((E_relance.id_demande = 1) or (E_action.id_demande = 2))
            

            Suppose we read a row from E_relance, such that (E_relance.id_demande=1)= FALSE.
            Then, there are two options
            1. E_action has no matches, E_action.id_demande IS NULL. WHERE condition will evaluate to false.
            2.1 E_action has a match, E_action.id_demande=2=FALSE. WHERE condition will evaluate to false.
            2.2 E_action has a match, E_action.id_demande=2=TRUE. Here, the condition will evaluate to true.

            So, we'll need to scan rows with E_relance.id_demande IN (1,2).

            Show
            psergey Sergei Petrunia added a comment - The above logic can only be applied in a limited number of cases: First, the OR must be a two-way OR. If the WHERE expression was ((E_relance.id_demande = 88224) or (E_action.id_demande = 88224) or unknown_cond) then it would be possible that * {{(E_relance.id_demande = 88224)}} = FALSE * {{(E_relance.id_demande = 88224)}} is either FALSE (there was a match) or NULL (no match) * but, unknown_cond= TRUE. we would still need to scan the whole table E_relance to check unknown_cond for all rows. What if the constants in the WHERE were not the same? Consider where ((E_relance.id_demande = 1) or (E_action.id_demande = 2)) Suppose we read a row from E_relance, such that (E_relance.id_demande=1) = FALSE. Then, there are two options 1. E_action has no matches, E_action.id_demande IS NULL . WHERE condition will evaluate to false. 2.1 E_action has a match, E_action.id_demande=2 =FALSE. WHERE condition will evaluate to false. 2.2 E_action has a match, E_action.id_demande=2 =TRUE. Here, the condition will evaluate to true. So, we'll need to scan rows with E_relance.id_demande IN (1,2) .
            Hide
            psergey Sergei Petrunia added a comment -

            Attempt to generalize #1:

            Consider a query

            select * from t1 left join t2 on on_expr where where_expr.
            

            Suppose, we are doing range analysis on table t1. We are analyzing where_expr,
            at the moment we're looking at its sub-condition $COND.

            If $COND is null-rejecting wrt table t2, we can use multiple-equalities from
            on_expr to substitute in $COND columns of t2 for columns of t1.

            Need to think of:

            • whether the above is correct
            • how this applies to nested outer joins.
            Show
            psergey Sergei Petrunia added a comment - Attempt to generalize #1: Consider a query select * from t1 left join t2 on on_expr where where_expr. Suppose, we are doing range analysis on table t1. We are analyzing where_expr, at the moment we're looking at its sub-condition $COND. If $COND is null-rejecting wrt table t2, we can use multiple-equalities from on_expr to substitute in $COND columns of t2 for columns of t1. Need to think of: whether the above is correct how this applies to nested outer joins.
            Hide
            psergey Sergei Petrunia added a comment -

            Checked PostgreSQL:

            test=# explain select * from t1 left join t2 on t2.col1=t1.col1 where (t1.pk between 10 and 20) or t2.col1 between 30 and 40 ;
                                                   QUERY PLAN                                       
            ----------------------------------------------------------------------------------------
             Hash Left Join  (cost=2693.00..7136.00 rows=20 width=16)
               Hash Cond: (t1.col1 = t2.col1)
               Filter: (((t1.pk >= 10) AND (t1.pk <= 20)) OR ((t2.col1 >= 30) AND (t2.col1 <= 40)))
               ->  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=8)
               ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)
                     ->  Seq Scan on t2  (cost=0.00..1443.00 rows=100000 width=8)
             Planning time: 0.465 ms
            (7 rows)
            

            Note the Seq Scan elements. They dont have the optimization that we're talking about here.

            Show
            psergey Sergei Petrunia added a comment - Checked PostgreSQL: test=# explain select * from t1 left join t2 on t2.col1=t1.col1 where (t1.pk between 10 and 20) or t2.col1 between 30 and 40 ; QUERY PLAN ---------------------------------------------------------------------------------------- Hash Left Join (cost=2693.00..7136.00 rows=20 width=16) Hash Cond: (t1.col1 = t2.col1) Filter: (((t1.pk >= 10) AND (t1.pk <= 20)) OR ((t2.col1 >= 30) AND (t2.col1 <= 40))) -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=8) -> Hash (cost=1443.00..1443.00 rows=100000 width=8) -> Seq Scan on t2 (cost=0.00..1443.00 rows=100000 width=8) Planning time: 0.465 ms (7 rows) Note the Seq Scan elements. They dont have the optimization that we're talking about here.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                stephane@skysql.com VAROQUI Stephane
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated: