Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8440

Left Outer Join creates Cartesian join

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.0.16
    • Fix Version/s: N/A
    • Component/s: Parser
    • Labels:
    • Environment:
      Linux Redhat 2.6.32-504.12.2.el6.x86_64 IBM Server 96GB RAM

      Description

      I created an SQL using a Left Join with resulted in a cartesian join as a result.
      The format is

      select 
      a.col1, a.col2, b.col1, b.col1, b.col3, b.col4 
      from tablea a 
      left outer join tableb b on b.col4 and b.col1 
      where b.col5 < current_date()
      

      The erroneously placed and (should be an = ) creates a cartesian join. Some other database technology calls a parser error on this.
      In the case of the system, it happened in on column in the join condition is a primary key and the other is the lead column of a multi-column index of a primary key

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Peter McLarty,

            So, what would be the expected result from your point of view, a syntax error?
            Because if we accept the syntax, the cartesian product seems to be the only reasonable result set, do you agree?

            Regarding the syntax, I suppose that technically the complaint is valid. I'm not very fluent at reading the SQL standard, but if I understand the description of search conditions correctly, col1 AND col2 is not one, because colX is not a predicate.

            However, it has been accepted in MySQL for very long time, so disabling it now would be a disaster for existing applications.

            Do you suggest to make the change anyway?

            Show
            elenst Elena Stepanova added a comment - Peter McLarty , So, what would be the expected result from your point of view, a syntax error? Because if we accept the syntax, the cartesian product seems to be the only reasonable result set, do you agree? Regarding the syntax, I suppose that technically the complaint is valid. I'm not very fluent at reading the SQL standard , but if I understand the description of search conditions correctly, col1 AND col2 is not one, because colX is not a predicate . However, it has been accepted in MySQL for very long time, so disabling it now would be a disaster for existing applications. Do you suggest to make the change anyway?
            Hide
            Meerkat63 Peter McLarty added a comment -

            I would think that in this case changing this to block the cartesian join is preferable.

            The below is what SQL Server responds with and I sort of get that in this case. I am not saying SQL Server is right, just an alternate point of view

            An expression of non-boolean type specified in a context where a condition is expected, near 'and'.

            I hope that no one actually uses the erroneous left join syntax. Maybe I am not understanding some SQL standards information either but clearly a Left Join states that all from the left table (table 1) and the matching rows on the right table(table 2) to me doesn't leave much scope to turn this into a cartesian join due to this method of handling a boolean situation.

            Show
            Meerkat63 Peter McLarty added a comment - I would think that in this case changing this to block the cartesian join is preferable. The below is what SQL Server responds with and I sort of get that in this case. I am not saying SQL Server is right, just an alternate point of view An expression of non-boolean type specified in a context where a condition is expected, near 'and'. I hope that no one actually uses the erroneous left join syntax. Maybe I am not understanding some SQL standards information either but clearly a Left Join states that all from the left table (table 1) and the matching rows on the right table(table 2) to me doesn't leave much scope to turn this into a cartesian join due to this method of handling a boolean situation.
            Hide
            elenst Elena Stepanova added a comment - - edited

            FWIW, while the corner case in the description is indeed questionable, there are perfectly legal ways to get a cartesian product from the left join, fully compliant with the SQL standard, e.g. (assuming col4 is not nullable)

            select 
            a.col1, a.col2, b.col1, b.col1, b.col3, b.col4 
            from tablea a 
            left outer join tableb b on b.col4 is not null
            

            I don't expect SQL Server refuses it either, or does it?

            And yes, the experience shows, whatever weird syntax is available, a number of people use it.

            Show
            elenst Elena Stepanova added a comment - - edited FWIW, while the corner case in the description is indeed questionable, there are perfectly legal ways to get a cartesian product from the left join, fully compliant with the SQL standard, e.g. (assuming col4 is not nullable) select a.col1, a.col2, b.col1, b.col1, b.col3, b.col4 from tablea a left outer join tableb b on b.col4 is not null I don't expect SQL Server refuses it either, or does it? And yes, the experience shows, whatever weird syntax is available, a number of people use it.
            Hide
            Meerkat63 Peter McLarty added a comment -

            I defer to those greater than me on making this change. Personally I think the syntax as I originally presented is out of
            Thanks for your feedback

            using AdventureWorks2014 database

            use AdventureWorks2014;
            
            select [Name],[ProductNumber], [ReviewerName], [ReviewDate], [Comments]
            from [Production].[Product]
             left join [Production].[ProductReview] pr on pr.[ProductID] is not null;
            
            

            does, in fact, get accepted by SQL Server and does create a cartesian join. Your point is taken. Is mine an edge case I am not such an expert. If teh community agrees my case is an acceptable form then it might be something to go into odd examples of things which can go wrong. I had someone mention to me about a MySQL assumption of the col1 is true and col2 is true and therefore the SQL validates, that is a stretch to me on interpretation but if true and deemed acceptable we live with it.

            Show
            Meerkat63 Peter McLarty added a comment - I defer to those greater than me on making this change. Personally I think the syntax as I originally presented is out of Thanks for your feedback using AdventureWorks2014 database use AdventureWorks2014; select [Name],[ProductNumber], [ReviewerName], [ReviewDate], [Comments] from [Production].[Product] left join [Production].[ProductReview] pr on pr.[ProductID] is not null; does, in fact, get accepted by SQL Server and does create a cartesian join. Your point is taken. Is mine an edge case I am not such an expert. If teh community agrees my case is an acceptable form then it might be something to go into odd examples of things which can go wrong. I had someone mention to me about a MySQL assumption of the col1 is true and col2 is true and therefore the SQL validates, that is a stretch to me on interpretation but if true and deemed acceptable we live with it.
            Hide
            elenst Elena Stepanova added a comment -

            Assigning to Sergei Golubchik for further consideration.

            Show
            elenst Elena Stepanova added a comment - Assigning to Sergei Golubchik for further consideration.
            Hide
            serg Sergei Golubchik added a comment -

            MariaDB works as expected, it seems. AND is a boolean operator. In MariaDB (and in MySQL and in Percona Server) it implicitly casts its argument to the boolean type. So, your query should be equivalent to

            select  a.col1, a.col2, b.col1, b.col1, b.col3, b.col4  from tablea a  left outer join tableb b on
               (b.col4<>0) and (b.col1<>0)
            where b.col5 < current_date()
            

            (for the sake of the example, I assumed here that col1 and col4 are integers).
            This can result in a cartesian join, indeed. But MariaDB (and MySQL before it) always did this kind of implicit casting, changing that behavior will break many thousands of applications.

            Show
            serg Sergei Golubchik added a comment - MariaDB works as expected, it seems. AND is a boolean operator. In MariaDB (and in MySQL and in Percona Server) it implicitly casts its argument to the boolean type. So, your query should be equivalent to select a.col1, a.col2, b.col1, b.col1, b.col3, b.col4 from tablea a left outer join tableb b on (b.col4<>0) and (b.col1<>0) where b.col5 < current_date() (for the sake of the example, I assumed here that col1 and col4 are integers). This can result in a cartesian join, indeed. But MariaDB (and MySQL before it) always did this kind of implicit casting, changing that behavior will break many thousands of applications.

              People

              • Assignee:
                serg Sergei Golubchik
                Reporter:
                Meerkat63 Peter McLarty
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: