Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7097

Sub-Queries and Outer Joins may return NULL instead of (existing) value over ODBC

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.12, 10.0.14, 10.1.1
    • Fix Version/s: 10.0.15
    • Labels:
    • Environment:
      Windows 2008 R2 Server (amd64), MariaDB 10.0.12 (Windows, amd64), Oracle ODBC 11.02.00.03 (with Oracle 09.02.0060 on another server) as well as MySQL ODBC 5.03.04.00 ANSI driver (with MySQL Server 5.5.40-0ubuntu on another server)

      Description

      On certain conditions, a sub-select or outer join may return NULL instead of the actually existing value.

      You can test that simply with a MySQL table, e.g. `demo.subselectdemo`:

      id INT value VARCHAR(255)
      1 First
      2 Second
      3 Third

      Ideally on another computer, create a DSN with the MySQL ODBC driver (the MariaDB ODBC driver didn't work for me at all) to that database instance. Then create a database on that computer with a CONNECT-Table, e.g.:

      CREATE TABLE `proxy` ENGINE=CONNECT TABLE_TYPE=ODBC BLOCK_SIZE=10 TABNAME='demo.subselectdemo' CONNECTION='DSN=<your DSN>;UID=<username>;PWD=<password>';
      

      Now try the following query:

      SELECT a.id, (SELECT b.value FROM proxy AS b WHERE a.id = b.id) FROM proxy AS a;
      

      It returns:

      a.id (SELECT b.value FROM proxy AS b WHERE a.id = b.id)
      1 First
      2 (NULL)
      3 (NULL)

      It should return something like the following, which you can verify on the origin DB with the "real" table:

      SELECT a.id, (SELECT b.value FROM demo.subselectdemo AS b WHERE a.id = b.id) FROM demo.subselectdemo AS a;
      
      a.id (SELECT b.value FROM demo.subselectdemo AS b WHERE a.id = b.id)
      1 First
      2 Second
      3 Third

      This might have to do with CONNECT's limitation to "forward cursors".

      But similar happens to me with at least Oracle ODBC with a "LEFT OUTER JOIN" on two different, large (>5000 entries) tables, where the "key" used to correlate the entries is pretty random. After a few thousand entries, the OUTER JOIN is always NULL, although there is a corresponding entry in the other table - often it was actually found already for an earlier entry, e.g.:

      SELECT a.id, a.ref, b.ref FROM a LEFT OUTER JOIN b ON a.ref=b.ref;
      
      a.index a.ref b.ref
      1 45 45
      2 5 5
      3 526 526
      ...
      3503 6 6
      3504 45 (NULL)
      3505 8827 (NULL)
      3506 5 (NULL)
      ...

      A second OUTER JOIN in the same query stops even earlier, e.g.:

      SELECT a.id, a.ref, b.ref, b.otherref, c.otherref FROM a LEFT OUTER JOIN b ON a.ref=b.ref LEFT OUTER JOIN c ON b.otherref=c.otherref;
      
      a.index a.ref b.ref b.otherref c.otherref
      1 45 45 1 1
      2 5 5 7 7
      3 526 526 3 3
      ...
      1478 5 5 3 3
      1479 526 526 4 (NULL)
      1480 526 526 7 (NULL)
      ...
      3503 6 6 4 (NULL)
      3504 45 (NULL) (NULL) (NULL)
      3505 8827 (NULL) (NULL) (NULL)
      3506 5 (NULL) (NULL) (NULL)
      ...

      I hope you can solve both issues by fixing the "sub-select" issue; if this ticket needs to be splitted, I can try to create a proper test case for the JOIN-issue.

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            elenst Elena Stepanova added a comment -

            Thanks for the report.

            I could reproduce it with e.g. postgresql for a remote table:

            connect=> \d+ t1
                                      Table "public.t1"
             Column |          Type          | Modifiers | Storage  | Description 
            --------+------------------------+-----------+----------+-------------
             id     | integer                |           | plain    | 
             value  | character varying(255) |           | extended | 
            Has OIDs: no
            
            connect=> select * from t1;
             id | value  
            ----+--------
              1 | First
              2 | Second
              3 | Third
            (3 rows)
            
            MariaDB [test]> show create table proxy\G
            *************************** 1. row ***************************
                   Table: proxy
            Create Table: CREATE TABLE `proxy` (
              `id` int(10) DEFAULT NULL,
              `value` varchar(255) DEFAULT NULL
            ) ENGINE=CONNECT DEFAULT CHARSET=latin1 CONNECTION='dsn=postgres' `TABLE_TYPE`='odbc' `TABNAME`='t1' `BLOCK_SIZE`=10
            1 row in set (0.00 sec)
            
            MariaDB [test]> SELECT a.id, (SELECT b.value FROM proxy AS b WHERE a.id = b.id) FROM proxy AS a;
            +------+----------------------------------------------------+
            | id   | (SELECT b.value FROM proxy AS b WHERE a.id = b.id) |
            +------+----------------------------------------------------+
            |    1 | First                                              |
            |    2 | NULL                                               |
            |    3 | NULL                                               |
            +------+----------------------------------------------------+
            3 rows in set (0.07 sec)
            
            Show
            elenst Elena Stepanova added a comment - Thanks for the report. I could reproduce it with e.g. postgresql for a remote table: connect=> \d+ t1 Table "public.t1" Column | Type | Modifiers | Storage | Description --------+------------------------+-----------+----------+------------- id | integer | | plain | value | character varying(255) | | extended | Has OIDs: no connect=> select * from t1; id | value ----+-------- 1 | First 2 | Second 3 | Third (3 rows) MariaDB [test]> show create table proxy\G *************************** 1. row *************************** Table: proxy Create Table: CREATE TABLE `proxy` ( `id` int(10) DEFAULT NULL, `value` varchar(255) DEFAULT NULL ) ENGINE=CONNECT DEFAULT CHARSET=latin1 CONNECTION='dsn=postgres' `TABLE_TYPE`='odbc' `TABNAME`='t1' `BLOCK_SIZE`=10 1 row in set (0.00 sec) MariaDB [test]> SELECT a.id, ( SELECT b.value FROM proxy AS b WHERE a.id = b.id) FROM proxy AS a; +------+----------------------------------------------------+ | id | ( SELECT b.value FROM proxy AS b WHERE a.id = b.id) | +------+----------------------------------------------------+ | 1 | First | | 2 | NULL | | 3 | NULL | +------+----------------------------------------------------+ 3 rows in set (0.07 sec)
            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            This is a general problem concerning correlated subqueries.
            Unlike not correlated subqueries that are executed only once, correlated subqueries are executed many times.

            What CONNECT does, when a table is accessed several times is to execute a "rewind" allowing to restart reading it from the beginning.
            It is what ODBC calls a "requery" and currently this is not implemented for CONNECT ODBC tables although it is for practically all other types, for instance MYSQL.
            BTW it is easy to do for MYSQL because the result set is stored in memory enabling to scroll in it.

            Now, to deal with ODBC tables there are several solutions:

            1. Not allowing "requery" but raise an error with a proper error message. This is just changing a critical bug (returning wrong result) to a documented restriction.
            2. Implementing "requery" by discarding the current result set and re submitting the query (as MFC does)
            3. Using a scrollable cursor.
            4. Storing the result set in memory as MYSQL tables do.

            All solutions raise some problems:

            1. Does not really fix it.
            2. Executing the query many times can be very time consuming if its execution is long (implying group by or join)
            3. Scrolling cursors cannot be used with extended fetch. Bad for performance if the result set is large. This would be penalizing when the table is not used in a correlated subquery.
            4. Best for performance but can fail with large result sets because of exhausted memory.

            It would be nice to restrict these only when needed. However, I am not sure whether the storage engine can determine if the table is used in a correlated subquery.
            I don't know either if the outer join fails for the same reason.
            I shall experiment with these and propose a (temporary?) fix.

            Show
            bertrandop Olivier Bertrand added a comment - - edited This is a general problem concerning correlated subqueries. Unlike not correlated subqueries that are executed only once, correlated subqueries are executed many times. What CONNECT does, when a table is accessed several times is to execute a "rewind" allowing to restart reading it from the beginning. It is what ODBC calls a "requery" and currently this is not implemented for CONNECT ODBC tables although it is for practically all other types, for instance MYSQL. BTW it is easy to do for MYSQL because the result set is stored in memory enabling to scroll in it. Now, to deal with ODBC tables there are several solutions: Not allowing "requery" but raise an error with a proper error message. This is just changing a critical bug (returning wrong result) to a documented restriction. Implementing "requery" by discarding the current result set and re submitting the query (as MFC does) Using a scrollable cursor. Storing the result set in memory as MYSQL tables do. All solutions raise some problems: Does not really fix it. Executing the query many times can be very time consuming if its execution is long (implying group by or join) Scrolling cursors cannot be used with extended fetch. Bad for performance if the result set is large. This would be penalizing when the table is not used in a correlated subquery. Best for performance but can fail with large result sets because of exhausted memory. It would be nice to restrict these only when needed. However, I am not sure whether the storage engine can determine if the table is used in a correlated subquery. I don't know either if the outer join fails for the same reason. I shall experiment with these and propose a (temporary?) fix.
            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            This fix implements solution (2) by default. The reason why is that it does not impact using the table out of correlated subquery.

            However, correlated subqueries last very long on medium and big table, particularly when remote.

            For instance, on a table having 4545 rows the query:

            SELECT a.nom, (SELECT b.prenom FROM xdbemp AS b WHERE a.matricule = b.matricule) FROM xdbemp AS a;
            

            takes 11 minutes on my machine. Slightly better is to use solution (3).
            For this you must tell that the table to use a scrollable cursor:

            ALTER TABLE xdbemp OPTION_LIST='scrollable=1';
            

            Doing so the same query last 6 minutes 25 seconds. However the table will no more use extended fetch, which can impact all other usages.

            A solution is to define a table to use only in correlated subqueries:

            CREATE TABLE xdbemp2 LIKE xdbemp;
            ALTER TABLE xdbemp2 OPTION_LIST='scrollable=1';
            

            Of course, on big tables, those queries would last forever. However I did not implement solution (4) because it is very simple to make a local copy of the table to use in correlated subqueries, for instance:

            CREATE TABLE locemp ENGINE=CONNECT TABLE_TYPE=FIX
            AS SELECT * FROM xdbemp;
            

            (It could use any other engine) Now the query:

            SELECT a.nom, (SELECT b.prenom FROM locemp AS b WHERE a.matricule = b.matricule) FROM xdbemp AS a;
            

            takes... 0.30 seconds!

            I don' know whether this will also fix the outer join problem. As a matter of facts, I could not reproduce this bug, even before I implemented the fix.

            Show
            bertrandop Olivier Bertrand added a comment - - edited This fix implements solution (2) by default. The reason why is that it does not impact using the table out of correlated subquery. However, correlated subqueries last very long on medium and big table, particularly when remote. For instance, on a table having 4545 rows the query: SELECT a.nom, (SELECT b.prenom FROM xdbemp AS b WHERE a.matricule = b.matricule) FROM xdbemp AS a; takes 11 minutes on my machine. Slightly better is to use solution (3). For this you must tell that the table to use a scrollable cursor: ALTER TABLE xdbemp OPTION_LIST='scrollable=1'; Doing so the same query last 6 minutes 25 seconds. However the table will no more use extended fetch, which can impact all other usages. A solution is to define a table to use only in correlated subqueries: CREATE TABLE xdbemp2 LIKE xdbemp; ALTER TABLE xdbemp2 OPTION_LIST='scrollable=1'; Of course, on big tables, those queries would last forever. However I did not implement solution (4) because it is very simple to make a local copy of the table to use in correlated subqueries, for instance: CREATE TABLE locemp ENGINE=CONNECT TABLE_TYPE=FIX AS SELECT * FROM xdbemp; (It could use any other engine) Now the query: SELECT a.nom, (SELECT b.prenom FROM locemp AS b WHERE a.matricule = b.matricule) FROM xdbemp AS a; takes... 0.30 seconds! I don' know whether this will also fix the outer join problem. As a matter of facts, I could not reproduce this bug, even before I implemented the fix.
            Hide
            bertrandop Olivier Bertrand added a comment - - edited

            Finally I added the implementation of solution (4) The main reason for doing so is that it is not penalising when the table is not used in a correlated sub-query. Indeed, the memory result set is not constructed on the first read (as the result set size is not known) It is allocated and populated during the second read that is still done normally. The third and subsequent read directly get the result from the stored set. As you will see it is about 6 times faster than the default method.

            Using a release server (previously it was a debug one) the execution times and option setting are:

            1. N.A.
            2. 95.69 sec (default)
            3. 24.35 sec (scrollable=1)
            4. 15.67 sec (memory=1)

            Note: It is still not the default to avoid memory problems with big tables.

            Solution (3) should be reguarded as an alternative when getting memory problems. It might slow down the execution of all queries even not in correlated sub-queries (if using a scrollable cursor is slower than using a forward only cursor), and may not be supported by all drivers.

            About the incredible execution time obtained using a local copy of the remote table, it is probably due to a MariaDB optimisation that transform the query to something else, probably a join. This should prompt you to avoid using correlated subqueries by all means and look for possible alternative.

            Show
            bertrandop Olivier Bertrand added a comment - - edited Finally I added the implementation of solution (4) The main reason for doing so is that it is not penalising when the table is not used in a correlated sub-query. Indeed, the memory result set is not constructed on the first read (as the result set size is not known) It is allocated and populated during the second read that is still done normally. The third and subsequent read directly get the result from the stored set. As you will see it is about 6 times faster than the default method. Using a release server (previously it was a debug one) the execution times and option setting are: N.A. 95.69 sec (default) 24.35 sec (scrollable=1) 15.67 sec (memory=1) Note: It is still not the default to avoid memory problems with big tables. Solution (3) should be reguarded as an alternative when getting memory problems. It might slow down the execution of all queries even not in correlated sub-queries (if using a scrollable cursor is slower than using a forward only cursor), and may not be supported by all drivers. About the incredible execution time obtained using a local copy of the remote table, it is probably due to a MariaDB optimisation that transform the query to something else, probably a join. This should prompt you to avoid using correlated subqueries by all means and look for possible alternative.

              People

              • Assignee:
                bertrandop Olivier Bertrand
                Reporter:
                bschneider Björn Schneider
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 2 days
                  2d