Details

      Description

      Domas initial request:

      Essentially we want to be able to use:

      SELECT * FROM ( SELECT ... WHERE x=123 LIMIT 1000000 ) WHERE y=456 LIMIT 1000;

      Mostly this is to allow paging over large datasets that don't have that great selectivity - so we want to be able to return within reasonable time, even if x=123 condition matches billions of rows, but
      y=456 matches very few.
      Currently the internal subselect would get materialized by MySQL and wouldn't be used as streaming dataset, which makes it prohibitive on OLTP machines.

      In certain cases having DISTINCT not materialize would also be extremely useful for us, but that would probably somewhat different task? (we want to optimize b=Y,c=Z fetches on (a,b,c) indexes).

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            ratzpo Rasmus Johansson added a comment - - edited

            By Petrunia:

            If inner query didn't have LIMIT:

            SELECT * FROM ( SELECT ... WHERE x=123) WHERE y=456 LIMIT 1000;

            then MariaDB 5.3 (and MySQL 5.6) is able to handle this case by merging the inner select into the outer.

            "LIMIT 1000000" will prevents the merging, and the subquery will be materialized (that is, really fully read and stored in a temporary table).

            Currently there is no support for "streaming" FROM-clause subqueries. It is generally hard, because MySQL's join execution code has this structure:

            When we run execute a JOIN of tables t1, t2, t3, in that order, the call stack will look like this:

            JOIN::exec
            sub_select(table t1)
            sub_select(table t2)
            sub_select(table t2)
            end_send()

            { send the (t1 x t2 x t3) record to output}

            changing this execution model to provide a JOIN::get_next_output_record() function is rather difficult. Kostja has made an attempt to do this for cursors and he didn't succeed.

            However, if we're looking at only handling the special case where the subquery has form of

            (SELECT ... FROM single_table WHERE ... LIMIT ...)

            that is

            • select from single base table,
            • no [outer] joins or join buffering,
            • no grouping

            then the task becomes much simpler.

            Still, the task may depend on a piece of code from MariaDB 5.3/MySQL 5.5. The piece of code is the patch changed FROM-clause subqueries from being materialized very early (even before the parent join is optimized) to being materialized right before it is read

            Show
            ratzpo Rasmus Johansson added a comment - - edited By Petrunia: If inner query didn't have LIMIT: SELECT * FROM ( SELECT ... WHERE x=123) WHERE y=456 LIMIT 1000; then MariaDB 5.3 (and MySQL 5.6) is able to handle this case by merging the inner select into the outer. "LIMIT 1000000" will prevents the merging, and the subquery will be materialized (that is, really fully read and stored in a temporary table). Currently there is no support for "streaming" FROM-clause subqueries. It is generally hard, because MySQL's join execution code has this structure: When we run execute a JOIN of tables t1, t2, t3, in that order, the call stack will look like this: JOIN::exec sub_select(table t1) sub_select(table t2) sub_select(table t2) end_send() { send the (t1 x t2 x t3) record to output} changing this execution model to provide a JOIN::get_next_output_record() function is rather difficult. Kostja has made an attempt to do this for cursors and he didn't succeed. However, if we're looking at only handling the special case where the subquery has form of (SELECT ... FROM single_table WHERE ... LIMIT ...) that is select from single base table, no [outer] joins or join buffering, no grouping then the task becomes much simpler. Still, the task may depend on a piece of code from MariaDB 5.3/MySQL 5.5. The piece of code is the patch changed FROM-clause subqueries from being materialized very early (even before the parent join is optimized) to being materialized right before it is read
            Hide
            ratzpo Rasmus Johansson added a comment -

            Regarding DISTINCT by Petrunia:

            • I think we should get to this question after having resolved difficulties
              mentioned earlier in this email
            • It is generally possible to have DISTINCT run in an on-the-fly mode but then
              there will be a question of when the optimizer should choose to do that.
            Show
            ratzpo Rasmus Johansson added a comment - Regarding DISTINCT by Petrunia: I think we should get to this question after having resolved difficulties mentioned earlier in this email It is generally possible to have DISTINCT run in an on-the-fly mode but then there will be a question of when the optimizer should choose to do that.
            Hide
            ratzpo Rasmus Johansson added a comment -

            By Domas:

            Nested-loop-only support would be nice, OTOH, grouping, when done via a loose scan, could be extremely useful too.

            Show
            ratzpo Rasmus Johansson added a comment - By Domas: Nested-loop-only support would be nice, OTOH, grouping, when done via a loose scan, could be extremely useful too.

              People

              • Assignee:
                psergey Sergei Petrunia
                Reporter:
                ratzpo Rasmus Johansson
              • Votes:
                2 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated: