  1. MariaDB Server
  2. MDEV-6080

Allowing storage engine to shortcut group by queries

    Details

    • Type: Task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Fix Version/s: 10.1.8
    • Component/s: Optimizer, Plugins
    • Labels:
      None
    • Sprint:
      10.1.8-1, 10.1.8-3, 10.1.8-4

      Description

      This task is to allow storage engines that can execute GROUP BY
      queries efficiently to intercept a full query or sub query from
      MariaDB and deliver the result either to the client or to a temporary
      table for further processing.

      The interface uses a new 'group_by_handler' class. A new class
      is needed because the original query may contain multiple tables and a
      'result row' can contain fields from different tables.

      Overview

      During prepare, call the storage engine handlerton to ask whether the storage engine can execute the GROUP BY query.
      If yes:

      • The handlerton returns a group_by_handler object.
      • Create a temporary table to store result rows.
      • Initialize the group_by_handler with the temporary table and other relevant objects.
      • When doing 'optimize', skip join order optimization (not needed).
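
      The prepare-phase steps above could be wired together roughly as follows. This is a minimal sketch, not server code: TABLE, Item and ORDER are empty stand-ins, and ask_engine(), demo_handler, prepare_result and prepare_group_by() are hypothetical names invented for illustration. Only init() and its "return true to fall back to normal execution" convention are taken from the interface suggested later in this description.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Empty stand-ins for the server structures (assumption: real types are far richer).
struct TABLE {};
struct Item {};
struct ORDER {};

// Reduced version of the proposed class; only init() is needed for this sketch.
class group_by_handler {
public:
  virtual ~group_by_handler() = default;
  // Returns true if the engine cannot handle the GROUP BY after all.
  virtual bool init(TABLE *temporary_table, Item *having, ORDER *order_by) = 0;
};

// Hypothetical engine handler whose init() outcome is fixed at construction.
class demo_handler : public group_by_handler {
  bool refuse_;
public:
  explicit demo_handler(bool refuse) : refuse_(refuse) {}
  bool init(TABLE *, Item *, ORDER *) override { return refuse_; }
};

// Hypothetical stand-in for the handlerton call: nullptr means "engine declines".
std::unique_ptr<group_by_handler> ask_engine(bool engine_accepts, bool init_refuses) {
  if (!engine_accepts) return nullptr;
  return std::unique_ptr<group_by_handler>(new demo_handler(init_refuses));
}

struct prepare_result {
  std::unique_ptr<group_by_handler> handler;  // null => normal query execution
  bool optimize_join_order;                   // skipped when a handler is in use
};

// Ask the engine, hand over the temporary table, and fall back if init() fails.
prepare_result prepare_group_by(bool engine_accepts, bool init_refuses,
                                TABLE *tmp, Item *having, ORDER *order_by) {
  assert(tmp != nullptr);
  std::unique_ptr<group_by_handler> handler = ask_engine(engine_accepts, init_refuses);
  if (handler && handler->init(tmp, having, order_by))
    handler.reset();  // engine changed its mind: use the normal path
  bool optimize = (handler == nullptr);
  return {std::move(handler), optimize};
}
```

      In this sketch, join order optimization is skipped only when a handler survives init(); every other outcome leaves the query on the normal execution path.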

      When do_select() is called and we have a group_by_handler object, the following is done instead of the normal procedure of reading rows one by one and joining tables:

      • Initialize the group_by_handler.
      • While next_row() returns false:
        • Depending on context, write temporary_table->record[0] to the temporary table or return it to the next level (normally the end user).
      • Finish the group_by_handler.

      Note that the above loop can be executed many times, in case of
      prepared statements or sub queries.

      • When cleaning up SELECT_LEX, we will free the group_by_handler object.
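
      The execution loop described above can be sketched as follows. The method names init_scan(), next_row() and end_scan() come from the suggested interface; everything else is an assumption made for illustration: TABLE is reduced to a single integer standing in for record[0], vector_handler is a hypothetical engine that replays rows from memory, and init_scan()/end_scan() are assumed to return true on error.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for TABLE: one int plays the role of record[0] (assumption).
struct TABLE { int record0 = 0; };

class group_by_handler {
public:
  TABLE *temporary_table = nullptr;
  virtual ~group_by_handler() = default;
  virtual bool init_scan() = 0;  // assumed: true on error
  virtual bool next_row() = 0;   // false: a row was stored; true: no more rows
  virtual bool end_scan() = 0;   // assumed: true on error
};

// Hypothetical engine handler that serves precomputed result rows.
class vector_handler : public group_by_handler {
  std::vector<int> rows_;
  std::size_t pos_ = 0;
public:
  vector_handler(TABLE *tmp, std::vector<int> rows) : rows_(std::move(rows)) {
    temporary_table = tmp;
  }
  bool init_scan() override { pos_ = 0; return false; }
  bool next_row() override {
    if (pos_ == rows_.size()) return true;      // end of result
    temporary_table->record0 = rows_[pos_++];   // fill "record[0]"
    return false;
  }
  bool end_scan() override { return false; }
};

// Server-side loop: forward each row to 'out', which stands in for either the
// temporary table or the client. Returns true on error.
bool run_group_by_handler(group_by_handler &h, std::vector<int> &out) {
  assert(h.temporary_table != nullptr);
  if (h.init_scan()) return true;
  while (!h.next_row())
    out.push_back(h.temporary_table->record0);
  return h.end_scan();
}
```

      Because init_scan() resets the position, the same handler object can be run several times, which matches the note about prepared statements and sub queries.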

      More details

      Assumptions when this interface is used:

      • The SELECT is a GROUP BY or summary query.
      • All tables used in the SELECT come from the same storage engine.

      Suggested interface

      New function for the handlerton class:

        group_by_handler*
        handlerton::can_intercept_group_by(THD *thd, SELECT_LEX *,
                                           List<Item> &fields,
                                           TABLE_LIST *, ORDER *group_by,
                                           ORDER *order_by, Item *where,
                                           Item *having);
      

      This function should return a group_by_handler object if the storage engine can resolve the query itself.
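
      As an illustration of the engine side, the sketch below shows how an engine might decide whether to return a handler, using the two assumptions listed earlier (a GROUP BY or summary query, and all tables in the same engine). It does not use the real signature: table_ref and query_shape are hypothetical stand-ins for TABLE_LIST and SELECT_LEX, demo_can_intercept_group_by() is an invented name, and it returns a std::unique_ptr where the proposed function returns a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for one TABLE_LIST entry.
struct table_ref { std::string engine; };

// Hypothetical summary of what the engine would read from SELECT_LEX.
struct query_shape {
  std::vector<table_ref> tables;
  bool has_group_by = false;
  bool has_aggregates = false;
};

class group_by_handler {
public:
  virtual ~group_by_handler() = default;
};
class demo_group_by_handler : public group_by_handler {};

// Returns a handler if the engine named 'engine' can resolve the query itself,
// or nullptr to let the server execute the query normally.
std::unique_ptr<group_by_handler>
demo_can_intercept_group_by(const std::string &engine, const query_shape &q) {
  assert(!engine.empty());
  // Only GROUP BY or summary (aggregate) queries qualify.
  if (!q.has_group_by && !q.has_aggregates) return nullptr;
  if (q.tables.empty()) return nullptr;
  // Every table must belong to this engine.
  for (const table_ref &t : q.tables)
    if (t.engine != engine) return nullptr;
  return std::unique_ptr<group_by_handler>(new demo_group_by_handler());
}
```

      A real engine would also inspect the field list, WHERE and HAVING items before accepting; those checks are omitted here.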

      New group_by_handler class with the following data and virtual methods:

        TABLE *temporary_table;
        Item *having;
        ORDER *order_by;
      
        /*
          Store pointer to temporary table and objects modified to point to
          the temporary table.  This will happen during the prepare phase.
          Return 1 if the storage handler cannot handle the GROUP BY after all,
          in which case we fall back to normal query execution.
        */
        bool init(TABLE *temporary_table, Item *having, ORDER *order_by);
      
        /*
          Bits describing what the storage engine can do. Should be initialized on
          object creation.
        */
        #define GROUP_BY_ORDER_BY 1  /* Result data is sorted */
        uint flags;
      
        bool init_scan();
        /* Return next row result in temporary_table */
        bool next_row();
        bool end_scan();
      

      If the group_by_handler can't do the sorting, MariaDB will do it. Note that we assume the handler can filter out rows not matching HAVING (by calling having->val_bool()).
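
      The division of work between handler and server can be sketched as below. GROUP_BY_ORDER_BY and the flags field are from the suggested interface; the rest is assumed for illustration: rows are plain integers, demo_result, handler_produce() and finalize_rows() are hypothetical names, a std::function predicate stands in for having->val_bool(), and an ascending std::sort stands in for whatever ORDER BY the server would apply.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

#define GROUP_BY_ORDER_BY 1  /* Result data is sorted */

// Hypothetical container for what a handler hands back to the server.
struct demo_result {
  unsigned int flags;
  std::vector<int> rows;
};

// Handler side: drop groups that fail HAVING. The predicate stands in for
// having->val_bool() evaluated against the current row.
demo_result handler_produce(const std::vector<int> &groups,
                            const std::function<bool(int)> &having,
                            unsigned int flags) {
  assert(static_cast<bool>(having));
  demo_result r;
  r.flags = flags;
  for (int g : groups)
    if (having(g)) r.rows.push_back(g);
  return r;
}

// Server side: sort only if the handler did not promise sorted output.
std::vector<int> finalize_rows(const demo_result &r) {
  std::vector<int> rows = r.rows;
  if (!(r.flags & GROUP_BY_ORDER_BY))
    std::sort(rows.begin(), rows.end());
  return rows;
}
```

      When the flag is set, the server in this sketch trusts the handler's ordering and passes the rows through untouched.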

      In the future we will look at doing a more abstract interface so that the storage engine doesn't have to understand the SELECT_LEX and other structures.

            People

            • Assignee:
              serg Sergei Golubchik
              Reporter:
              monty Michael Widenius


                Time Tracking

                Estimated: 2d
                Remaining: 2d
                Logged: Not Specified