MariaDB Server / MDEV-7648

Extra data in ANALYZE FORMAT=JSON $stmt

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1
    • Fix Version/s: 10.1
    • Component/s: Optimizer
    • Labels:

      Description

      ANALYZE FORMAT=JSON $stmt can (and should) print more data about the execution.

      Requested things:

      • execution time of various parts of the statement
      • real I/O done by accesses to different tables
      • etc.

              Activity

              psergey Sergei Petrunia added a comment -

              Tracking table read times will require start/stop points (a rough sketch of the idea follows the list of options below).

              The choice of starting point is obvious:

              • In JOIN::save_explain(), remember the current counter values. This is where the tracking starts.

              As for the ending point, there are options:

              • Do it somewhere in JOIN::cleanup(full=true). This allows saving the data from index_merge's extra handlers before they are removed.
              • Do it when producing the ANALYZE output. This will work (tables are closed after ANALYZE is printed) and is less intrusive (the ANALYZE code is in one place).
              • Install into PFS_table a callback, which the table will call when it is closed. This is useful for index_merge/DS-MRR's extra handlers, and it is also compatible with log_slow_query. However, it is not compatible with the current location of the code that prints ANALYZE: tables are still open when the ANALYZE output is produced. Moving the ANALYZE-printing code to a later point is difficult, because (1) other parts of the query might already be freed, and (2) it goes against the general structure of the code.
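              A minimal standalone sketch of the snapshot/delta idea is below. All names in it are hypothetical and it is not server code: a tracker remembers the cumulative counter values at the start point (analogous to JOIN::save_explain()) and computes the deltas at the stop point (analogous to the moment the ANALYZE output is produced).

              // Hypothetical standalone sketch, not MariaDB code: snapshot cumulative
              // per-table counters at the start point, compute deltas at the stop point.
              #include <chrono>
              #include <cstdio>
              #include <thread>

              struct Table_access_tracker                 // hypothetical name
              {
                std::chrono::steady_clock::time_point start_time;
                unsigned long long start_reads= 0;

                // Start point: remember the current counter values
                // (analogous to doing this in JOIN::save_explain()).
                void start_tracking(unsigned long long current_reads)
                {
                  start_time= std::chrono::steady_clock::now();
                  start_reads= current_reads;
                }

                // Stop point: compute the deltas for the ANALYZE output
                // (analogous to doing this when the ANALYZE output is printed).
                void stop_tracking(unsigned long long current_reads,
                                   double *elapsed_ms, unsigned long long *reads_done) const
                {
                  *elapsed_ms= std::chrono::duration<double, std::milli>(
                                 std::chrono::steady_clock::now() - start_time).count();
                  *reads_done= current_reads - start_reads;
                }
              };

              int main()
              {
                Table_access_tracker tracker;
                unsigned long long reads= 100;             // pretend cumulative read counter

                tracker.start_tracking(reads);             // "start" point
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
                reads+= 42;                                // pretend the join read 42 more rows

                double ms;
                unsigned long long delta;
                tracker.stop_tracking(reads, &ms, &delta); // "stop" point
                std::printf("time: %.3f ms, reads: %llu\n", ms, delta);
                return 0;
              }

              In this shape the stop point is pure bookkeeping, which is part of why the second option (doing everything when the ANALYZE output is produced) is the least intrusive one.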
              psergey Sergei Petrunia added a comment - - edited

              Hit an interesting property in P_S: http://bugs.mysql.com/bug.php?id=76100. It doesn't directly affect this task.

              psergey Sergei Petrunia added a comment -

              OK, so we can now collect the time spent accessing the tables.
              The next step is "execution time of various parts of the statement". We want to track the time spent in various subqueries, i.e. SELECTs. The P_S and profiling systems have calls like this:

              void JOIN::exec_inner()
              {
                ...
                THD_STAGE_INFO(thd, stage_executing);
              

              It's tempting to reuse those to collect the time spent in SELECTs... but we can't, because:

              • stage_executing doesn't tell which SELECT is being run.
              • there is no THD_STAGE_INFO() call at the end of JOIN::exec_inner
                • when a child SELECT changes the current stage, nothing sets it back for the parent.
              • the stages are not timed by default:
              select * from performance_schema.setup_instruments where name like 'stage%';
              

              shows that stage tracking is not enabled by default.
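              One alternative, sketched below with hypothetical names (this is not necessarily what the server ends up doing), is to time each SELECT unit with its own RAII guard: the guard is independent of P_S stages, needs nothing to be enabled, and a child SELECT cannot disturb the parent's measurement because each unit owns its own accumulator.

              // Hypothetical standalone sketch: per-SELECT timing with an RAII guard,
              // independent of P_S stages.
              #include <chrono>
              #include <cstdio>

              struct Select_exec_timer                    // hypothetical per-SELECT accumulator
              {
                double total_ms= 0;
              };

              class Select_timing_guard                   // hypothetical RAII start/stop wrapper
              {
                Select_exec_timer &timer;
                std::chrono::steady_clock::time_point start;
              public:
                explicit Select_timing_guard(Select_exec_timer &t)
                  : timer(t), start(std::chrono::steady_clock::now()) {}
                ~Select_timing_guard()
                {
                  timer.total_ms+= std::chrono::duration<double, std::milli>(
                                     std::chrono::steady_clock::now() - start).count();
                }
              };

              static void busy_work()                     // stand-in for real execution work
              {
                volatile unsigned long sink= 0;
                for (unsigned long i= 0; i < 1000000; i++)
                  sink= sink + i;
              }

              // Stand-ins for JOIN::exec_inner() of a child and a parent SELECT.
              static void exec_child_select(Select_exec_timer &t)
              {
                Select_timing_guard guard(t);
                busy_work();
              }

              static void exec_parent_select(Select_exec_timer &parent, Select_exec_timer &child)
              {
                Select_timing_guard guard(parent);
                exec_child_select(child);                 // subquery runs inside the parent
                busy_work();
              }

              int main()
              {
                Select_exec_timer parent_t, child_t;
                exec_parent_select(parent_t, child_t);
                std::printf("parent SELECT: %.3f ms (includes child), child SELECT: %.3f ms\n",
                            parent_t.total_ms, child_t.total_ms);
                return 0;
              }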

              psergey Sergei Petrunia added a comment - - edited

              Examples of output with r_time_ms, the time spent accessing the table:
              https://gist.github.com/spetrunia/ae497fb33e2b5ff2b2c6
              https://gist.github.com/spetrunia/aa4c22b748eb74906950

              ANALYZE in PostgreSQL:
              https://gist.github.com/spetrunia/cab742456f9b0ebb74cf

              PostgreSQL also supports a kind of ANALYZE FORMAT=JSON (their syntax is explain (analyze, format json) select):
              https://gist.github.com/spetrunia/e0e5b706f4691aa81006

              psergey Sergei Petrunia added a comment -

              Switched from using P_S to using our own timing code.
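              A rough sketch of what such timing code can look like is below (hypothetical names, not the actual patch): every table-read call is bracketed by a cheap start/stop timer, and the accumulated total is what a field like r_time_ms would report.

              // Hypothetical standalone sketch: accumulate table-access time by
              // bracketing each read call with start()/stop().
              #include <chrono>
              #include <cstdio>

              struct Table_read_time_tracker              // hypothetical, one per table
              {
                std::chrono::steady_clock::time_point last_start;
                unsigned long long read_calls= 0;
                double total_ms= 0;

                void start() { last_start= std::chrono::steady_clock::now(); }
                void stop()
                {
                  total_ms+= std::chrono::duration<double, std::milli>(
                               std::chrono::steady_clock::now() - last_start).count();
                  read_calls++;
                }
              };

              // Stand-in for a storage engine call that reads the next row.
              static int fake_table_read()
              {
                volatile unsigned long sink= 0;           // simulate I/O and row decoding
                for (unsigned long i= 0; i < 100000; i++)
                  sink= sink + i;
                return 0;
              }

              int main()
              {
                Table_read_time_tracker tracker;

                // Every read of the table is bracketed by start()/stop().
                for (int row= 0; row < 10; row++)
                {
                  tracker.start();
                  fake_table_read();
                  tracker.stop();
                }

                std::printf("\"r_time_ms\": %.3f   (over %llu read calls)\n",
                            tracker.total_ms, tracker.read_calls);
                return 0;
              }

              Since a tracker like this fires once per row read, the timer source has to be very cheap; wall-clock precision matters less than per-call overhead.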


                People

                • Assignee: psergey Sergei Petrunia
                • Reporter: psergey Sergei Petrunia
                • Votes: 0
                • Watchers: 1

                  Dates

                  • Created:
                    Updated: