Details

    • Type: Task
    • Status: Stalled
    • Priority: Minor
    • Resolution: Unresolved
    • Fix Version/s: 10.1
    • Component/s: None

      Description

      One of the reasons for bad query plans is inadequate cost estimation of individual operations. The cost of reading a row in one engine might be a lot higher than in another, but the optimizer cannot know that. It also uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table (it used to be 10; now it's 5).

      Obviously, some kind of calibration procedure is needed to get these cost estimates to be reasonably accurate. It is not easy, because the estimates depend on the actual hardware where MariaDB runs (the cost of a row read differs between an HDD and an SSD), and also, somewhat, on the application (the optimizer model isn't perfect, and cost factors vary for different types of queries, so it's better to calibrate on the queries that the user application will actually run).

      A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure actual timings and adjust automatically to the configuration where MariaDB is run.

      Assorted thoughts:

      • create a tuning coefficient for every low-level cost generator in the optimizer, that is, for every function or handler method that generates a cost value, and for every hard-coded constant or macro as well. But not for functions that derive cost values from other cost values. For a virtual method, create one coefficient per implementation, that is, one coefficient for read_time in every handler class, but not for different parameters of it. For example, different tables in some engine might have different costs for reading a row, but we will still have one coefficient for read_time in this engine, not one per table. The engine is supposed to provide different costs internally if it needs to. The goal of these coefficients is to normalize costs between different engines.
      • measure the time that the query took, split it proportionally between tables (according to the number of rows), and save the statistics per coefficient.
      • collect the statistics locally in the THD and add them to the global statistics on disconnect. This helps to avoid contention on a shared resource.
      • the optimizer will use the global statistics, not the thread-local ones. It shouldn't matter, as the coefficients will change very slowly.
      • when splitting the time, use the actual number of rows, not the estimate that the optimizer used.
      • per coefficient, store a counter (bigint), the sum of times (double), and the sum of squared times (double).
      • store the results persistently in the mysql database (a possible layout is sketched after this list).
      • make them available via an I_S table. In this table, show two more columns: the average and the standard deviation.
      • report these data via the feedback plugin; we can then adjust the built-in constants or the initial values of these coefficients. A very large deviation is a sign that an engine estimates the cost incorrectly (e.g. doesn't take the table name into account, see above).
      • a user can update the table manually if she wishes. She can even freeze the coefficients by setting the count column to a very large value.
      • system load anomalies may introduce undesired changes in the coefficients. Is that a problem? Should we implement some countermeasures?
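
      A minimal sketch of how the persistent storage and the derived columns could look, assuming a hypothetical table mysql.optimizer_cost_stats; the table name, column names, and the way the average and standard deviation are derived from count, sum, and sum of squares are illustrative only, not an existing MariaDB schema:

        -- one row per self-tuning cost coefficient (hypothetical layout)
        CREATE TABLE mysql.optimizer_cost_stats (
          coefficient   VARCHAR(64) NOT NULL,          -- e.g. 'innodb.read_time'
          measure_count BIGINT UNSIGNED NOT NULL DEFAULT 0,
          time_sum      DOUBLE NOT NULL DEFAULT 0,     -- sum of measured times (seconds)
          time_sum_sq   DOUBLE NOT NULL DEFAULT 0,     -- sum of squared times
          PRIMARY KEY (coefficient)
        );

        -- what the I_S table could expose: the average and the standard deviation,
        -- both computed from the three stored values
        SELECT coefficient,
               measure_count,
               time_sum / measure_count AS avg_time,
               SQRT(time_sum_sq / measure_count
                    - POW(time_sum / measure_count, 2)) AS stddev_time
        FROM mysql.optimizer_cost_stats
        WHERE measure_count > 0;

        -- "freezing" a coefficient manually: with a huge count, further
        -- measurements change the average only negligibly
        UPDATE mysql.optimizer_cost_stats
        SET measure_count = 1000000000000,
            time_sum      = 1000000000000 * 0.0001     -- pins the average at 0.1 ms
        WHERE coefficient = 'innodb.read_time';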

            Activity

            ebergen Eric Bergen added a comment -

            I can see this going horribly wrong both when system load changes and due to hardware anomalies. For example, when a hard drive hits a bad sector it may do an I/O that takes up to 8 seconds by default, which may throw off all of your measurements for that query. If that query ends up updating the global statistics, then all future queries may be dramatically punished because of a single isolated problem.

            CPU overload can also affect the timings. For example, if you have a burst of threads running, then certain parts of a query may take longer as they fight for locks or CPU time.

            It would be nice to run a sample workload to gather these statistics and then be able to freeze them on an instance. This would take care of the cases where the optimizer makes assumptions about disk seek time that were true on hard drives but aren't true on flash hardware, without having to worry about hosts suddenly and wildly changing their query plans due to hardware blips.

            serg Sergei Golubchik added a comment -

            A specific value (say, "read_time") will be measured many times and the on-disk table will store the sum of all measurements and the number of them. The optimizer will use the average (sum/count). This means:

            • hopefully, short spikes won't disrupt the average too much (see the rough calculation below)
            • one can effectively "freeze" the average by updating the table and putting a huge "count" value.
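
            A rough back-of-the-envelope illustration of that damping effect, with made-up numbers: a coefficient with a million measurements averaging 1 ms barely moves after a single 8-second outlier.

              -- hypothetical numbers: count = 1,000,000, average 1 ms (sum = 1000 s), then one 8 s spike
              SELECT (1000000 * 0.001 + 8) / (1000000 + 1) AS new_avg_seconds;
              -- returns about 0.001008, i.e. the average shifts by less than 1%
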
            rspadim roberto spadim added a comment -

            Eric, yes, that's a problem, but it's also a nice solution to many problems.
            The statistics are the key part here: if the mean (or any other usable measure) drifts too far from the current measurements, we could alert the DBA about disk problems or too many queries per second. Think of it as a useful feature for alerting DBAs and developers/engineers about a possible problem.
            As for making the self-tuning cost coefficient algorithm more robust, we can develop that over time with more use cases. For the first version, a feature that only measures (and does not yet change the cost coefficients or the optimizer) is fine; with experience we can then write a good (robust/intelligent) control.

            rspadim roberto spadim added a comment -

            Hi guys, any news here?

            stephane@skysql.com VAROQUI Stephane added a comment -

            I would also give a +1 for a global statistics approach rather than gathering fixed constants as in a benchmark.

            But I would add that a per-table set of constants would be better, because in any workload some specific tables may more often cause non-deterministic cache overflows:

            • Data Cache
            • Index Cache
            • FS Cache
            • Controller Cache
            • CPU Caches

            Non-determinism is also driven by the workload itself, for example:

            • writing/reading from the last time range,
            • number of secondary indexes in a write,
            • storage engine,
            • randomness of the data access pattern,
            • lock time,
            • column and row sizes fetched

            It looks like such metrics are mostly per-table dependent, and they can help the query planner better estimate the join cost at each depth.


              People

              • Assignee:
                serg Sergei Golubchik
                Reporter:
                serg Sergei Golubchik
              • Votes:
                2
                Watchers:
                10
