Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7084

innodb index stats inadequate using constant innodb_stats_sample_pages

    Details

      Description

      The rows_per_key for idx_obs_daily_1 and idx_obs_daily_2 are nowhere near the same. The rows_per_key for the first element on ibx_obs_daily_2 on the other 5 database slave servers of identical content was been ~2K and ~8K.

      Its highly suspicious that exactly the same number got calculated here for rows_per_field when the first element of the each of these indexes is a different field.

      ANALYZE table was run on all tables 2 weeks ago when the first instance of an incorrect query plan was observed. Though data changes this index stats also got corrupted again.

      MariaDB [weather]> select * from information_schema.INNODB_INDEX_STATS where table_schema='weather' and table_name='obs_daily';
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | table_schema | table_name | index_name      | fields | rows_per_key     | index_total_pages | index_leaf_pages |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | weather      | obs_daily  | PRIMARY         |      1 | 1                |            204608 |           178951 |
      | weather      | obs_daily  | idx_obs_daily_1 |      3 | 2708708, 1670, 1 |             66944 |            58327 |
      | weather      | obs_daily  | idx_obs_daily_2 |      2 | 2708708, 0       |             27904 |            24298 |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      3 rows in set (0.00 sec)
      
      MariaDB [weather]> analyze table weather.obs_daily;
      +-------------------+---------+----------+----------+
      | Table             | Op      | Msg_type | Msg_text |
      +-------------------+---------+----------+----------+
      | weather.obs_daily | analyze | status   | OK       |
      +-------------------+---------+----------+----------+
      1 row in set (0.04 sec)
      
      MariaDB [weather]> select * from information_schema.INNODB_INDEX_STATS where table_schema='weather' and table_name='obs_daily';
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | table_schema | table_name | index_name      | fields | rows_per_key     | index_total_pages | index_leaf_pages |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      | weather      | obs_daily  | PRIMARY         |      1 | 1                |            206976 |           180899 |
      | weather      | obs_daily  | idx_obs_daily_1 |      3 | 2884335, 3533, 1 |             67328 |            58696 |
      | weather      | obs_daily  | idx_obs_daily_2 |      2 | 8472, 1          |             28032 |            24442 |
      +--------------+------------+-----------------+--------+------------------+-------------------+------------------+
      3 rows in set (0.00 sec)
      
      | obs_daily | CREATE TABLE `obs_daily` (
        `iId` int(11) NOT NULL AUTO_INCREMENT, 
        `sLocType` varchar(10) NOT NULL DEFAULT '',  
        `sLocCode` varchar(30) NOT NULL DEFAULT '',  
        `dtDate` date NOT NULL DEFAULT '0000-00-00',  
      ....
        PRIMARY KEY (`iId`),
        UNIQUE KEY `idx_obs_daily_1` (`sLocType`,`sLocCode`,`dtDate`),
        KEY `idx_obs_daily_2` (`dtDate`)
      ) ENGINE=InnoDB AUTO_INCREMENT=23562482 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC |
      

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              jplindst Jan Lindström added a comment -

              I think you are correct and because srv_stats_sample_pages can be > 10 we should have:

              if (srv_stats_sample_traditional) {
              		/* It makes no sense to test more pages than are contained
              		in the index, thus we lower the number if it is too high */
              		if (srv_stats_sample_pages > index->stat_index_size) {
              			if (index->stat_index_size > 0) {
              				n_sample_pages = index->stat_index_size;
              			} else {
              				n_sample_pages = 1;
              			}
              		} else {
              			n_sample_pages = srv_stats_sample_pages;
              		}
              	} else {
              		/* New logaritmic number of pages that are estimated. We
              		first pick minimun from srv_stats_sample_pages and number of
              		pages on index. Then we pick maximum from previous number of
              		pages and log2(number of index pages) * srv_stats_sample_pages. */
              		if (index->stat_index_size > 0) {
              			n_sample_pages = ut_min(index->stat_index_size, 
                                                                                 ut_max(ut_min(srv_stats_sample_pages, index->stat_index_size),
              				                                                log2(index->stat_index_size)*srv_stats_sample_pages));
              		} else {
              			n_sample_pages = 1;
              		}
              	}
              
              Show
              jplindst Jan Lindström added a comment - I think you are correct and because srv_stats_sample_pages can be > 10 we should have: if (srv_stats_sample_traditional) { /* It makes no sense to test more pages than are contained in the index, thus we lower the number if it is too high */ if (srv_stats_sample_pages > index->stat_index_size) { if (index->stat_index_size > 0) { n_sample_pages = index->stat_index_size; } else { n_sample_pages = 1; } } else { n_sample_pages = srv_stats_sample_pages; } } else { /* New logaritmic number of pages that are estimated. We first pick minimun from srv_stats_sample_pages and number of pages on index. Then we pick maximum from previous number of pages and log2(number of index pages) * srv_stats_sample_pages. */ if (index->stat_index_size > 0) { n_sample_pages = ut_min(index->stat_index_size, ut_max(ut_min(srv_stats_sample_pages, index->stat_index_size), log2(index->stat_index_size)*srv_stats_sample_pages)); } else { n_sample_pages = 1; } }
              Hide
              danblack Daniel Black added a comment - - edited

              Maths makes some of these paths unreachable.

              So taking index size as I and sample as S and log(I)*S as L

              so the current expression is min(I, max( min(S,I), L)

              case 1: assume S < I

              min(I, max( min(S,I), L) becomes

              min(I , max( S, L))

              but since L=LOG2(I)*S and log2(I) >=1 L>S always so max(S,L) = L.

              so we have:

              min(I , L)

              case 2: assume I < S

              min(I, max( min(S,I), L) becomes

              min(I, max( I, L))

              case 2a: L > I

              min(I, max( I, L))

              becomes

              min(I, L)

              becomes I

              case 2b: when L < I

              min(I, max( I, L))

              becomes

              min(I, I )

              becomes I

              so our expression is:

              n_pages = S < I ? min(I,L) : I

              hence the simple version:

              	} else {
              		/* New logarithmic number of pages that are estimated.
                              If we have only 0 or 1 index pages then we can only take 1 sample.
              
                              So taking index size as I and sample as S and log(I)*S as L
              
                              requirement 1) we want the out limit of the expression to not exceed I;
                              requirement 2) we want the ideal pages to be at least S;
                              so the current expression is min(I, max( min(S,I), L)
              
                              looking for simplifications:
              
                              case 1: assume S < I
                              min(I, max( min(S,I), L) -> min(I , max( S, L))
              
                              but since L=LOG2(I)*S and log2(I) >=1   L>S always so max(S,L) = L.
              
                              so we have: min(I , L)
              
                               case 2: assume I < S
                               min(I, max( min(S,I), L) -> min(I, max( I, L))
              
                               case 2a: L > I
                               min(I, max( I, L)) -> min(I, L) -> I
              
                              case 2b: when L < I
                              min(I, max( I, L))  ->  min(I, I ) -> I
                              
                              so taking all case2 paths is I, our expression is:
                              n_pages = S < I? min(I,L) : I
                              */
                              if (index->stat_index_size > 1) {
                                n_sample_pages = (srv_stats_sample_pages < index->stat_index_size) ? ut_min(index->stat_index_size, log2(index->stat_index_size)*srv_stats_sample_pages) : index->stat_index_size;
                              }
                             else 
                             {
                               n_sample_pages = 1;
                            }
              	}
              

              edited: to fix a transposition in the comments

              Show
              danblack Daniel Black added a comment - - edited Maths makes some of these paths unreachable. So taking index size as I and sample as S and log(I)*S as L so the current expression is min(I, max( min(S,I), L) case 1: assume S < I min(I, max( min(S,I), L) becomes min(I , max( S, L)) but since L=LOG2(I)*S and log2(I) >=1 L>S always so max(S,L) = L. so we have: min(I , L) case 2: assume I < S min(I, max( min(S,I), L) becomes min(I, max( I, L)) case 2a: L > I min(I, max( I, L)) becomes min(I, L) becomes I case 2b: when L < I min(I, max( I, L)) becomes min(I, I ) becomes I so our expression is: n_pages = S < I ? min(I,L) : I hence the simple version: } else { /* New logarithmic number of pages that are estimated. If we have only 0 or 1 index pages then we can only take 1 sample. So taking index size as I and sample as S and log(I)*S as L requirement 1) we want the out limit of the expression to not exceed I; requirement 2) we want the ideal pages to be at least S; so the current expression is min(I, max( min(S,I), L) looking for simplifications: case 1: assume S < I min(I, max( min(S,I), L) -> min(I , max( S, L)) but since L=LOG2(I)*S and log2(I) >=1 L>S always so max(S,L) = L. so we have: min(I , L) case 2: assume I < S min(I, max( min(S,I), L) -> min(I, max( I, L)) case 2a: L > I min(I, max( I, L)) -> min(I, L) -> I case 2b: when L < I min(I, max( I, L)) -> min(I, I ) -> I so taking all case2 paths is I, our expression is: n_pages = S < I? min(I,L) : I */ if (index->stat_index_size > 1) { n_sample_pages = (srv_stats_sample_pages < index->stat_index_size) ? ut_min(index->stat_index_size, log2(index->stat_index_size)*srv_stats_sample_pages) : index->stat_index_size; } else { n_sample_pages = 1; } } edited: to fix a transposition in the comments
              Hide
              jplindst Jan Lindström added a comment -

              Thanks, I really should educate myself more on math.

              Show
              jplindst Jan Lindström added a comment - Thanks, I really should educate myself more on math.
              Hide
              danblack Daniel Black added a comment -

              You're welcome. If you hadn't come back with a longer form I woudn't have done the proper proof and the slight improvement on my original 6 hrs ago wouldn't have been realised.

              Show
              danblack Daniel Black added a comment - You're welcome. If you hadn't come back with a longer form I woudn't have done the proper proof and the slight improvement on my original 6 hrs ago wouldn't have been realised.
              Hide
              danblack Daniel Black added a comment -

              thanks Jan. I see you committed this r4377

              much appreciate the credit in the commit too.

              Show
              danblack Daniel Black added a comment - thanks Jan. I see you committed this r4377 much appreciate the credit in the commit too.

                People

                • Assignee:
                  jplindst Jan Lindström
                  Reporter:
                  danblack Daniel Black
                • Votes:
                  1 Vote for this issue
                  Watchers:
                  5 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: