MariaDB Server / MDEV-5448

Performance regression between 10.0.4 and 10.0.5 (~8%)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.0.6
    • Fix Version/s: 10.0.8
    • Component/s: None
    • Labels:
      None

      Description

      As Axel mentioned in his e-mail, there is a performance regression between 10.0.4 and 10.0.5:

      Date: Thu, 21 Nov 2013 18:32:45 +0100
      From: Axel Schwenke <axel@askmonty.org>
      To: "maria-developers@lists.launchpad.net" <maria-developers@lists.launchpad.net>
      Subject: [Maria-developers] MariaDB-10.0-beta sysbench results
      

      While looking for this regression, I can see a clear performance drop with the following revision:

      revno: 3427.1.258
      revision-id: knielsen@knielsen-hq.org-20130823120213-pbhsq4zc1h3jwa0i
      parent: knielsen@knielsen-hq.org-20130823081643-f3yhupp15yw9cpy4
      committer: knielsen@knielsen-hq.org
      branch nick: work-10.0-mdev26
      timestamp: Fri 2013-08-23 14:02:13 +0200
      message:
        MDEV-26: Global transaction ID.
      
        Implement @@gtid_binlog_state. This is the internal state of the binlog
        (most recent GTID logged for every domain_id and server_id). This allows
        to save the state before RESET MASTER and restore it afterwards.
      

      Specifically, the sys_vars.cc part:

      static unsigned char opt_gtid_binlog_state_dummy;
      static Sys_var_gtid_binlog_state Sys_gtid_binlog_state(
             "gtid_binlog_state",
             "The internal GTID state of the binlog, used to keep track of all "
             "GTIDs ever logged to the binlog.",
             GLOBAL_VAR(opt_gtid_binlog_state_dummy), NO_CMD_LINE);
      

      If I comment it out, I get a nice performance boost. Note that it doesn't seem to have anything to do with the GTID functionality accessed by the Sys_var_gtid_binlog_state methods: I removed all references to GTID code and still observed the performance degradation.

      It seems to be caused, somehow, by the increased number of system variables. If I add a new system variable (on revision 3816), I can see a performance degradation:

      static ulong table_cache_instances1;
      static Sys_var_ulong Sys_table_cache_instances1(
             "table_open_cache_instances1",
             "MySQL 5.6 compatible option. Not used or needed in MariaDB",
             READ_ONLY GLOBAL_VAR(table_cache_instances1), CMD_LINE(REQUIRED_ARG),
             VALID_RANGE(1, 64), DEFAULT(1),
             BLOCK_SIZE(1), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(NULL),
             ON_UPDATE(NULL), NULL);
      

      The difference looks like this:
      64 threads, time spent: 60s, queries executed: 9326530, qps: 155442, 1 thread qps: 2428

      vs

      64 threads, time spent: 60s, queries executed: 9879031, qps: 164650, 1 thread qps: 2572
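      As a cross-check of the two runs above: qps is the query count divided by the 60 s run time, "1 thread qps" appears to be qps divided by the 64 threads (rounded down), and the gap between the two runs works out to roughly 5.6%:

```latex
\begin{aligned}
\frac{9326530}{60} &\approx 155442\ \text{qps}, & \frac{155442}{64} &\approx 2428 \\
\frac{9879031}{60} &\approx 164650\ \text{qps}, & \frac{164650}{64} &\approx 2572 \\
\frac{164650 - 155442}{164650} &= \frac{9208}{164650} \approx 5.6\%
\end{aligned}
```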

      I was unable to reproduce the performance boost on a fresh 10.0 tree by commenting out gtid_binlog_state.

      An even simpler patch for revision 3816 that shows the performance degradation:

      === modified file 'sql/sys_vars.cc'
      --- sql/sys_vars.cc	2013-08-14 08:48:50 +0000
      +++ sql/sys_vars.cc	2013-12-14 18:24:15 +0000
      @@ -2694,6 +2694,8 @@
              BLOCK_SIZE(1), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(NULL),
              ON_UPDATE(NULL), NULL);
      
      +char buf[sizeof(Sys_table_cache_instances)];
      +
       static Sys_var_ulong Sys_thread_cache_size(
              "thread_cache_size",
              "How many threads we should keep in a cache for reuse",
      
      

              Activity

              svoj Sergey Vojtovich added a comment -

              When we add a new system variable (e.g. ptr=0x1061d40, size=208), the addresses of other global C++ variables may change. Among other things, the addresses of LOCK_open and unused_tables change.

              rev.3816 (fast):
              LOCK_open: 0x1074120, size= 48 (cache line starts 0x1074100)
              unused_tables: 0x1074150, size= 8 (cache line starts 0x1074140)

              rev.3816 + "char buf[sizeof(Sys_table_cache_instances)]" (slow):
              LOCK_open: 0x1074200, size= 48 (cache line starts 0x1074200)
              unused_tables: 0x1074230, size= 8 (cache line starts 0x1074200)

              Note that in the fast version LOCK_open spans 2 cache lines (32 bytes on the first + 16 bytes on the second). The second cache line is shared with unused_tables, but since these last 16 bytes are mostly static, there should be no false sharing issues.

              In the slow version, LOCK_open resides entirely on 1 cache line, which is shared with unused_tables.

              oprofile shows that LLC_MISSES increase in the slow version:
              3816 (fast)
              CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
              Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
              samples  %        image name          symbol name
              43387    37.4148  no-vmlinux          /no-vmlinux
              21919    18.9019  libpthread-2.15.so  pthread_mutex_lock
              6986     6.0244   libpthread-2.15.so  pthread_mutex_unlock
              5427     4.6800   mysqld              tc_release_table(TABLE*)
              3741     3.2261   mysqld              TABLE::init(THD*, TABLE_LIST*)
              3168     2.7319   mysqld              tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
              3014     2.5991   mysqld              open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
              2199     1.8963   libpthread-2.15.so  pthread_rwlock_unlock
              2151     1.8549   libpthread-2.15.so  __lll_lock_wait
              2134     1.8403   mysqld              dispatch_command(enum_server_command, THD*, char*, unsigned int)

              3816 (slow)
              CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
              Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
              samples  %        image name          symbol name
              43059    39.1488  no-vmlinux          /no-vmlinux
              20065    18.2429  libpthread-2.15.so  pthread_mutex_lock
              5736     5.2151   mysqld              tc_release_table(TABLE*)
              5633     5.1215   libpthread-2.15.so  pthread_mutex_unlock
              3331     3.0285   mysqld              TABLE::init(THD*, TABLE_LIST*)
              2913     2.6485   mysqld              open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
              2666     2.4239   mysqld              tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
              2198     1.9984   libpthread-2.15.so  pthread_rwlock_unlock
              1998     1.8166   libpthread-2.15.so  __lll_lock_wait
              1976     1.7966   mysqld              dispatch_command(enum_server_command, THD*, char*, unsigned int)

              3816 (slow + padding)
              CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
              Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
              samples  %        image name          symbol name
              43144    37.7159  no-vmlinux          /no-vmlinux
              21324    18.6412  libpthread-2.15.so  pthread_mutex_lock
              5930     5.1839   libpthread-2.15.so  pthread_mutex_unlock
              5889     5.1481   mysqld              tc_release_table(TABLE*)
              3678     3.2153   mysqld              TABLE::init(THD*, TABLE_LIST*)
              3469     3.0326   mysqld              tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
              3221     2.8158   mysqld              open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
              2418     2.1138   libpthread-2.15.so  pthread_rwlock_unlock
              2165     1.8926   mysqld              dispatch_command(enum_server_command, THD*, char*, unsigned int)
              2144     1.8743   libpthread-2.15.so  __lll_lock_wait

              Adding dummy padding around LOCK_open restores performance:
              +char pada[1024];
               mysql_mutex_t LOCK_open;
              +char padb[1024];

              svoj Sergey Vojtovich added a comment -

              MDEV-5388 removes unused_tables, so this particular performance regression is fixed.


                People

                • Assignee:
                  svoj Sergey Vojtovich
                  Reporter:
                  svoj Sergey Vojtovich
                • Votes:
                  0
                  Watchers:
                  2
