Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.0.6
Fix Version/s: 10.0.8
Component/s: None
Labels: None
Description
As Axel mentioned in his e-mail, there is a performance regression between 10.0.4 and 10.0.5:
Date: Thu, 21 Nov 2013 18:32:45 +0100
From: Axel Schwenke <axel@askmonty.org>
To: "maria-developers@lists.launchpad.net" <maria-developers@lists.launchpad.net>
Subject: [Maria-developers] MariaDB-10.0-beta sysbench results
While looking for this regression, I see a clear performance drop with the following revision:
revno: 3427.1.258
revision-id: knielsen@knielsen-hq.org-20130823120213-pbhsq4zc1h3jwa0i
parent: knielsen@knielsen-hq.org-20130823081643-f3yhupp15yw9cpy4
committer: knielsen@knielsen-hq.org
branch nick: work-10.0-mdev26
timestamp: Fri 2013-08-23 14:02:13 +0200
message:
  MDEV-26: Global transaction ID.
  Implement @@gtid_binlog_state. This is the internal state of the binlog (most recent GTID logged for every domain_id and server_id). This allows to save the state before RESET MASTER and restore it afterwards.
Specifically, the sys_vars.cc part:
static unsigned char opt_gtid_binlog_state_dummy;
static Sys_var_gtid_binlog_state Sys_gtid_binlog_state(
       "gtid_binlog_state",
       "The internal GTID state of the binlog, used to keep track of all "
       "GTIDs ever logged to the binlog.",
       GLOBAL_VAR(opt_gtid_binlog_state_dummy), NO_CMD_LINE);
If I comment it out, I get a nice performance boost. Note that this does not seem to be related to the GTID functionality accessed by the Sys_var_gtid_binlog_state methods: I removed all references to GTID code and still observed the performance degradation.
It seems to be caused, somehow, by the growing number of system variables. If I add a new system variable (on revision 3816), I see a performance degradation:
static ulong table_cache_instances1;
static Sys_var_ulong Sys_table_cache_instances1(
       "table_open_cache_instances1",
       "MySQL 5.6 compatible option. Not used or needed in MariaDB",
       READ_ONLY GLOBAL_VAR(table_cache_instances1), CMD_LINE(REQUIRED_ARG),
       VALID_RANGE(1, 64), DEFAULT(1),
       BLOCK_SIZE(1), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(NULL),
       ON_UPDATE(NULL), NULL);
The difference looks like this:
64 threads, time spent: 60s, queries executed: 9326530, qps: 155442, 1 thread qps: 2428
vs
64 threads, time spent: 60s, queries executed: 9879031, qps: 164650, 1 thread qps: 2572
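As a quick check of the two runs above, the sketch below recomputes the printed figures. It assumes the tool prints truncated integer divisions (qps = queries / seconds, 1 thread qps = qps / threads), which reproduces all four printed numbers. Taking the lower run as the one with the extra variable, the drop is roughly 5.6%. RunStats and the helper names are illustrative only, not part of sysbench or MariaDB.

```cpp
#include <cstdint>

// Figures from one benchmark run, as printed above.
struct RunStats {
  std::uint64_t queries;  // "queries executed"
  std::uint64_t seconds;  // "time spent"
  std::uint64_t threads;  // number of client threads
};

// Queries per second, truncated like the printed "qps" value (assumption).
std::uint64_t qps(const RunStats &r) { return r.queries / r.seconds; }

// Per-thread throughput, truncated like the printed "1 thread qps" (assumption).
std::uint64_t per_thread_qps(const RunStats &r) { return qps(r) / r.threads; }

// Relative slowdown of `slow` compared with `fast`, in percent.
double slowdown_percent(const RunStats &slow, const RunStats &fast) {
  double s = static_cast<double>(qps(slow));
  double f = static_cast<double>(qps(fast));
  return (f - s) / f * 100.0;
}
```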
I was unable to reproduce the performance boost on a fresh 10.0 tree by commenting out gtid_binlog_state.
An even simpler patch for revision 3816 that shows the performance degradation:
=== modified file 'sql/sys_vars.cc'
--- sql/sys_vars.cc 2013-08-14 08:48:50 +0000
+++ sql/sys_vars.cc 2013-12-14 18:24:15 +0000
@@ -2694,6 +2694,8 @@
BLOCK_SIZE(1), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(NULL),
ON_UPDATE(NULL), NULL);
+char buf[sizeof(Sys_table_cache_instances)];
+
static Sys_var_ulong Sys_thread_cache_size(
"thread_cache_size",
"How many threads we should keep in a cache for reuse",
Issue Links
is duplicated by: MDEV-5388 Reduce usage of LOCK_open: unused_tables (Closed)
Activity
When we add a new system variable (e.g. ptr= 0x1061d40, size= 208), the addresses of other global C++ variables may change. Among other things, the addresses of LOCK_open and unused_tables change:
rev.3816 (fast):
LOCK_open: 0x1074120, size= 48 (cache line starts 0x1074100)
unused_tables: 0x1074150, size= 8 (cache line starts 0x1074140)
rev.3816 + "char buf[sizeof(Sys_table_cache_instances)]" (slow):
LOCK_open: 0x1074200, size= 48 (cache line starts 0x1074200)
unused_tables: 0x1074230, size= 8 (cache line starts 0x1074200)
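The "cache line starts" values are the variable addresses rounded down to a 64-byte boundary. A minimal sketch of that calculation, assuming 64-byte cache lines (the line size on Sandy Bridge); describe() is a hypothetical helper that formats one line in the same style as the listing above:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption: 64-byte cache lines, as on the Sandy Bridge test machine.
constexpr std::uintptr_t kCacheLineSize = 64;

// Start address of the cache line containing `addr` (clears the low 6 bits).
std::uintptr_t cache_line_start(std::uintptr_t addr) {
  return addr & ~(kCacheLineSize - 1);
}

// Hypothetical helper: formats one report line in the style used above.
std::string describe(const char *name, std::uintptr_t addr, std::size_t size) {
  char buf[128];
  std::snprintf(buf, sizeof(buf),
                "%s: 0x%llx, size= %zu (cache line starts 0x%llx)", name,
                static_cast<unsigned long long>(addr), size,
                static_cast<unsigned long long>(cache_line_start(addr)));
  return buf;
}
```

Inside the server, such a helper could be called as describe("LOCK_open", reinterpret_cast<std::uintptr_t>(&LOCK_open), sizeof(LOCK_open)) to produce the lines above.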
Note that in the fast version LOCK_open spans 2 cache lines (32 bytes on the first, 16 bytes on the second). The second cache line is shared with unused_tables, but since those last 16 bytes of LOCK_open are rarely modified, there should be no false-sharing issue.
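In the slow version LOCK_open fits entirely in 1 cache line, which it shares with unused_tables, so the whole mutex, including its frequently written lock state, competes with unused_tables for the same line.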
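The 32 + 16 byte split and the sharing question can be made explicit. The sketch below assumes 64-byte lines and that the hot lock word sits at the very start of LOCK_open (glibc's pthread_mutex_t keeps its __lock field at offset 0; that mysql_mutex_t places the pthread mutex first is an assumption here). With the addresses above, the start of LOCK_open lies on a different line from unused_tables in the fast build and on the same line in the slow build. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::uintptr_t kLine = 64;  // assumed cache-line size (x86-64)

// Splits the object [addr, addr + size) into per-cache-line pieces:
// each entry is (cache line start, number of the object's bytes on that line).
std::vector<std::pair<std::uintptr_t, std::size_t>>
split_by_cache_line(std::uintptr_t addr, std::size_t size) {
  std::vector<std::pair<std::uintptr_t, std::size_t>> pieces;
  const std::uintptr_t end = addr + size;
  while (addr < end) {
    std::uintptr_t line = addr & ~(kLine - 1);
    std::uintptr_t next = line + kLine;
    std::uintptr_t stop = next < end ? next : end;
    pieces.emplace_back(line, static_cast<std::size_t>(stop - addr));
    addr = stop;
  }
  return pieces;
}

// True if two addresses fall on the same cache line.
bool same_cache_line(std::uintptr_t a, std::uintptr_t b) {
  return (a & ~(kLine - 1)) == (b & ~(kLine - 1));
}
```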
oprofile shows that LLC_MISSES increase in the slow version:
3816 (fast)
CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % image name symbol name
43387 37.4148 no-vmlinux /no-vmlinux
21919 18.9019 libpthread-2.15.so pthread_mutex_lock
6986 6.0244 libpthread-2.15.so pthread_mutex_unlock
5427 4.6800 mysqld tc_release_table(TABLE*)
3741 3.2261 mysqld TABLE::init(THD*, TABLE_LIST*)
3168 2.7319 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
3014 2.5991 mysqld open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
2199 1.8963 libpthread-2.15.so pthread_rwlock_unlock
2151 1.8549 libpthread-2.15.so __lll_lock_wait
2134 1.8403 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int)
3816 (slow)
CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % image name symbol name
43059 39.1488 no-vmlinux /no-vmlinux
20065 18.2429 libpthread-2.15.so pthread_mutex_lock
5736 5.2151 mysqld tc_release_table(TABLE*)
5633 5.1215 libpthread-2.15.so pthread_mutex_unlock
3331 3.0285 mysqld TABLE::init(THD*, TABLE_LIST*)
2913 2.6485 mysqld open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
2666 2.4239 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
2198 1.9984 libpthread-2.15.so pthread_rwlock_unlock
1998 1.8166 libpthread-2.15.so __lll_lock_wait
1976 1.7966 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int)
3816 (slow + padding)
CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % image name symbol name
43144 37.7159 no-vmlinux /no-vmlinux
21324 18.6412 libpthread-2.15.so pthread_mutex_lock
5930 5.1839 libpthread-2.15.so pthread_mutex_unlock
5889 5.1481 mysqld tc_release_table(TABLE*)
3678 3.2153 mysqld TABLE::init(THD*, TABLE_LIST*)
3469 3.0326 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
3221 2.8158 mysqld open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
2418 2.1138 libpthread-2.15.so pthread_rwlock_unlock
2165 1.8926 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int)
2144 1.8743 libpthread-2.15.so __lll_lock_wait
Adding dummy padding around LOCK_open restores performance:
+char pada[1024];
mysql_mutex_t LOCK_open;
+char padb[1024];
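The 1024-byte arrays work, but they rely on the compiler placing pada and padb directly around LOCK_open, which the language does not guarantee. A more targeted alternative, sketched here assuming a C++11 compiler and using std::mutex as a hypothetical stand-in for mysql_mutex_t, is to align the object to a cache-line boundary: its start lands on a boundary and its tail is padded to one, so no other global can share its lines.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Assumed cache-line size (64 bytes on x86-64, including Sandy Bridge).
constexpr std::size_t kCacheLine = 64;

// Gives `value` cache lines of its own: alignas() puts the start on a line
// boundary, and because sizeof must be a multiple of alignof, the compiler
// also pads the tail up to the next boundary.
template <typename T>
struct alignas(kCacheLine) CacheLinePadded {
  T value;
};

// Compile-time check that the wrapper always covers whole cache lines.
static_assert(sizeof(CacheLinePadded<std::mutex>) % kCacheLine == 0,
              "wrapper must occupy whole cache lines");

// Hypothetical stand-ins for LOCK_open and unused_tables.
CacheLinePadded<std::mutex> padded_lock_open;
void *unused_tables_stub = nullptr;

// Cache-line index of an address.
inline std::uintptr_t line_of(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}
```

This costs well under two cache lines of padding per object instead of 2 KiB.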