Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.17
    • Fix Version/s: N/A
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      rhel5 x86_64

      Description

      Accidentally aborted a STOP SLAVE command (Ctrl-C), and the server was left in an unresponsive state with regard to shutdown and replication commands.

      [root@slave01]#  mysql
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 61015
      Server version: 10.0.17-MariaDB-log MariaDB Server
      
      Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
      
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
      
      MariaDB [(none)]> stop slave; set global #skip-slave-start
      ;Ctrl-C -- query killed. Continuing normally.
      ^[[A
      
      
      
      
      
      \c
      Ctrl-C -- query killed. Continuing normally.
      ERROR 2013 (HY000): Lost connection to MySQL server during query
          -> Ctrl-C -- exit!
      Aborted
      [root@slave01]# 
      
      [root@slave01]#  mysql
      MariaDB [(none)]> show slave status\G                     
      
      
      
      
      Ctrl-C -- query killed. Continuing normally.
      Ctrl-C -- query killed. Continuing normally.
      ERROR 2013 (HY000): Lost connection to MySQL server during query
      
      [root@slave01]# ps -ef
      ...
      mysql    10731 10380 61 12:28 pts/8    01:01:05 /usr/sbin/mysqld --basedir=/usr --datadir=/u01/data --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/
      ...
      [root@slave01]#  more /var/lib/mysql/mysqld.log
      
      ....
      150305 13:40:46 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 13:45:44 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 13:50:45 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 13:53:10 [ERROR] Slave SQL: Error 'Table 'drupal_prod.__maatkit_char_chunking_map' doesn't exist' on query. Default database: 'drupal_prod'. Query: 'INSERT INTO `drupal_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('50'))', Gtid 0-8-822547760, Internal MariaDB error code: 1146
      150305 13:53:10 [Warning] Slave: Table 'drupal_prod.__maatkit_char_chunking_map' doesn't exist Error_code: 1146
      150305 13:53:10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.027842' position 32786153
      150305 13:53:10 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.027842' at position 32786954
      150305 13:55:46 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      150305 14:00:48 [Warning] Access denied for user 'cactiuser'@'10.5.0.66' (using password: YES)
      ....
      
      [root@slave01]#  mysql
      
      MariaDB [(none)]> select 1;
      +---+
      | 1 |
      +---+
      | 1 |
      +---+
      1 row in set (0.00 sec)
      
      MariaDB [(none)]> show processlist;
      +-------+---------------------+--------------------+--------------------------+---------+------+--------------------------------------------------------------------------------+--------------------------------+----------+
      | Id    | User                | Host               | db                       | Command | Time | State                                                                          | Info                           | Progress |
      +-------+---------------------+--------------------+--------------------------+---------+------+--------------------------------------------------------------------------------+--------------------------------+----------+
      |     3 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     4 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     5 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     6 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     7 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     8 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |     9 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    10 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    11 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    12 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    13 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    14 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    15 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    16 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    17 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    18 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    19 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    20 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    21 | system user         |                    | NULL                     | Connect |  952 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |    22 | system user         |                    | NULL                     | Connect |  953 | Waiting for prior transaction to start commit before starting next transaction | NULL                           |    0.000 |
      |   209 | replication         | 10.244.17.27:32946 | NULL                     | Query   |  143 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      |  2296 | system user         |                    | NULL                     | Connect | 5825 | Waiting for master to send event                                               | NULL                           |    0.000 |
      | 60758 | system user         |                    | NULL                     | Connect |  387 | Waiting for room in worker thread event queue                                  | NULL                           |    0.000 |
      | 60996 | root                | localhost          | NULL                     | Query   |  199 | Killing slave                                                                  | stop slave                     |    0.000 |
      | 61010 | cactiuser           | 10.5.0.66:34300    | NULL                     | Query   |  199 | Filling schema table                                                           | SHOW /*!50002 GLOBAL */ STATUS |    0.000 |
      | 61012 | mmm_monitor         | 10.244.17.7:45478  | NULL                     | Query   |  196 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61013 | mmm_monitor         | 10.244.17.7:45479  | NULL                     | Query   |  196 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61015 | root                | localhost          | NULL                     | Killed  |  175 | init                                                                           | stop slave                     |    0.000 |
      | 61017 | mmm_monitor         | 10.244.17.7:45558  | NULL                     | Query   |  188 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61018 | mmm_monitor         | 10.244.17.7:45559  | NULL                     | Query   |  188 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61019 | cactiuser           | 10.5.0.66:34308    | NULL                     | Query   |  184 | Filling schema table                                                           | SHOW /*!50002 GLOBAL */ STATUS |    0.000 |
      | 61021 | mmm_monitor         | 10.244.17.7:45625  | NULL                     | Query   |  180 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61022 | mmm_monitor         | 10.244.17.7:45626  | NULL                     | Query   |  180 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61027 | mmm_monitor         | 10.244.17.7:45773  | NULL                     | Query   |  172 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61028 | mmm_monitor         | 10.244.17.7:45774  | NULL                     | Query   |  172 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      | 61034 | mmm_monitor         | 10.244.17.7:45871  | NULL                     | Query   |  164 | init                                                                           | SHOW SLAVE STATUS              |    0.000 |
      

              Activity

              elenst Elena Stepanova added a comment - - edited

              Hi,

              Could you please attach your cnf file(s) from the slave (and master if possible), or point at the JIRA issue where I can find them if you already provided them before in earlier bug reports?
              What version does the master run?
              Do you happen to have a stack trace from the stuck slave?
              Do you know at what point in time, relative to the error log, you issued stop slave: was it before or after the SQL thread aborted?

              danblack Daniel Black added a comment -

              > What version does the master run?

              10.0.15

              > Do you happen to have a stack trace from the stuck slave?

              No, the availability of a debug symbols package (MDEV-572 - although this is RHEL) would greatly assist in providing stack traces.

              > Do you know at what point in time, relative to the error log, you issued stop slave: was it before or after the SQL thread aborted?

              After. I had a stopped slave - exactly like MDEV-7668 (different slave - same stuck state)

              elenst Elena Stepanova added a comment -

              Kristian Nielsen,

              Is it one of the possible replication problems mentioned in MDEV-7668, or is it a totally different issue?

              knielsen Kristian Nielsen added a comment -

              > Is it one of the possible replication problems mentioned in MDEV-7668, or is
              > it a totally different issue?

              It's almost certainly a different issue.

              It looks like insufficient error handling in STOP SLAVE. A kill (CTRL-C
              internally does KILL, I believe) manifests itself as an error return from
              whatever function detected the signal. If that error return is not handled
              correctly, the server might get stuck in some odd state...

              So one would need to check all error paths in STOP SLAVE, I suppose (since the
              actual point where the kill was detected is probably not available?)
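
              The failure mode described above can be sketched in a few lines. This is only an illustration, not MariaDB's actual code: the names ReplicaState, stop_in_progress, wait_for_workers and KilledError are all hypothetical. It shows how an error return caused by a kill can leave shared state set if the error path skips the cleanup, and how routing every exit through one cleanup step avoids that.

```python
class KilledError(Exception):
    """Hypothetical: raised when the waiting connection is killed."""


class ReplicaState:
    """Hypothetical shared state that other replication commands check."""

    def __init__(self):
        self.stop_in_progress = False


def wait_for_workers(killed):
    # Stand-in for a blocking wait; a kill surfaces as an error from the wait.
    if killed:
        raise KilledError("query killed")


def stop_replica_leaky(state, killed):
    state.stop_in_progress = True
    try:
        wait_for_workers(killed)
    except KilledError:
        # Bug: early return on the error path never clears the flag,
        # so later commands that check it would wait forever.
        return False
    state.stop_in_progress = False
    return True


def stop_replica_safe(state, killed):
    state.stop_in_progress = True
    try:
        wait_for_workers(killed)
        return True
    except KilledError:
        return False
    finally:
        # Cleanup runs on every exit path, including the kill.
        state.stop_in_progress = False
```

              Under these assumptions, auditing "all error paths" amounts to checking that each early return restores whatever state was changed before the wait.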

              elenst Elena Stepanova added a comment -

              Okay, thanks. I'll try to run some tests with a focus on stopping/killing the slave; maybe I'll get lucky reproducing it.

              danblack Daniel Black added a comment -

              might be MDEV-7126

              elenst Elena Stepanova added a comment -

              and/or MDEV-8039?


                People

                • Assignee:
                  elenst Elena Stepanova
                  Reporter:
                  danblack Daniel Black
                • Votes:
                  1
                  Watchers:
                  4
