Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7903

xtrabackup SST failing with maria-10.0-galera

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.0.16-galera
    • Fix Version/s: 10.0.20-galera
    • Component/s: Galera SST
    • Labels:
      None
    • Environment:
      prod

      Description

      Performing an SST using xtrabackup-v2 methods
      fails with following errors:

      
      150330 11:09:37 [ERROR] I/O error reading the header from the binary log, errno=175, io cache code=0
      150330 11:09:37 [ERROR] I/O error reading the header from the binary log
      150330 11:09:37 [ERROR] Can't init tc log
      150330 11:09:37 [ERROR] Aborting
      

      It creates some corrupt binary log file on the joiner node

      root@con:~# ls -lh /data/mysql/binlog/
      total 4.0K
      -rw-rw---- 1 mysql mysql 32 Mar 30 11:09 binlog.index
      -rw-rw---- 1 mysql mysql  0 Mar 30 11:09 binlog.state
      

      This issue is repeatable, set log_bin to non-default location on all the nodes e.g. we have set it to

      log_bin=/data/mysql/binlog/binlog
      

      Work around was to change log_bin location to its default location on all the nodes i.e.

      log_bin=binlog
      

      SST using xtrabackup-v2 on a joiner node would be a success.

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              nirbhay_c Nirbhay Choubey added a comment -

              A part of the fix depends on MDEV-7110 as PXB uses log_bin_basename to determine binlog directory.

              Show
              nirbhay_c Nirbhay Choubey added a comment - A part of the fix depends on MDEV-7110 as PXB uses log_bin_basename to determine binlog directory.
              Hide
              doliver3 David Oliver added a comment -

              The workaround only worked for me if I used the exact name and also commented out the log_bin_index like this. For each test I simply bootstrap the first node then start the second and if it didn't fail then I would restart the second again. It failed with the name mariadb-bin and whenever the log_bin_index was specified even when the name was provided without a path as just the filename like log_bin_index = mariadb-bin.index. WSREP_SST cleaning datadir failed in these cases where the log_bin_index name was set. It would be good to confirm that whatever fix is provided also works for the WSREP_SST cleaning step. I'm testing on Ubuntu 14.10 with latest MariaDB Galera packages.

              WORKAROUND THAT WORKED:
              in /etc/mysql/my.cnf
              log_bin=binlog
              #log_bin_index = /var/log/mysql/mariadb-bin.index

              FAILED SCENARIO 2:
              in /etc/mysql/conf.d/galera.conf
              log_bin = /var/log/mysql/mariadb-bin
              log_bin_index = /var/log/mysql/mariadb-bin.index

              worked initially then on restart of the joiner node, it showed the error already described in this issue

              FAILED SCENARIO 3:
              in /etc/mysql/conf.d/galera.conf
              log_bin = mariadb-bin
              log_bin_index = mariadb-bin.index

              150503 15:49:57 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
              150503 15:49:57 [Note] WSREP: Requesting state transfer: success, donor: 1
              WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( $

              {PIPESTATUS[@]} ) (20150503 15:49:57.377)
              WSREP_SST: [INFO] Proceeding with SST (20150503 15:49:57.879)
              WSREP_SST: [INFO] Cleaning the existing datadir (20150503 15:49:57.882)
              removed '/var/lib/mysql/gvwstate.dat'
              WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql
              . as well (20150503 15:49:57.903)
              find: paths must precede expression: ./mariadb-bin\.[0-9]+$
              Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
              WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 15:49:57.909)
              WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 15:49:57.913)
              150503 15:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '1533' --binlog 'mariadb-bin' : 1 (Operation not permitted)
              150503 15:49:57 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
              150503 15:49:57 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
              150503 15:49:57 [ERROR] Aborting


              FAILED SCENARIO 3:
              in /etc/mysql/conf.d/galera.conf
              log_bin = /var/lib/mysql/mariadb-bin
              log_bin_index = /var/lib/mysql/mariadb-bin.index

              WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]}

              ) (20150503 16:13:08.917)
              WSREP_SST: [INFO] Proceeding with SST (20150503 16:13:09.456)
              WSREP_SST: [INFO] Cleaning the existing datadir (20150503 16:13:09.462)
              removed '/var/lib/mysql/gvwstate.dat'
              WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql
              /var/lib/mysql as well (20150503 16:13:09.488)
              find: paths must precede expression: /var/lib/mysql/mariadb-bin\.[0-9]+$
              Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
              WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 16:13:09.494)
              WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 16:13:09.498)
              150503 16:13:09 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '17625' --binlog '/var/lib/mysql/mariadb-bin' : 1 (Operation not permitted)
              150503 16:13:09 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
              150503 16:13:09 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
              150503 16:13:09 [ERROR] Aborting

              Show
              doliver3 David Oliver added a comment - The workaround only worked for me if I used the exact name and also commented out the log_bin_index like this. For each test I simply bootstrap the first node then start the second and if it didn't fail then I would restart the second again. It failed with the name mariadb-bin and whenever the log_bin_index was specified even when the name was provided without a path as just the filename like log_bin_index = mariadb-bin.index. WSREP_SST cleaning datadir failed in these cases where the log_bin_index name was set. It would be good to confirm that whatever fix is provided also works for the WSREP_SST cleaning step. I'm testing on Ubuntu 14.10 with latest MariaDB Galera packages. WORKAROUND THAT WORKED: in /etc/mysql/my.cnf log_bin=binlog #log_bin_index = /var/log/mysql/mariadb-bin.index FAILED SCENARIO 2: in /etc/mysql/conf.d/galera.conf log_bin = /var/log/mysql/mariadb-bin log_bin_index = /var/log/mysql/mariadb-bin.index worked initially then on restart of the joiner node, it showed the error already described in this issue FAILED SCENARIO 3: in /etc/mysql/conf.d/galera.conf log_bin = mariadb-bin log_bin_index = mariadb-bin.index 150503 15:49:57 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0) 150503 15:49:57 [Note] WSREP: Requesting state transfer: success, donor: 1 WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( $ {PIPESTATUS[@]} ) (20150503 15:49:57.377) WSREP_SST: [INFO] Proceeding with SST (20150503 15:49:57.879) WSREP_SST: [INFO] Cleaning the existing datadir (20150503 15:49:57.882) removed '/var/lib/mysql/gvwstate.dat' WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql . as well (20150503 15:49:57.903) find: paths must precede expression: ./mariadb-bin\. [0-9] +$ Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression] WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 15:49:57.909) WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 15:49:57.913) 150503 15:49:57 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '1533' --binlog 'mariadb-bin' : 1 (Operation not permitted) 150503 15:49:57 [ERROR] WSREP: Failed to read uuid:seqno from joiner script. 150503 15:49:57 [ERROR] WSREP: SST failed: 1 (Operation not permitted) 150503 15:49:57 [ERROR] Aborting FAILED SCENARIO 3: in /etc/mysql/conf.d/galera.conf log_bin = /var/lib/mysql/mariadb-bin log_bin_index = /var/lib/mysql/mariadb-bin.index WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20150503 16:13:08.917) WSREP_SST: [INFO] Proceeding with SST (20150503 16:13:09.456) WSREP_SST: [INFO] Cleaning the existing datadir (20150503 16:13:09.462) removed '/var/lib/mysql/gvwstate.dat' WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql /var/lib/mysql as well (20150503 16:13:09.488) find: paths must precede expression: /var/lib/mysql/mariadb-bin\. [0-9] +$ Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression] WSREP_SST: [ERROR] Cleanup after exit with status:1 (20150503 16:13:09.494) WSREP_SST: [INFO] Removing the sst_in_progress file (20150503 16:13:09.498) 150503 16:13:09 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.167.8' --auth 'xtra:blankPencils10' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '17625' --binlog '/var/lib/mysql/mariadb-bin' : 1 (Operation not permitted) 150503 16:13:09 [ERROR] WSREP: Failed to read uuid:seqno from joiner script. 150503 16:13:09 [ERROR] WSREP: SST failed: 1 (Operation not permitted) 150503 16:13:09 [ERROR] Aborting
              Hide
              joffrey Joffrey MICHAIE added a comment -

              Hi,

              problem comes from following snippet in xtrabackup-v2:

              /usr/bin/xtrabackup-v2
              873             for bfiles in $(ls -1 ${BINLOG_FILENAME}.*);do
              874                 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index
              875             done
              

              When commenting this code, the node is starting correctly.

              I think regression comes from https://mariadb.atlassian.net/browse/MDEV-6391 patch, which was included in 10.0.17.

              I am not sure if we have access to xtrabackup-v2 code. Only binary logs (files ending with numbers) should be put in the index file:

              Proposed "Fix":

              /usr/bin/xtrabackup-v2
              873             for bfiles in $(ls -1 ${BINLOG_FILENAME}.[0-9]+);do
              874                 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index
              875             done
              

              Joffrey

              Show
              joffrey Joffrey MICHAIE added a comment - Hi, problem comes from following snippet in xtrabackup-v2: /usr/bin/xtrabackup-v2 873 for bfiles in $(ls -1 ${BINLOG_FILENAME}.*); do 874 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index 875 done When commenting this code, the node is starting correctly. I think regression comes from https://mariadb.atlassian.net/browse/MDEV-6391 patch, which was included in 10.0.17. I am not sure if we have access to xtrabackup-v2 code. Only binary logs (files ending with numbers) should be put in the index file: Proposed "Fix": /usr/bin/xtrabackup-v2 873 for bfiles in $(ls -1 ${BINLOG_FILENAME}.[0-9]+); do 874 echo ${BINLOG_DIRNAME}/${bfiles} >> ${BINLOG_FILENAME}.index 875 done Joffrey
              Hide
              nirbhay_c Nirbhay Choubey added a comment -

              Joffrey MICHAIE I can't reproduce the scenario where state file ends up in index file on joiner node.
              The reason possibly being mariadb deletes state file during startup and later recreates it
              on normal shutdown to recover GTID state. So, during SST the state file should not exist
              on donor node in first place.

              This issue, however, seem related to the missing log_bin_basename variable in 10.0.
              PXB uses log_bin_basename to determine binary log's base directory.

              ^Gxbstream: Can't get stat of './mysql.000003' (Errcode: 2 - No such file or directory)
              innobackupex: 'xbstream -c' returned with exit code 1.
              innobackupex: Ignoring nonexistent file '/home/nirbhay/project/git-repo/maria/10.0-galera/install/data1//mysql.000003'.
              

              https://github.com/percona/percona-xtrabackup/blob/2.2/storage/innobase/xtrabackup/innobackupex.pl#L3225,L3233

              Without this variable, binary log directory outside data directory will simply not work.

              BTW, I have already ported these missing variables to 10.1 (MDEV-7110). But I am not
              sure if its safe to backport it to 10.0 given its GA.

              Show
              nirbhay_c Nirbhay Choubey added a comment - Joffrey MICHAIE I can't reproduce the scenario where state file ends up in index file on joiner node. The reason possibly being mariadb deletes state file during startup and later recreates it on normal shutdown to recover GTID state. So, during SST the state file should not exist on donor node in first place. This issue, however, seem related to the missing log_bin_basename variable in 10.0. PXB uses log_bin_basename to determine binary log's base directory. ^Gxbstream: Can't get stat of './mysql.000003' (Errcode: 2 - No such file or directory) innobackupex: 'xbstream -c' returned with exit code 1. innobackupex: Ignoring nonexistent file '/home/nirbhay/project/git-repo/maria/10.0-galera/install/data1 //mysql.000003'. https://github.com/percona/percona-xtrabackup/blob/2.2/storage/innobase/xtrabackup/innobackupex.pl#L3225,L3233 Without this variable, binary log directory outside data directory will simply not work. BTW, I have already ported these missing variables to 10.1 ( MDEV-7110 ). But I am not sure if its safe to backport it to 10.0 given its GA.
              Hide
              nirbhay_c Nirbhay Choubey added a comment -

              Ok, so its the state file left on the joiner node that's causing the problem.

              Show
              nirbhay_c Nirbhay Choubey added a comment - Ok, so its the state file left on the joiner node that's causing the problem.
              Show
              nirbhay_c Nirbhay Choubey added a comment - https://github.com/MariaDB/server/commit/5467b12dc3894c36eba092430a785806e0d008c9

                People

                • Assignee:
                  nirbhay_c Nirbhay Choubey
                  Reporter:
                  aftab.khan aftab khan
                • Votes:
                  5 Vote for this issue
                  Watchers:
                  7 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: