Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8571

After mysqloptimize sometimes one of the tables is marked as crashed

    Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0
    • Fix Version/s: 10.1, 10.0
    • Component/s: Admin statements
    • Labels:
      None
    • Environment:
      Linux

      Description

      Sometimes at nightly cron-jobs doing "mysqloptimize" one of the tables get marked "crashed" and automatic repair fails.
      The last one is caused by related bug.

      I tried to force the error by stressing the database with inserts/reads during mysqloptimize, but I was unable to get it crashed at the moment. But it happens sometimes at night (when the cron job runs).
      The script which consists of
      mysqlcheck -s -A
      mysqloptimize -s -A
      mysqlanalyze -s -A

      is running for years without any problem. The problem first occured after update from MariaDB 5.5 to 10.0.19.
      If this is relevant: all tables have Aria engine.

      I still expect some racing condition - as a "user" I'm not familiar when a table is marked "crashed" - but there must be some condition when some lock, or gathering the lock interfears with this state after lock is removed.

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              elenst Elena Stepanova added a comment -

              Fustini,

              Could you please paste (or upload to ftp.askmonty.org/private) the output of

              SHOW CREATE TABLE rkt27_k2_items;
              SHOW INDEX IN rkt27_k2_items;
              SHOW TABLE STATUS LIKE 'rkt27_k2_items';
              

              The request for my.cnf is also still valid.

              It's weird that the process disappears without any sign of what happened to it either in its error log, or in system logs...
              Marc,
              Is it the same for you? The process also disappears silently?

              Show
              elenst Elena Stepanova added a comment - Fustini , Could you please paste (or upload to ftp.askmonty.org/private) the output of SHOW CREATE TABLE rkt27_k2_items; SHOW INDEX IN rkt27_k2_items; SHOW TABLE STATUS LIKE 'rkt27_k2_items'; The request for my.cnf is also still valid. It's weird that the process disappears without any sign of what happened to it either in its error log, or in system logs... Marc , Is it the same for you? The process also disappears silently?
              Hide
              Alessandro Fustini added a comment -

              Hi Elena,
              I copied in ftp data_table_mariadb.txt all the data
              Thank you

              Show
              Alessandro Fustini added a comment - Hi Elena, I copied in ftp data_table_mariadb.txt all the data Thank you
              Hide
              mokraemer Marc added a comment - - edited

              Hi,
              sorry for the late reply (was on holiday).
              In my case, the process does NOT disappear. The only thing I can see as related might be the log rotation. I use the following script:

              /var/log/mysqld/*.log
                  rotate 5
                  weekly
                  missingok
                  notifempty
                  compress
                  create 644 mysql mysql
                  sharedscripts
                  prerotate
                   mysqladmin flush-logs 
                  endscript
                  postrotate
                      mysqladmin flush-logs
                  endscript
              

              but this did not cause a process reload, which I'm monitoring & checked the logs

              Show
              mokraemer Marc added a comment - - edited Hi, sorry for the late reply (was on holiday). In my case, the process does NOT disappear. The only thing I can see as related might be the log rotation. I use the following script: /var/log/mysqld/*.log rotate 5 weekly missingok notifempty compress create 644 mysql mysql sharedscripts prerotate mysqladmin flush-logs endscript postrotate mysqladmin flush-logs endscript but this did not cause a process reload, which I'm monitoring & checked the logs
              Hide
              elenst Elena Stepanova added a comment -

              Fustini, Marc, thanks, it seems I am now able to reproduce the problem with a concurrent test.


              Sergey Vojtovich (or whoever ends up fixing it),
              below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it.

              RQG grammar mdev8571.yy
              query_init:
              	CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1;
              
              query:
              	REPLACE INTO my_table () VALUES (),();
              
              thread1:
              	OPTIMIZE TABLE my_table;
              
              my_table:
              	t1 | t2;
              
              RQG command line
              perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir> 
              

              When the problem is hit, the output looks like this:

              # 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test
              # 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds.
              # 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started
              # 2015-09-17T03:24:40 [25179] Started periodic reporting process...
              # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed
              # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed
              ...
              

              It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line).
              The printed error is not the problem in itself, it's just a side effect (see MDEV-8475). The real problem is that at some point a TMM file does not get removed.
              There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated.

              I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because MDEV-8475 has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.

              Show
              elenst Elena Stepanova added a comment - Fustini , Marc , thanks, it seems I am now able to reproduce the problem with a concurrent test. Sergey Vojtovich (or whoever ends up fixing it), below is the RQG test. Use lp:~elenst/randgen/mariadb-patches to run it. RQG grammar mdev8571.yy query_init: CREATE TABLE IF NOT EXISTS t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM; CREATE TABLE IF NOT EXISTS t2 LIKE t1; query: REPLACE INTO my_table () VALUES (),(); thread1: OPTIMIZE TABLE my_table; my_table: t1 | t2; RQG command line perl ./runall-new.pl --grammar=mdev8571.yy --skip-gendata --threads=2 --queries=100M --duration=120 --mysqld=--wait-timeout=10 --basedir=<10.0 basedir> --vardir=<your vardir> When the problem is hit, the output looks like this: # 2015-09-17T03:24:39 [24975] Caching schema metadata for dbi:mysql:host=127.0.0.1:port=18570:user=rqg:database=test # 2015-09-17T03:24:40 [24975] Starting 2 processes, 100000000 queries each, duration 120 seconds. # 2015-09-17T03:24:40 [24975] GenTest::ErrorFilter(25178) started # 2015-09-17T03:24:40 [25179] Started periodic reporting process... # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed # 2015-09-17T03:24:53 [25182] Query: REPLACE INTO t1 () VALUES (),() failed: 144 Table './test/t1' is marked as crashed and last (automatic?) repair failed ... It should happen in ~10 seconds after 'Started reporting process' (or in wait_timeout seconds, if one changes the value provided on the command line). The printed error is not the problem in itself, it's just a side effect (see MDEV-8475 ). The real problem is that at some point a TMM file does not get removed. There is no visible problem with OPTIMIZE, so I'm not sure what makes it bypass the file removal, it is to be investigated. I could not reproduce it on 5.5 (which explains why reporters reported a regression), and it cannot be seen with this test on 10.1 because MDEV-8475 has been fixed there, and the error on REPLACE does not happen, even though the underlying issue might still exist. All in all, I suggest to investigate it on 10.0 and then check if other versions are affected.
              Hide
              mokraemer Marc added a comment -

              wow finally we got a test-case - thanks Elena!

              Show
              mokraemer Marc added a comment - wow finally we got a test-case - thanks Elena!

                People

                • Assignee:
                  svoj Sergey Vojtovich
                  Reporter:
                  mokraemer Marc
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated: