MariaDB Server / MDEV-6155

Upgraded servers from 5.5 to 10 and replication lag keeps growing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.0.10
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:
      CentOS x86_64 (master) and CentOS i386 (slave)

      Description

      I've used MariaDB 5.5 for months with replication without a hiccup. Following the upgrade plan, I upgraded the slave to 10 and then the master to 10 as well, and from that moment I'm seeing constantly increasing slave lag...

      iotop shows:

        TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
        337 be/3 root        0.00 B/s    3.89 K/s  0.00 % 89.96 % [jbd2/md1-8]
       3313 be/4 mysql      11.66 K/s  116.61 K/s  0.00 %  5.59 % mysqld --~-port=3306
      

      and jbd2 completely clogs the IO... which didn't happen before...

      Checking the process list, and refreshing it constantly, I keep seeing a Table lock in the COMMIT phase...

      +----+-----------------+-----------------+---------+---------+--------+----------------------------------+------------------+----------+
      | Id | User            | Host            | db      | Command | Time   | State                            | Info             | Progress |
      +----+-----------------+-----------------+---------+---------+--------+----------------------------------+------------------+----------+
      |  1 | event_scheduler | localhost       | NULL    | Daemon  |   3255 | Waiting on empty queue           | NULL             |    0.000 |
      | 10 | system user     |                 | NULL    | Connect |   1991 | Waiting for master to send event | NULL             |    0.000 |
      | 11 | system user     |                 | MCPC_SI | Connect | 127178 | Table lock                       | COMMIT           |    0.000 |
      | 15 | root            | localhost       | NULL    | Sleep   |      0 |                                  | NULL             |    0.000 |
      | 16 | root            | localhost:41879 | MCPC_SI | Sleep   |     15 |                                  | NULL             |    0.000 |
      | 17 | root            | localhost       | NULL    | Query   |      0 | init                             | SHOW PROCESSLIST |    0.000 |
      +----+-----------------+-----------------+---------+---------+--------+----------------------------------+------------------+----------+
      

      I've changed from InnoDB to MyISAM (it could be a storage engine bottleneck) but without any change...

      Any clues on what might be causing this (besides the change from 5.5 to 10)?

      Thank you,
      António Fernandes

            Activity

            elenst Elena Stepanova added a comment - edited

            Hi,

            When you said

            I've changed from InnoDB to MyISAM (it could be a storage engine bottleneck) but without any change...

            did you mean that you converted all your existing tables from InnoDB to MyISAM (ran ALTER TABLE .. ENGINE=MyISAM), and still get the same IO problem?
            It's important to know, because there were lots of changes in InnoDB between 5.5 and 5.6/10.0, IO-related included, so we need to know if we can safely ignore them while looking into it.
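
            For reference, such a conversion and a follow-up check would look roughly like the sketch below (the schema and table names are placeholders, not taken from this report):

            -- Hypothetical example: convert one table, then confirm which engine
            -- each table in the schema is actually using now.
            ALTER TABLE my_db.my_table ENGINE=MyISAM;

            SELECT TABLE_NAME, ENGINE
              FROM information_schema.TABLES
             WHERE TABLE_SCHEMA = 'my_db';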

            Also, did you try to stop mysqld and check that when it's not running, jbd2 doesn't clog the IO? I agree it would be a questionable coincidence that it started doing so exactly when you upgraded from 5.5 to 10.0, but coincidences do happen, so it would be good to rule it out.

            ampf Antonio Fernandes added a comment

            Hi Elena,

            I just converted some tables. With innotop I checked that some specific tables were constantly locked, and changed only those (about 3 or 4).

            As for stopping mysqld, the minute I execute STOP SLAVE, the IO goes back to "normal" (and so does jbd2). I also tried remounting the filesystem (ext4) with noatime, and that didn't visibly change the replication performance.

            I can now add some more information based on continuous digging:

            pt-ioprofile
            Wed Apr 23 15:25:48 WEST 2014
            Tracing process ID 1925
                 total       read    pread64   pwrite64      write      fsync    _llseek filename
             28.534776   0.000000   0.000000   0.029010   0.000000  28.505766   0.000000 /var/lib/mysql/ib_logfile1
              0.030686   0.000000   0.000000   0.000000   0.019066   0.000000   0.011620 /var/lib/mysql/relay-log.info
              0.021215   0.008409   0.004655   0.000000   0.000000   0.000000   0.008151 /var/lib/mysql/MCPC_Marcas/tbl_produtos#P#p6.MYD
              0.018563   0.003063   0.001420   0.011352   0.000000   0.000000   0.002728 /var/lib/mysql/MCPC_SI/tbl_clientes.MYD
              0.008746   0.000000   0.000000   0.000000   0.005074   0.000000   0.003672 /var/lib/mysql/master.info
              0.005730   0.000000   0.000000   0.000000   0.005730   0.000000   0.000000 /var/lib/mysql/lpvs2-relay-bin.000031
              0.004688   0.002383   0.000745   0.000000   0.000000   0.000000   0.001560 /var/lib/mysql/MCPC_SI/tbl_clientes_empresas.MYD
              0.002076   0.000000   0.000023   0.002053   0.000000   0.000000   0.000000 /var/lib/mysql/MCPC_SI/tbl_clientes.MYI
              0.001146   0.000000   0.000000   0.001146   0.000000   0.000000   0.000000 /var/lib/mysql/MCPC_SI/tbl_clientes_empresas.MYI
              0.000899   0.000899   0.000000   0.000000   0.000000   0.000000   0.000000 /var/lib/mysql/lpvs2-relay-bin.000011
            

            I have a bunch of scripts that imported data from flat files to MariaDB. They do it line by line with autocommit ON (by default). I've changed those scripts to use transactions in subsets of lines and the loading occurs way faster...

            I'm suspecting that line-by-line inserts (with a commit per line as well) perform worse than on 5.5. Since the slave has older/worse hardware, it lags... while the master keeps eating data as normal...
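
            A minimal sketch of the change, assuming a hypothetical import table (the table and column names below are made up for illustration):

            -- Hypothetical import table; names are illustrative only.
            CREATE TABLE IF NOT EXISTS import_demo (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              payload VARCHAR(255)
            ) ENGINE=InnoDB;

            -- Line by line with autocommit ON: every INSERT is its own transaction,
            -- so every row pays a separate commit (and, with InnoDB defaults, an fsync).
            INSERT INTO import_demo (payload) VALUES ('row 1');
            INSERT INTO import_demo (payload) VALUES ('row 2');

            -- Batched: group a subset of lines into one explicit transaction,
            -- so the commit cost is paid once per batch instead of once per row.
            START TRANSACTION;
            INSERT INTO import_demo (payload) VALUES ('row 3');
            INSERT INTO import_demo (payload) VALUES ('row 4');
            INSERT INTO import_demo (payload) VALUES ('row 5');
            COMMIT;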

            Best regards,
            António Fernandes

            elenst Elena Stepanova added a comment

            They do it line by line with autocommit ON (by default). I've changed those scripts to use transactions in subsets of lines and the loading occurs way faster...

            I'm suspecting that line-by-line inserts (with a commit per line as well) perform worse than on 5.5

            Indeed, there were many IO-related changes in InnoDB between 5.5 and 5.6/10.0, and since the transactional approach is favored, single-statement transactions might require more tuning. What's weird, though, is that you observed it on MyISAM tables; for them it shouldn't matter whether you do inserts as big transactions or as single statements. Maybe it so happened that you converted different tables, not the ones being inserted into.

            Given that we are still talking about InnoDB, you could also consider adjusting innodb_flush_log_at_trx_commit ( https://mariadb.com/kb/en/xtradbinnodb-server-system-variables/#innodb_flush_log_at_trx_commit ). The default value 1 is the safest, but since you are tuning a slave, you might be able to afford using a different value. Although, if you have already switched to big transactions, it probably won't make much difference.
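
            A sketch of how that variable could be adjusted on the slave (the value 2 below is just one of the relaxed settings; see the KB page above for the exact trade-offs):

            -- Check the current value.
            SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

            -- Relax durability at runtime; this takes effect immediately
            -- but does not survive a restart.
            SET GLOBAL innodb_flush_log_at_trx_commit = 2;

            -- To make it persistent, put the same setting in my.cnf under [mysqld]:
            --   innodb_flush_log_at_trx_commit = 2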

            There are other ways to tune InnoDB-related IO, see http://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb-diskio.html .

            ampf Antonio Fernandes added a comment

            Hi,

            Changing innodb_flush_log_at_trx_commit to 0 did in fact reduce IO (but it can introduce inconsistency between slave and master in the event of a crash):

            Total DISK READ: 453.38 K/s | Total DISK WRITE: 782.76 K/s
              TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
              337 be/3 root        0.00 B/s    0.00 B/s  0.00 % 22.65 % [jbd2/md1-8]
             1972 be/4 mysql       0.00 B/s  616.13 K/s  0.00 %  0.61 % mysqld --~-port=3306
             1970 be/4 mysql       0.00 B/s   89.13 K/s  0.00 %  0.12 % mysqld --~-port=3306
             1938 be/4 mysql       0.00 B/s    0.00 B/s  0.00 %  0.09 % mysqld --~-port=3306
             1939 be/4 mysql       0.00 B/s    0.00 B/s  0.00 %  0.04 % mysqld --~-port=3306
             1980 be/4 mysql     449.50 K/s    0.00 B/s  0.00 %  0.00 % mysqld --~-port=3306
            

            Now it's CPU bound... but that's OK since it's catching up.

            As for InnoDB vs MyISAM, I'll continue my tests and post them here.

            Best regards,
            António Fernandes

            ampf Antonio Fernandes added a comment

            Hi,

            I'm starting to monitor the two servers with percona-cacti-templates (never had the need). I started today with our master server. If all goes well, tomorrow I'll roll it out to the problematic slave.

            BTW, is there any procedure for running a MyISAM read/write benchmark between 5.5 and 10?

            Best regards,
            Antonio

            elenst Elena Stepanova added a comment

            I'm not sure what you mean by a procedure.
            If you need a benchmarking tool, sysbench is widely used in the MySQL world, although there are others. Alternatively, especially if the existing tools don't represent your workflow well enough, you can create your own flow and run it against both 5.5 and 10.0.
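
            For example, a hand-rolled flow could be as simple as timing a loop of single-row MyISAM inserts on each version. The sketch below is a made-up illustration, not an existing benchmark:

            -- Made-up micro-benchmark: N single-row inserts into a MyISAM table,
            -- run on both 5.5 and 10.0 and compared by wall-clock time.
            CREATE TABLE bench_myisam (
              id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
              val VARCHAR(64)
            ) ENGINE=MyISAM;

            DELIMITER //
            CREATE PROCEDURE bench_insert(IN n INT)
            BEGIN
              DECLARE i INT DEFAULT 0;
              WHILE i < n DO
                INSERT INTO bench_myisam (val) VALUES (CONCAT('row-', i));
                SET i = i + 1;
              END WHILE;
            END//
            DELIMITER ;

            -- Time this call on each server, e.g.: CALL bench_insert(100000);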

            ampf Antonio Fernandes added a comment

            Hello Elena,

            I couldn't find time to really test this thoroughly.

            Do you know if anyone measured single-threaded table lock/unlock performance for MyISAM between 5.5 and 10?

            If not, you could close the ticket, because I worked around the problem (I stopped replicating the 6.2 million MyISAM records that are updated every night).

            Regards,
            António

            elenst Elena Stepanova added a comment

            I haven't heard of this specific benchmark being done.

            As discussed above, closing for now. If you have more information that needs to be looked at, please comment to re-open the report.


              People

              • Assignee: Unassigned
              • Reporter: ampf Antonio Fernandes
              • Votes: 0
              • Watchers: 3
