Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.0.4, 5.5.33a
    • Fix Version/s: N/A
    • Component/s: OTHER
    • Labels:
      None
    • Environment:
      Debian Squeeze

      Description

      Facing the same MDEV-4452 issue form a trigger that is triggered form a replication .

      sur MYSQL1 :

      répli "ccmstats_lucifer"

      Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard03'. Query: 'replace into `ccmstats_shard03`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(3363428860,'2013-10-19 22:27:41','/download/start/descargar-14103-driver-de-video-de-lenovo-ibm-thinkpad-t30','','es.kioskea.net','http://static.ak.facebook.com/connect/xd_arbiter.php?version=27',9175071638627673410)'

      répli "ccmstats_mysql1"

      Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard07'. Query: 'replace into `ccmstats_shard07`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1947240397,'2013-10-19 09:33:46','/sites/details/1089563.jjwxc.net','','www.commentcamarche.net','http://www.quanneiren.com/seo/?page=1068&url=1089563.jjwxc.net',2180121150729504318)'

      sur LUCIFER :

      répli "ccmstats_gertrude"

      Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard13'. Query: 'replace into `ccmstats_shard13`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1323859475,'2013-10-19 11:47:46','/forum/affich-1573832-pourquoi-mon-timer-ne-s-execute-pas','','codes-sources.commentcamarche.net','http://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fcodes-sources.commentcamarche.net%2Fforum%2Faffich-1573832-pourquoi-mon-timer-ne-s-execute-pas&ei=NVViUoq0PMa90QXl9ICACQ&usg=AFQjCNH10nmC3K-H9cMsSbl1rj7V7M_V7Q&bv',2736937289062304772)'

      répli "ccmstats_lucifer"

      Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard10'. Query: 'replace into `ccmstats_shard10`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(2903347022,'2013-10-19 09:27:37','/news/12118-firefox-la-nuova-versione-nel-play-store','','it.kioskea.net','http://184.84.222.35/news/12118-Firefox%2C+la+nuova+versione+nel+Play+Store',9684943911008351985)'

      sur GERTRUDE :

      Error 'Got timeout reading communication packets' on query. Default database: 'ccmstats_shard24'. Query: 'replace into `ccmstats_shard24`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(3192578384,'2013-10-19 04:04:09','/download/start/descargar-16307-driver-de-audio-de-placa-base-pcchips-p27g','','es.kioskea.net','http://static.ak.facebook.com/connect/xd_arbiter.php?version=27',6488389282267718615)'

      The master is 5.5.33a

      CREATE TABLE `domaine_federated` (
      `domaine` varchar(128) DEFAULT NULL,
      `idDomaine` tinyint(4) NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`idDomaine`),
      UNIQUE KEY `idx_domaine` (`domaine`)

      ) ENGINE=federated CONNECTION='PUMA/domaine'
      DEFAULT CHARSET=latin1 ;

      CREATE TABLE `url_federated` (
      `url` char(255) DEFAULT NULL,
      `idUrl` int(10) unsigned NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`idUrl`),
      UNIQUE KEY `idx_url` (`url`)
      ) ENGINE=federated CONNECTION='PUMA/url'
      DEFAULT CHARSET=latin1 ;

      The slave have a before insert trigger

      define like this

      CREATE TABLE `ccmreferers` (
      `ip` int(4) unsigned NOT NULL,
      `date` datetime NOT NULL,
      `firstseenon` varchar(255) NOT NULL,
      `keyword` varchar(128) NOT NULL,
      `domaine` varchar(128) NOT NULL,
      `referer` varchar(255) NOT NULL,
      `keyword_crc64` bigint(20) unsigned NOT NULL,
      PRIMARY KEY (`ip`,`keyword`),
      KEY `date` (`date`),
      KEY `referer` (`referer`,`firstseenon`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

      DECLARE l_idUrl INT unsigned DEFAULT 0;
      DECLARE i_idDom TINYINT unsigned DEFAULT 0;
      DECLARE c_kw VARCHAR(128);
      IF NOT EXISTS (SELECT 1 FROM ccmreferers
      WHERE keyword = NEW.keyword AND ip = NEW.ip)
      THEN
      SET l_idUrl = ccmstats.GetIdUrl(NEW.firstseenon);
      SET i_idDom = ccmstats.GetIdDomaine(NEW.domaine);
      INSERT INTO stats_url_cur
      SET keyword_crc64 = NEW.keyword_crc64,
      DATE = NEW.date,
      idUrl = l_idUrl,
      idDomaine = i_idDom,
      nb = 1
      ON DUPLICATE KEY UPDATE nb=nb+1;
      IF LENGTH(NEW.keyword) > 0 THEN
      SET c_kw = REPLACE(TRIM(NEW.keyword),' ',' ');
      INSERT INTO stats_url_kw_cur
      SET keyword_crc64 = NEW.keyword_crc64,
      DATE = NEW.date,
      idUrl = l_idUrl,
      keyword = c_kw,
      idDomaine = i_idDom,
      nb = 1
      ON DUPLICATE KEY UPDATE nb=nb+1;
      END IF;
      END IF;

      delimiter //
      drop function GetIdUrl //

      CREATE DEFINER=`root`@`%` FUNCTION `GetIdUrl`(l_strUrl varchar(255)) RETURNS int(11)
      READS SQL DATA
      BEGIN
      DECLARE l_IdUrl int default 0;
      SELECT IdUrl into l_IdUrl FROM `url` where Url = l_strUrl;
      IF l_IdUrl=0 then
      INSERT IGNORE INTO url_federated (Url) VALUES (l_strUrl);
      SELECT IdUrl into l_IdUrl FROM `url_federated` where Url = l_strUrl ;
      END IF;
      RETURN l_IdUrl;
      END;
      //

      delimiter //
      drop function GetIdDomaine //

      CREATE DEFINER=`root`@`%` FUNCTION `GetIdDomaine`(l_strDomaine char(50)) RETURNS int(11)
      READS SQL DATA
      BEGIN
      DECLARE l_IdDomaine int default 0;
      SELECT IdDomaine into l_IdDomaine FROM domaine where Domaine = l_strDomaine;
      IF l_IdDomaine=0 THEN
      INSERT IGNORE INTO domaine_federated (Domaine) VALUES (l_strDomaine);

      SELECT IdDomaine into l_IdDomaine FROM domaine_federated where Domaine = l_strDomaine;

      END IF;
      RETURN l_IdDomaine;
      END;
      //

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              stephane@skysql.com VAROQUI Stephane added a comment -

              Helena , you can notice that all slaves are pointing to the same remote table that slaves just break randomly at no predefine time . We have set a cron to select from the federated table to be able to produce other activity on the remote table .

              That cron also get the same error .

              Notice as well that the time that each slave breaks is very random

              Just restarting the replication works so replaying the same query at a later time is fine

              Show
              stephane@skysql.com VAROQUI Stephane added a comment - Helena , you can notice that all slaves are pointing to the same remote table that slaves just break randomly at no predefine time . We have set a cron to select from the federated table to be able to produce other activity on the remote table . That cron also get the same error . Notice as well that the time that each slave breaks is very random Just restarting the replication works so replaying the same query at a later time is fine
              Hide
              stephane@skysql.com VAROQUI Stephane added a comment -

              Server that get the physical table is in reality MariaDB 5.5.31

              Threads_cached 49
              Threads_connected 20
              Threads_created 900
              Threads_running 6
              back_log 128

              Uptime =9077854
              Uptime_since_flush_status =9077854

              Variables

              thread_cache_size = 64
              thread_handling = one-thread-per-connection

              net.core.somaxconn = 4096
              open_files_limit =132079

              Show
              stephane@skysql.com VAROQUI Stephane added a comment - Server that get the physical table is in reality MariaDB 5.5.31 Threads_cached 49 Threads_connected 20 Threads_created 900 Threads_running 6 back_log 128 Uptime =9077854 Uptime_since_flush_status =9077854 Variables thread_cache_size = 64 thread_handling = one-thread-per-connection net.core.somaxconn = 4096 open_files_limit =132079
              Hide
              elenst Elena Stepanova added a comment -

              Hi Stephane,

              Do you understand correctly from your comment above that the problem is sporadic, not reliably reproducible with the provided structures and queries?
              If so, was there any particular reason why you chose to file a new bug rather than add a comment to MDEV-4452 which as you said it is a duplicate of?

              Show
              elenst Elena Stepanova added a comment - Hi Stephane, Do you understand correctly from your comment above that the problem is sporadic, not reliably reproducible with the provided structures and queries? If so, was there any particular reason why you chose to file a new bug rather than add a comment to MDEV-4452 which as you said it is a duplicate of?
              Hide
              stephane@skysql.com VAROQUI Stephane added a comment -

              Elena Yes sporadic issue and i don't really know about the policy in case of duplicate issue that does not provide any solution at the end .

              I'm more coming to you for help in a methodology to help founding the cause

              The error clearly state it could be a network issue on the remote server but the applications using that same server does not suffer the same issues. that server that hold the federated table is a master so constantly used and every error from the application is logged into syslog .

              We only have one error per day happening every day at the same time . The client is investigating this. but it does not match timming with our sporadic issue

              codes-sources.commentcamarche.net web18 2013-10-19 04:12:00 /profile/user/Bul3:/var/www/vhosts/www.commentcamarche.net/include/ccmfunctions.php3:737 - Acc?s ? la base

              To come back to network potential isues i have the client to come back to a more conservative tcp setting

              From this setup to
              net.ipv4.tcp_max_syn_backlog = 4096
              net.ipv4.tcp_fin_timeout = 60
              net.ipv4.tcp_tw_reuse = 1

              to this setup
              net.ipv4.tcp_max_syn_backlog = 4096
              net.ipv4.tcp_fin_timeout = 25
              net.ipv4.tcp_tw_reuse = 0

              And we will see....

              On the master looking at the query monitoring we do need see any queries popping at special unexpected response time.
              when errors happened.

              The only suboptimal status are

              Aborted_connects | 157 |
              Aborted_clients | 2932873 |

              If we state the issue is not on the server that hold the table but more on servers that use the federated table

              what we observe is an deadlock on the slave and 30 minutes later the replication is broken
              That's a good hint

              Oct 19 11:21:13 lucifer mysqld: 131019 11:21:13 [ERROR] Master 'ccmstats_gertrude': Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'ccmstats_shard10'. Query: 'replace into `ccmstats_shard10`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1315980540,'2013-10-19 11:21:13','/forum/affich-4000625-prime-noel-condition-d-attribution','','droit-finances.commentcamarche.net','https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CFwQFjAE&url=http%3A%2F%2Fdroit-finances.commentcamarche.net%2Fforum%2Faffich-4000625-prime-noel-condition-d-attribution&ei=KE1iUsfIHIeu0QXt5YGYBg&usg=AFQjCNGauJn6T9vDHk4LSLUQOI43iD_NwQ&s',5321267248370821353)', Internal MariaDB error code: 1213

              Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [ERROR] Master 'ccmstats_gertrude': Slave SQL: Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard13'. Query: 'replace into `ccmstats_shard13`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1323859475,'2013-10-19 11:47:46','/forum/affich-1573832-pourquoi-mon-timer-ne-s-execute-pas','','codes-sources.commentcamarche.net','http://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fcodes-sources.commentcamarche.net%2Fforum%2Faffich-1573832-pourquoi-mon-timer-ne-s-execute-pas&ei=NVViUoq0PMa90QXl9ICACQ&usg=AFQjCNH10nmC3K-H9cMsSbl1rj7V7M_V7Q&bv',2736937289062304772)', Internal MariaDB error code: 1296
              Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [Warning] Master 'ccmstats_gertrude': Slave: Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED Error_code: 1296
              Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [ERROR] Master 'ccmstats_gertrude': Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000017' position 17754866

              Asking the client to check if this predictable after a deadlock a replication break .

              Now in any case . Looking at federatedX code you can see that federated have auto reconnect flag on . the question is what happen if it failed, does it retry ? and if the query is inside a replication thread so what ? Any way to make this stable dispite in our case i'm really not suspecting a network issue .

              Thanks

              Show
              stephane@skysql.com VAROQUI Stephane added a comment - Elena Yes sporadic issue and i don't really know about the policy in case of duplicate issue that does not provide any solution at the end . I'm more coming to you for help in a methodology to help founding the cause The error clearly state it could be a network issue on the remote server but the applications using that same server does not suffer the same issues. that server that hold the federated table is a master so constantly used and every error from the application is logged into syslog . We only have one error per day happening every day at the same time . The client is investigating this. but it does not match timming with our sporadic issue codes-sources.commentcamarche.net web18 2013-10-19 04:12:00 /profile/user/Bul3:/var/www/vhosts/www.commentcamarche.net/include/ccmfunctions.php3:737 - Acc?s ? la base To come back to network potential isues i have the client to come back to a more conservative tcp setting From this setup to net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 60 net.ipv4.tcp_tw_reuse = 1 to this setup net.ipv4.tcp_max_syn_backlog = 4096 net.ipv4.tcp_fin_timeout = 25 net.ipv4.tcp_tw_reuse = 0 And we will see.... On the master looking at the query monitoring we do need see any queries popping at special unexpected response time. when errors happened. The only suboptimal status are Aborted_connects | 157 | Aborted_clients | 2932873 | If we state the issue is not on the server that hold the table but more on servers that use the federated table what we observe is an deadlock on the slave and 30 minutes later the replication is broken That's a good hint Oct 19 11:21:13 lucifer mysqld: 131019 11:21:13 [ERROR] Master 'ccmstats_gertrude': Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'ccmstats_shard10'. Query: 'replace into `ccmstats_shard10`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1315980540,'2013-10-19 11:21:13','/forum/affich-4000625-prime-noel-condition-d-attribution','','droit-finances.commentcamarche.net','https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CFwQFjAE&url=http%3A%2F%2Fdroit-finances.commentcamarche.net%2Fforum%2Faffich-4000625-prime-noel-condition-d-attribution&ei=KE1iUsfIHIeu0QXt5YGYBg&usg=AFQjCNGauJn6T9vDHk4LSLUQOI43iD_NwQ&s',5321267248370821353)', Internal MariaDB error code: 1213 Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [ERROR] Master 'ccmstats_gertrude': Slave SQL: Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard13'. Query: 'replace into `ccmstats_shard13`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1323859475,'2013-10-19 11:47:46','/forum/affich-1573832-pourquoi-mon-timer-ne-s-execute-pas','','codes-sources.commentcamarche.net','http://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fcodes-sources.commentcamarche.net%2Fforum%2Faffich-1573832-pourquoi-mon-timer-ne-s-execute-pas&ei=NVViUoq0PMa90QXl9ICACQ&usg=AFQjCNH10nmC3K-H9cMsSbl1rj7V7M_V7Q&bv',2736937289062304772)', Internal MariaDB error code: 1296 Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [Warning] Master 'ccmstats_gertrude': Slave: Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED Error_code: 1296 Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [ERROR] Master 'ccmstats_gertrude': Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000017' position 17754866 Asking the client to check if this predictable after a deadlock a replication break . Now in any case . Looking at federatedX code you can see that federated have auto reconnect flag on . the question is what happen if it failed, does it retry ? and if the query is inside a replication thread so what ? Any way to make this stable dispite in our case i'm really not suspecting a network issue . Thanks
              Hide
              stephane@skysql.com VAROQUI Stephane added a comment -

              More input found on some various error log

              We sporadicly get the following error as well on some MariaDB 10 slave that get the federated tables

              mysqldump: Couldn't execute 'FLUSH /*!40101 LOCAL */ TABLES': Got an error writing communication packets (1160)

              It could related to this bug reported on MySQL and to the same issue we are facing

              http://bugs.mysql.com/bug.php?id=66184

              Show
              stephane@skysql.com VAROQUI Stephane added a comment - More input found on some various error log We sporadicly get the following error as well on some MariaDB 10 slave that get the federated tables mysqldump: Couldn't execute 'FLUSH /*!40101 LOCAL */ TABLES': Got an error writing communication packets (1160) It could related to this bug reported on MySQL and to the same issue we are facing http://bugs.mysql.com/bug.php?id=66184
              Hide
              stephane@skysql.com VAROQUI Stephane added a comment -

              We have replace FederatedX with Spider Engine and the issue has show up again but getting replication stopped on different error messages .

              One server have Last_SQL_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'ccmstats_shard12'. Query: 'replace into `ccmstats_shard12`.`ccmreferers

              And an other one have

              Last_SQL_Error: Error 'Remote MySQL server has gone away' on query. Default database: 'ccmstats_shard05'. Query: 'replace into `ccmstats_shard05`.`ccmreferers

              Now one other server is fine and never stopped his replication and the difference is that this server does not have load on it . So this state that the issue is not on the remote server but more caused by activity on the one that his holding the linked table.

              In a more generic way to fixe this we have slave-skip-errors, can we have slave-retry-errors

              Show
              stephane@skysql.com VAROQUI Stephane added a comment - We have replace FederatedX with Spider Engine and the issue has show up again but getting replication stopped on different error messages . One server have Last_SQL_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'ccmstats_shard12'. Query: 'replace into `ccmstats_shard12`.`ccmreferers And an other one have Last_SQL_Error: Error 'Remote MySQL server has gone away' on query. Default database: 'ccmstats_shard05'. Query: 'replace into `ccmstats_shard05`.`ccmreferers Now one other server is fine and never stopped his replication and the difference is that this server does not have load on it . So this state that the issue is not on the remote server but more caused by activity on the one that his holding the linked table. In a more generic way to fixe this we have slave-skip-errors, can we have slave-retry-errors
              Hide
              stephane@skysql.com VAROQUI Stephane added a comment - - edited

              Fixed that was an issue in the trigger code

              Show
              stephane@skysql.com VAROQUI Stephane added a comment - - edited Fixed that was an issue in the trigger code
              Hide
              elenst Elena Stepanova added a comment -

              I added a test case to MDEV-4452, so I'm closing this bug as a duplicate.

              Show
              elenst Elena Stepanova added a comment - I added a test case to MDEV-4452 , so I'm closing this bug as a duplicate.

                People

                • Assignee:
                  elenst Elena Stepanova
                  Reporter:
                  stephane@skysql.com VAROQUI Stephane
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: