MariaDB Server
MDEV-4452

Problem with FederatedX between two local MariaDB servers

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 5.5.30, 5.5.40, 10.0.14
    • Fix Version/s: 10.0
    • Component/s: None
    • Labels:
    • Environment:
      CentOS 5.0 64-bit on VMware ESXi 5

      Description

      I spoke with Sergei at Percona about this issue. I have two MariaDB 5.5.30 (on CentOS 5) servers set up as master-slave (although that shouldn't matter, as this is not a replication issue).

      Here is my setup. On the slave, I have a federated table linking back to the master. Also on the slave, I have a batch reporting process that inserts records into a local (slave) table which matches the table definition of the table on the master. When the process is complete, I do a bulk insert like this:

      insert into <federated_table> select * from <local_table>;

      The records are then loaded into the master table, and then flow back to the slave (as well as other slaves) through normal replication.

      The problem is that one time in about 20 (about 5% of the time), I receive the following error on the slave:

      Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED

      Of course, the master is running fine and never hangs or "goes away". When this error occurs, my slave reporting code immediately retries the insert query, and it usually succeeds on the second attempt. The two servers are right next to each other on an internal network (10.0.0.x) with just one HP gigabit switch between them. They are both VMware VMs running on VMware ESXi 5.

      The master server is a development server, so it is not under any load or contention.

      I'd like to get more detailed error logging on the slave when this happens with federated tables, or to find out what could be causing this issue.

      Many thanks.
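      The immediate-retry workaround described above can be sketched as a small client-side helper. This is a hypothetical sketch, not the actual reporting code; the function and parameter names are illustrative, and the real code would invoke the bulk INSERT through whatever connector the reporting process uses:

      ```python
      import time

      def insert_with_retry(run_insert, max_attempts=2, delay=0.5):
          """Run the bulk INSERT, retrying once if the federated link
          reports error 2006 ('MySQL server has gone away')."""
          for attempt in range(1, max_attempts + 1):
              try:
                  return run_insert()
              except Exception as exc:
                  # Match the error text FederatedX relays from the remote side.
                  gone_away = "2006" in str(exc) or "gone away" in str(exc)
                  if attempt == max_attempts or not gone_away:
                      raise
                  time.sleep(delay)  # brief pause before re-running the INSERT
      ```

      With a real connector, `run_insert` would execute `insert into results_fed select * from results_local;` on the slave.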

              Activity

              Hank Eskin (heskin) added a comment (edited)

              Just to be clear, there are four tables:

              Master: results
              Slave: results (the slave's replicated version of "results" on the master)
              Slave: results_local (a slave local table used for reporting output)
              Slave: results_fed (a federated table linking to table "results" on the master)

              The reporting process populates "results_local" on the slave. The problematic insert statement on the slave is:

              insert into results_fed select * from results_local;

              Hope this clears things up.

              Elena Stepanova (elenst) added a comment

              Please ignore my previous question if you received it (deleted now), the summary of the report already answers it.

              Hank Eskin (heskin) added a comment

              I would like to add that along the same lines, I am also experiencing another similar problem with FEDERATEDX. I have code that deletes all records for a primary key in a federated table (between the same two servers mentioned above), and then inserts new records replacing those deleted records.

              Again, about 5% of the time, I will get the error "Can't write; duplicate key in table <table>" when I know for sure that there are absolutely no duplicate records in the existing table or in the source records. The source records come from another table with the same primary key, so there cannot be duplicate records. If I try the same insert again, it usually works. So something on the remote table side is triggering this error, but the FederatedX engine can't interpret and transmit the error correctly back to the local server.

              Just to eliminate the delete, in my code I now delete the primary key records on the "master" table directly (not through FEDERATED), and then insert into the federated table and the same problem occurs. So the problem is definitely on the federated table insert.

              Elena Stepanova (elenst) added a comment

              Eureka (18 months too late).

              Since we are talking about regular analytic/reporting activity, it is probably scheduled and not very frequent. It might be that the batch reporting process which does the local inserts usually finishes close to the moment when the existing federated connection is about to time out. Sometimes (in about 5% of cases) the process takes a bit longer and the federated connection actually times out. Then we hit the problem: once the remote connection has disappeared, the next INSERT fails with either "server has gone away" or "error reading communication packets"; see the test case below.

              Test case
              source suite/federated/have_federatedx.inc;
              
              create table t_remote (a int primary key, b varchar(8));
              
              eval create table t_local (a int primary key, b varchar(8))
                engine=federated
                connection='mysql://root@127.0.0.1:$MASTER_MYPORT/test/t_remote';
              
              set @wait_timeout.saved = @@global.wait_timeout;
              set global wait_timeout = 1;
              set wait_timeout = 28800;
              
              echo # This one should work;
              insert into t_local values (1,'foo');
              
              sleep 2;
              --error ER_GET_ERRMSG,ER_NET_READ_ERROR
              insert into t_local values (2,'bar');
              echo # Got error $mysql_errno ($mysql_errname)
              echo # This should work again;
              insert into t_local values (2,'bar');
              
              drop table t_remote, t_local;
              
              set global wait_timeout = @wait_timeout.saved;
              
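              If this timeout theory holds, a practical mitigation on the remote (master) server is to raise wait_timeout so that idle federated connections survive between reporting runs. The values below are illustrative, not a recommendation for any particular workload:

              ```sql
              -- On the remote (master) server: keep idle connections alive longer.
              -- 28800 seconds (8 hours) is the server default; raise it further if
              -- the reporting schedule leaves the federated link idle even longer.
              SET GLOBAL wait_timeout = 28800;
              SET GLOBAL interactive_timeout = 28800;
              ```

              Alternatively, the client-side retry the reporter already uses remains a valid workaround, since the first failed INSERT re-establishes the federated connection.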

                People

                • Assignee: Sergei Golubchik (serg)
                • Reporter: Hank Eskin (heskin)
                • Votes: 1
                • Watchers: 4

                Dates

                • Created:
                • Updated: