Details
-
Type:
Bug
-
Status: Open
-
Priority:
Major
-
Resolution: Unresolved
-
Affects Version/s: 10.0.17-galera
-
Fix Version/s: None
-
Component/s: wsrep
-
Labels:
Description
When replicating between two Galera clusters using GTID replication, duplicate DDL does not get ignored, even with gtid_ignore_duplicates turned on, if each node has its own server_id.
Let's say we have 2 clusters, each with 3 nodes.
Cluster 1, node1:
MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_ready | ON |
+----------------------+---------+
4 rows in set (0.01 sec)
Cluster 2, node1:
MariaDB [(none)]> SHOW STATUS WHERE Variable_name IN('wsrep_connected', 'wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_ready');
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| wsrep_cluster_size | 3 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_ready | ON |
+----------------------+---------+
4 rows in set (0.00 sec)
Lets view the current GTID positions of each.
Cluster 1, node 1:
MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mariadb-bin.000002
Position: 787
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 787);
+--------------------------------------------+
| BINLOG_GTID_POS("mariadb-bin.000002", 787) |
+--------------------------------------------+
| 21-1-3,22-5-1 |
+--------------------------------------------+
1 row in set (0.00 sec)
Cluster 2, node 1:
MariaDB [(none)]> SHOW MASTER STATUS\G
*************************** 1. row ***************************
File: mariadb-bin.000002
Position: 551
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT BINLOG_GTID_POS("mariadb-bin.000002", 551);
+--------------------------------------------+
| BINLOG_GTID_POS("mariadb-bin.000002", 551) |
+--------------------------------------------+
| 21-1-3,22-5-1 |
+--------------------------------------------+
1 row in set (0.00 sec)
Now let's set up replication between each cluster:
Cluster 1, node 1:
MariaDB [(none)]> STOP SLAVE; Query OK, 0 rows affected (0.01 sec) MariaDB [(none)]> RESET SLAVE; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1'; Query OK, 0 rows affected (0.11 sec) MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.55", master_use_gtid=current_pos, master_user='sst', master_password='sst'; Query OK, 0 rows affected (0.02 sec) MariaDB [(none)]> START SLAVE; Query OK, 0 rows affected (0.01 sec) MariaDB [(none)]> SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.55 Master_User: sst Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mariadb-bin.000002 Read_Master_Log_Pos: 551 Relay_Log_File: localhost-relay-bin.000002 Relay_Log_Pos: 697 Relay_Master_Log_File: mariadb-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 551 Relay_Log_Space: 998 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 4 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Current_Pos Gtid_IO_Pos: 21-1-3,22-5-1 1 row in set (0.00 sec)
Cluster 2, node 1:
MariaDB [(none)]> STOP SLAVE; Query OK, 0 rows affected (0.03 sec) MariaDB [(none)]> RESET SLAVE; Query OK, 0 rows affected (0.00 sec) MariaDB [(none)]> SET GLOBAL GTID_SLAVE_POS='21-1-3,22-5-1'; Query OK, 0 rows affected (0.03 sec) MariaDB [(none)]> CHANGE MASTER TO master_host="192.168.1.52", master_use_gtid=current_pos, master_user='sst', master_password='sst'; Query OK, 0 rows affected (0.02 sec) MariaDB [(none)]> START SLAVE; Query OK, 0 rows affected (0.01 sec) MariaDB [(none)]> SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.52 Master_User: sst Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mariadb-bin.000002 Read_Master_Log_Pos: 787 Relay_Log_File: localhost-relay-bin.000002 Relay_Log_Pos: 697 Relay_Master_Log_File: mariadb-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 787 Relay_Log_Space: 998 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: Current_Pos Gtid_IO_Pos: 21-1-3,22-5-1 1 row in set (0.00 sec)
Now let's create a database on cluster 1.
Cluster 1, node 1:
MariaDB [(none)]> CREATE DATABASE db1; Query OK, 1 row affected (0.02 sec) MariaDB [(none)]> SHOW DATABASES; +--------------------+ | Database | +--------------------+ | db1 | | information_schema | | mysql | | performance_schema | | test | +--------------------+ 5 rows in set (0.00 sec)
So far, everything looks good on cluster 2.
Cluster 2, node 1:
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| db1 |
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.00 sec)
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.52
Master_User: sst
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 906
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 816
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 906
Relay_Log_Space: 1117
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 21-1-4,22-5-1
1 row in set (0.00 sec)
However, now let's try dropping the database on one of the nodes in cluster 2:
Cluster 2, node 2:
MariaDB [(none)]> DROP DATABASE db1; Query OK, 0 rows affected (0.01 sec) MariaDB [(none)]> SHOW DATABASES; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | | test | +--------------------+ 4 rows in set (0.00 sec)
Everything looks good on cluster 1.
Cluster 1, node 1:
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.00 sec)
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.55
Master_User: sst
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 787
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 814
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 787
Relay_Log_Space: 1115
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 4
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 21-1-4,22-5-2
1 row in set (0.00 sec)
However, it seems to have broken the replication back to cluster 2:
Cluster 2, node 1:
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.00 sec)
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.52
Master_User: sst
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 1023
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 816
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1008
Last_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'
Skip_Counter: 0
Exec_Master_Log_Pos: 906
Relay_Log_Space: 1234
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1008
Last_SQL_Error: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 21-1-4,22-5-2
1 row in set (0.00 sec)
The "drop database" command appears to have been executed twice: once via Galera, and once via GTID replication.
The error log for Cluster 2, node 1 shows:
150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Gtid 22-5-2, Internal MariaDB error code: 1008
150413 17:18:32 [Warning] Slave: Can't drop database 'db1'; database doesn't exist Error_code: 1008
150413 17:18:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000002' position 906; GTID position '21-1-4,22-5-1'
The error logs for other nodes in Cluster 2 show:
150413 17:18:32 [ERROR] Slave SQL: Error 'Can't drop database 'db1'; database doesn't exist' on query. Default database: 'db1'. Query: 'DROP DATABASE db1', Internal MariaDB error code: 1008
150413 17:18:32 [Warning] WSREP: RBR event 1 Query apply warning: 1, 6
150413 17:18:32 [Warning] WSREP: Ignoring error for TO isolated action: source: f371bbe5-e214-11e4-acfd-7a10f557098b version: 3 local: 0 state: APPLYING flags: 65 conn_id: 13 trx_id: -1 seqnos (l: 8, g: 6, s: 5, d: 5, ts: 9034574114945)
The statement only appears in the binary log of Cluster 1, node 1 once:
# at 551 #150413 17:15:29 server id 1 end_log_pos 589 GTID 21-1-4 /*!100001 SET @@session.gtid_domain_id=21*//*!*/; /*!100001 SET @@session.server_id=1*//*!*/; /*!100001 SET @@session.gtid_seq_no=4*//*!*/; # at 589 #150413 17:15:29 server id 1 end_log_pos 670 Query thread_id=9 exec_time=0 error_code=0 SET TIMESTAMP=1428959729/*!*/; CREATE DATABASE db1 /*!*/; # at 670 #150413 17:18:32 server id 5 end_log_pos 708 GTID 22-5-2 /*!100001 SET @@session.gtid_domain_id=22*//*!*/; /*!100001 SET @@session.server_id=5*//*!*/; /*!100001 SET @@session.gtid_seq_no=2*//*!*/; # at 708 #150413 17:18:32 server id 5 end_log_pos 787 Query thread_id=8 exec_time=0 error_code=0 SET TIMESTAMP=1428959912/*!*/; DROP DATABASE db1 /*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
Gliffy Diagrams
Attachments
Activity
- All
- Comments
- Work Log
- History
- Activity
- Transitions
The duplicate operation was properly ignored when server_id was set to the same value for all nodes in a particular cluster (i.e. in cluster 1, server_id is set to 1, in cluster 2, server_id is set to 2).
Is that the only supported method to use GTID replication with Galera?