Here is what I did to rebuild the failed node from a backup and bring the cluster back to a healthy state.
1) Below is the cluster status while one of the nodes (NODE01) is down.
MySQL NODE02:3306 ssl JS > var c=dba.getCluster()
MySQL NODE02:3306 ssl JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE01:3306": {
                "address": "NODE01:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}
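For reference, the JS session shown above can be opened against any reachable member with the MySQL Shell; assuming the same clusterAdmin account that appears in the status output:
[root@NODE02 ~]# mysqlsh --uri clusterAdmin@NODE02:3306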
2) Take a mysqldump from the primary (the healthy node) using the following command.
[root@NODE03 db_backup]# mysqldump --all-databases --add-drop-database --single-transaction --triggers --routines --port=mysql_port --user=root -p > /db_backup/mysql_dump_03062019.sql
Enter password:
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
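Note the GTID warning above: by default the dump embeds the primary's GTID set in a SET @@GLOBAL.gtid_purged statement. That is exactly what we want here, since the rebuilt node must adopt the primary's GTID history before it can rejoin the group. If you ever need a dump that does not carry GTIDs, the variant below (hypothetical output file name) drops them:
[root@NODE03 db_backup]# mysqldump --all-databases --add-drop-database --single-transaction --triggers --routines --set-gtid-purged=OFF --port=mysql_port --user=root -p > /db_backup/mysql_dump_no_gtid.sql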
3) Run the steps below to remove the failed node from the cluster.
MySQL NODE03:3306 ssl JS > var c=dba.getCluster()
MySQL NODE03:3306 ssl JS > c.rescan()
Rescanning the cluster...
Result of the rescanning operation:
{
    "defaultReplicaSet": {
        "name": "default",
        "newlyDiscoveredInstances": [],
        "unavailableInstances": [
            {
                "host": "NODE01:3306",
                "label": "NODE01:3306",
                "member_id": "e2aa897d-1828-11e9-85b3-00505692188c"
            }
        ]
    }
}
The instance 'NODE01:3306' is no longer part of the HA setup. It is either offline or left the HA group.
You can try to add it to the cluster again with the cluster.rejoinInstance('NODE01:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: Y
Removing instance from the cluster metadata...
The instance 'NODE01:3306' was successfully removed from the cluster metadata.
MySQL NODE03:3306 ssl JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}
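As an aside, if rescan() had not offered the cleanup, the metadata entry could have been dropped explicitly; a sketch using removeInstance() with the force option (required because the member is unreachable):
MySQL NODE03:3306 ssl JS > c.removeInstance('NODE01:3306', {force: true})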
4) Stop group replication on the failed node if it is still running.
mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (1.01 sec)
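To confirm the node has really left the group, its membership state can be checked in performance_schema on the failed node:
mysql> SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;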
5) Reset "gtid_executed" on the failed node. (The dump taken above carries the primary's GTIDs in a SET @@GLOBAL.gtid_purged statement, which can only be applied while the local GTID state is empty.)
mysql> show global variables like 'GTID_EXECUTED';
+---------------+--------------------------------------------------------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------------------------------------------------------+
| gtid_executed | 01f27b9c-182a-11e9-a199-00505692188c:1-14134172,
e2aa897d-1828-11e9-85b3-00505692188c:1-12 |
+---------------+--------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> reset master;
Query OK, 0 rows affected (0.02 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.02 sec)
mysql> show global variables like 'GTID_EXECUTED';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed | |
+---------------+-------+
1 row in set (0.00 sec)
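Optionally, RESET SLAVE ALL could be used instead of the plain RESET SLAVE above; it additionally discards the replication channels' connection metadata rather than just relay logs and positions:
mysql> RESET SLAVE ALL;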
6) Disable "super_read_only" on the failed node so the dump can be applied. (Plain read_only can stay ON; it does not block accounts with the SUPER privilege, such as root.)
mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
| 1 | 1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)
mysql> SET GLOBAL super_read_only = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
| 1 | 0 |
+--------------------+--------------------------+
1 row in set (0.00 sec)
7) Restore the mysqldump taken from the primary onto the failed node.
[root@NODE01 db_backup]# mysql -uroot -p < mysql_dump_03062019.sql
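Before flipping super_read_only back on, it is worth checking that the restore replayed the GTID history: gtid_executed on the failed node should now match gtid_executed on the primary (NODE03), since the dump applied it via SET @@GLOBAL.gtid_purged:
mysql> SHOW GLOBAL VARIABLES LIKE 'GTID_EXECUTED';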
8) Once the restore completes, re-enable "super_read_only" on the failed node.
mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
| 1 | 0 |
+--------------------+--------------------------+
1 row in set (0.00 sec)
mysql> SET GLOBAL super_read_only = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
| 1 | 1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)
9) Finally, add the failed node back into the InnoDB cluster.
MySQL NODE03:3306 ssl JS > c.addInstance('clusterAdmin@NODE01:3306');
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.
Adding instance to the cluster ...
Please provide the password for 'clusterAdmin@NODE01:3306': *******************
Save password for 'clusterAdmin@NODE01:3306'? [Y]es/[N]o/Ne[v]er (default No):
Validating instance at NODE01:3306...
This instance reports its own address as NODE01
WARNING: The following tables do not have a Primary Key or equivalent column:
ephesoft.dlf, report.correction_type, report.field_details_ag, report_archive.correction_type, report_archive.field_details_ag, report_archive.global_data_ag
Group Replication requires tables to use InnoDB and have a PRIMARY KEY or PRIMARY KEY Equivalent (non-null unique key). Tables that do not follow these requirements will be readable but not updateable when used with Group Replication. If your applications make updates (INSERT, UPDATE or DELETE) to these tables, ensure they use the InnoDB storage engine and have a PRIMARY KEY or PRIMARY KEY Equivalent.
Instance configuration is suitable.
WARNING: On instance 'NODE01:3306' membership change cannot be persisted since MySQL version 5.7.24 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance command locally to persist the changes.
WARNING: On instance 'NODE02:3306' membership change cannot be persisted since MySQL version 5.7.24 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance command locally to persist the changes.
WARNING: On instance 'NODE03:3306' membership change cannot be persisted since MySQL version 5.7.24 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance command locally to persist the changes.
The instance 'clusterAdmin@NODE01:3306' was successfully added to the cluster.
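The PRIMARY KEY warning above deserves a follow-up: under Group Replication those tables are readable but not updateable. A hypothetical fix for one of the listed tables (the pk_id column name is invented for illustration; choose whatever fits the schema):
mysql> ALTER TABLE ephesoft.dlf ADD COLUMN pk_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;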
MySQL NODE03:3306 ssl JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "NODE01:3306": {
                "address": "NODE01:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}
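One loose end from the warnings during addInstance(): these are MySQL 5.7 servers, so the Shell could not persist the membership change with SET PERSIST. Run configureLocalInstance() locally on each node to write the Group Replication settings into its option file; a sketch for NODE01 (repeat on NODE02 and NODE03):
MySQL NODE01:3306 ssl JS > dba.configureLocalInstance('clusterAdmin@NODE01:3306')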