---恢复内容开始---
1.检查MHA状态:
#masterha_check_status --conf=/etc/mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
状态为没有运行。
2.检查MHA主从复制状态
masterha_check_repl --conf=/etc/mha/app1.cnf
Wed Mar 28 11:38:22 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed Mar 28 11:38:22 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Wed Mar 28 11:38:22 2018 - [info] Reading server configuration from /etc/mha/app1.cnf.. Wed Mar 28 11:38:22 2018 - [info] MHA::MasterMonitor version 0.57. Wed Mar 28 11:38:22 2018 - [error][/usr/local/share/perl5/MHA/Server.pm, ln935] SQL Thread is stopped(error) on 10.1.46.214(10.1.46.214:3306)! Errno:1032, Error:Could not execute Delete_rows event on table shop.t_ncl_agent_openid; Can't find record in 't_ncl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 878001439 Wed Mar 28 11:38:22 2018 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln703] Server 10.1.46.214(10.1.46.214:3306) is alive, but does not work as a slave! Wed Mar 28 11:38:22 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 329 Wed Mar 28 11:38:22 2018 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers. Wed Mar 28 11:38:22 2018 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK! |
3.主从复制检查:其中一个从库:
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000070 Read_Master_Log_Pos: 232233295 Relay_Log_File: weixintbdb02-relay-bin.000024 Relay_Log_Pos: 877993376 Relay_Master_Log_File: mysql-bin.000010 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032 Last_Error: Could not execute Delete_rows event on table shop.t_ncl; Can't find record in 't_ncl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 878001439 Skip_Counter: 0 Exec_Master_Log_Pos: 877993203 Relay_Log_Space: 64262720902 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Delete_rows event on table shop.t_ncl_agent_openid; Can't find record in 't_ncl_agent_openid', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 878001439 Replicate_Ignore_Server_Ids: Master_Server_Id: 462131 Master_UUID: 9ab25963-1cf8-11e8-b9b9-0425c58fa8e9 Master_Info_File: /data/mysql/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 180327 23:55:13 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:9-123314222 Executed_Gtid_Set: 98ef4e90-1cff-11e8-ab2a-0425c58fa8d7:1-8, 9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-13652530 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) |
检查另一个从库复制状态:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
两个线程均正常。
4.解析日志文件
mysqlbinlog -v --base64-output=decode-rows weixintbdb02-relay-bin.000024 > /tmp/weixintbdb02.sql
SET @@SESSION.GTID_NEXT= '9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:13652530'/*!*/; # at 877992833#180320 13:23:38 server id 462131 end_log_pos 877992738 CRC32 0x1673bf15 Query thread_id=3763088 exec_time=0 error_code=0SET TIMESTAMP=1521523418/*!*/;BEGIN/*!*/;# at 877992911#180320 13:23:38 server id 462131 end_log_pos 877992834 CRC32 0x9f7675ce Table_map: `shop`.`t_pai_log_apps` mapped to number 265# at 877993007#180320 13:23:38 server id 462131 end_log_pos 877993172 CRC32 0xfe923d4e Write_rows: table id 265 flags: STMT_END_F### INSERT INTO `shop`.`t_pai_log_apps`### SET### @1='331354b1ae8c40509868342c1170b91f'### @2='/ncl/agentWD/checkInfoAndBind'### @3='妫?ュ井搴.??″.杈..?.俊?.苟缁..涓..?.### @4='/weshop'### @5=0### @6=NULL### @7='2018:03:20'### @8='2018:03:20'### @9=724### @10='2018:03:20'### @11='10.5.96.29'### @12=NULL### @13='checkInfoAndBind.step3 com.geong.ncl.agentWD.step.AgentWDInsertStep:0 720ms tojson com.geong.pai.web.opstep.TransferPage:0 1ms '### @14='0000'### @15=NULL# at 877993345#180320 13:23:38 server id 462131 end_log_pos 877993203 CRC32 0x626d11b7 Xid = 48696609COMMIT/*!*/;# at 877993376 ---复制应用位置#180320 13:30:50 server id 462131 end_log_pos 877993268 CRC32 0x8c10dea2 GTID last_committed=1536936 sequence_number=1536937 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;SET @@SESSION.GTID_NEXT= '9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:13652531'/*!*/;# at 877993441#180320 13:30:50 server id 462131 end_log_pos 877993344 CRC32 0x89c418bf Query thread_id=3763092 exec_time=0 error_code=0SET TIMESTAMP=1521523850/*!*/;/*!\C utf8mb4 *//*!*/;SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=33/*!*/;BEGIN/*!*/;# at 877993517#180320 13:30:50 server id 462131 end_log_pos 877993443 CRC32 0x3e33c686 Table_map: `shop`.`shop.t_ncl` mapped to number 290# at 877993616#180320 13:30:50 server id 462131 end_log_pos 878001439 CRC32 0x2cbaa777 Delete_rows: table id 290........# at 879424748#180320 13:30:50 server id 462131 end_log_pos 879432661 CRC32 0x04ca6c87 Delete_rows: table id 290# at 879432834#180320 13:30:50 server id 462131 end_log_pos 879433847 CRC32 0xb9e88af6 Delete_rows: table id 290 flags: STMT_END_F### DELETE FROM `shop`.`shop.t_ncl`### WHERE### @1=47### @2='02243518'### @3='o5lHPw7U0FGEznvO6SawUAFtQjA4'### @4=NULL### @5=NULL### @6=NULL### @7=NULL### @8=NULL### @9=NULL### @10=NULL### @11=NULL### @12=NULL### @13=0........### DELETE FROM `shop`.`shop.t_ncl`### WHERE...skipping...### @1=6416### @2='02275064'### @3='8886139b-96b9-44be-95dd-1257ae7688aa'### @4='13784976686'### @5=NULL### @6=NULL### @7=NULL### @8='http://thirdwx.qlogo.cn/mmopen/bj9JGugn6Uf7P15BxMQAz0q7V8v38ibric2acnsnMiaF3SibSibWPYoGEgBDObRdMImFtvfqFIv8wibvOibJJnYp2tYn2SOlG15uetV/132'### @9=NULL### @10=NULL### @11=NULL### @12='2018-03-20 13:15:45'### @13=0# at 879434020#180320 13:30:50 server id 462131 end_log_pos 879433878 CRC32 0x6b705e04 Xid = 48696956COMMIT/*!*/;# at 879434051#180320 13:34:25 server id 462131 end_log_pos 879433943 CRC32 0x2a4685b6 GTID last_committed=1536937 sequence_number=1536938 rbr_only=yes/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;SET @@SESSION.GTID_NEXT= '9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:13652532'/*!*/; |
主从复制显示sql线程已经执行完gtid号为9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-13652530,日志在GTID号为9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-13652531均是对shop`.`shop.t_ncl表的删除操作,查看从库中有这个表,但是没有数据,说明从库可能手动删除过,或者根本就没有数据。
解决跳过GTID号为9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-13652531的操作。
解决办法:
mysql> stop slave;Query OK, 0 rows affected (0.00 sec)mysql> set gtid_next='9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:13652531';
Query OK, 0 rows affected (0.00 sec)mysql> show slave status\G;
*************************** 1. row *************************** Slave_IO_State: Master_Host: 10.1.46.213 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000070 Read_Master_Log_Pos: 232703867 Relay_Log_File: weixintbdb02-relay-bin.000024 Relay_Log_Pos: 877993376 Relay_Master_Log_File: mysql-bin.000010 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032 Last_Error: Could not execute Delete_rows event on table shop.t_ncl; Can't find record in 't_ncl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 878001439 Skip_Counter: 0 Exec_Master_Log_Pos: 877993203 Relay_Log_Space: 64263191474 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1032 Last_SQL_Error: Could not execute Delete_rows event on table shop.t_ncl; Can't find record in 't_ncl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000010, end_log_pos 878001439 Replicate_Ignore_Server_Ids: Master_Server_Id: 462131 Master_UUID: 9ab25963-1cf8-11e8-b9b9-0425c58fa8e9 Master_Info_File: /data/mysql/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 180327 23:55:13 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:9-123314722 Executed_Gtid_Set: 98ef4e90-1cff-11e8-ab2a-0425c58fa8d7:1-8,9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-13652530 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)mysql> begin;
Query OK, 0 rows affected (0.00 sec)mysql> commit;
Query OK, 0 rows affected (10.00 sec) mysql> show master status\G;*************************** 1. row *************************** File: mysql-bin.000012 Position: 436 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 98ef4e90-1cff-11e8-ab2a-0425c58fa8d7:1-8,9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-13652531mysql> set gtid_next='automatic';
Query OK, 0 rows affected (0.00 sec) mysql> start slave;Query OK, 0 rows affected (0.00 sec)通过执行一个空事物跳过31的GTID号,在重启slave。
5.后续操作
因为mha需要在mysql主从同步复制正常的时候才能正常运行。现在从库10.1.46.214还在追日志,等日志追平后,将mha重新启动。
检查从库同步:
mysql> show slave status\G;
查看点:
同步状态:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
复制同步情况:
Retrieved_Gtid_Set: 9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:9-123315286
Executed_Gtid_Set: 45cb77c3-1cfa-11e8-a55f-0425c555a5f3:1-3,
98ef4e90-1cff-11e8-ab2a-0425c58fa8d7:1-3,
9ab25963-1cf8-11e8-b9b9-0425c58fa8e9:1-123315286
Retrieved_Gtid_Set 提取的最大GTID号和Executed_Gtid_Set执行的最大GTID相同,则同步完成。
启动MHA:
nohup masterha_manager --conf=/etc/mha/app1.cnf --ignore_last_failover > /etc/mha/app1/manager.log < /dev/null 2>&1 &