Here are my steps for switching and replacing a Shard Slave.
Imagine having a setup in the following Config.
Shard Server 10.0.30.123 - this is the master endpoint
The Global Shard which holds Friend Info to join against is
10.0.1.1
10.0.30.123 --- replicates from ---> 10.0.1.1
Now the Shard Server
The 3 slaves are 10.0.18.78, 10.0.22.76, 10.0.22.77 and I want to make 10.0.22.76 the new master for the said Shard with 10.0.22.77 as its slave. So, what I have is
3 slaves --- replicates from ---> 10.0.30.123 --- replicates from ---> 10.0.1.1
what I will end up with is
10.0.22.77 -- replicates from ---> 10.0.22.76 ---> 10.0.1.1
I am getting rid of 10.0.30.123 and 10.0.18.78
Here are the steps.
Tell 10.0.22.77 and 10.0.22.76 to SLAVE UNTIL the next binary log in 10.0.123.1
- ssh to each box
- STOP SLAVE (using mysql 5.6) on 10.0.22.7[6-7]
- SHOW SLAVE STATUS\G -- get Master_Log_File : master-bin.000612
- START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000613', MASTER_LOG_POS=4
Now what I did here was tell the slaves to replicate until the next bin log is reached
mysql> START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000613', MASTER_LOG_POS=4;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> SHOW WARNINGS;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1278 | It is recommended to use --skip-slave-start when doing step-by-step replication with START SLAVE UNTIL; otherwise, you will get problems if you get an unexpected slave's mysqld restart |
| Note | 1753 | UNTIL condtion is not supported in multi-threaded slave mode. Slave is started in the sequential execution mode. |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec) // notice the minor bug in the spelling :)
I also get a warning that says my multiple SQL threads are now a single one which is fine.
My next step is to ssh to 10.0.30.123
- FLUSH LOGS - this tells 10.0.30.123 to rotate all log files including mysql-bin.000613
Now on the slaves I wait until they stop
Once both stop, on 10.0.22.76 I issue RESET MASTER // I don't care about what was replicated at this point and saved already in the binlogs - I've already verified that they are in-sync with CHECKSUM TABLE
On 10.0.22.77 I issue the command
STOP SLAVE; CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=4, MASTER_HOST='10.0.22.76'; START SLAVE;
if you get an error
Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
stop mysql, remove auto.cnf in your $DATADIR (/var/lib/mysql)
On 10.0.22.76 I issue
START SLAVE
Now I wait until the SLAVE catches up to the MASTER 10.0.30.123 (remember this works because of log-slave-update)
Next in my dbconfig.php file I change all references to 10.0.30.123 to 10.0.22.76
Verify everything is in sync (USE CHECKSUM TABLE ACROSS TABLES/SERVERS ) and push out the new config
After the push Make sure to restart all daemons or queue workers, they may cache the database config
Now do this all over again to make
10.0.22.76 replicate from 10.0.1.1
In conclusion, this is just to manual and screams for automation. Soon it will be with Fabric which manages this process once I get around to rolling that out.
No comments:
Post a Comment