Thursday, July 10, 2014

Manually Switch Slaves to new Masters in mySQL 5.6 (XTRADB 5.6)

I'm really excited about Fabric which was recently announced. Everything it does has been a variety of scripts for  me or manual tasks, but before I can integrate Fabric into my system I must know more about it. When dealing with live-data and moving servers around I still do things manually just because it makes me feel better to know that if data is lost, I was the cause for doing something dumb. Basically I need to know everything about Fabric including line by line execution until I will deploy it.


Here are my steps for switching and replacing a Shard Slave.

Imagine having a setup in the following Config.

Shard Server  10.0.30.123 - this is the master endpoint

The Global Shard which holds Friend Info to join against is 

10.0.1.1

10.0.30.123 --- replicates from ---> 10.0.1.1

Now the Shard Server
10.0.30.123 has 3 slaves, thus 10.0.30.123 is set up to log-slave-updates
The 3 slaves are 10.0.18.78, 10.0.22.76, 10.0.22.77 and I want to make 10.0.22.76 the new master for the said Shard with 10.0.22.77 as its slave. So, what I have is

3 slaves --- replicates from ---> 10.0.30.123 --- replicates from ---> 10.0.1.1

what I will end up with is

10.0.22.77 -- replicates from ---> 10.0.22.76 ---> 10.0.1.1

I am getting rid of 10.0.30.123 and 10.0.18.78


Here are the steps.

Tell 10.0.22.77 and 10.0.22.76 to SLAVE UNTIL the next binary log in 10.0.123.1
  • ssh to each box
  • STOP SLAVE (using mysql 5.6) on 10.0.22.7[6-7]
  • SHOW SLAVE STATUS\G -- get Master_Log_File : master-bin.000612
  • START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000613', MASTER_LOG_POS=4
Now what I did here was tell the slaves to replicate until the next bin log is reached
mysql> START SLAVE UNTIL MASTER_LOG_FILE='master-bin.000613', MASTER_LOG_POS=4;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> SHOW WARNINGS;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                  |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1278 | It is recommended to use --skip-slave-start when doing step-by-step replication with START SLAVE UNTIL; otherwise, you will get problems if you get an unexpected slave's mysqld restart |
| Note  | 1753 | UNTIL condtion is not supported in multi-threaded slave mode. Slave is started in the sequential execution mode.                                                                         |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec) // notice the minor bug in the spelling :)

I also get a warning that says my multiple SQL threads are now a single one which is fine.


My next step is to ssh to 10.0.30.123

  • FLUSH LOGS - this tells 10.0.30.123 to rotate all log files including mysql-bin.000613
Now on the slaves I wait until they stop

Once both stop, on 10.0.22.76 I issue RESET MASTER // I don't care about what was replicated at this point and saved already in the binlogs - I've already verified that they are in-sync with CHECKSUM TABLE

On 10.0.22.77 I issue the command
 
STOP SLAVE; CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=4, MASTER_HOST='10.0.22.76'; START SLAVE;

if you get an error  

Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
stop mysql, remove auto.cnf in your $DATADIR (/var/lib/mysql)

On 10.0.22.76 I issue 

START SLAVE


Now I wait until the SLAVE catches up to the MASTER 10.0.30.123 (remember this works because of log-slave-update)


Next in my dbconfig.php file I change all references to 10.0.30.123 to 10.0.22.76

Verify everything is in sync (USE CHECKSUM TABLE ACROSS TABLES/SERVERS ) and push out the new config

After the push Make sure to restart all daemons or queue workers, they may cache the database config

Now do this all over again to make

10.0.22.76 replicate from 10.0.1.1


In conclusion, this is just to manual and screams for automation.  Soon it will be with Fabric which manages this process once I get around to rolling that out.













2 comments:

james said...

oracle offers some cluster solution to make such setup easier right?

Dathan Pattishall said...

The community has many tools to do this automatically. MariaDB has the ultimate IMHO.