Friday, September 22, 2006

Master Master Replication Ring Recovery

In an Active Active Master Master setup there is a special procedure to do when recovering a down server.

Assumptions:

  • Your application can handle 1 of the servers being down.

  • You’re a good DBA so you take hot backups at least once a night.

  • You never PURGE MASTER LOGS to a point where a backup can't recover.

  • I'm assuming that you already know how to set up a master master config. If not comment and I will post how to, and things to watch out for.



Recover from backup, lets assume your using ibbackup


%> ibbackup --apply-log --use-memory 2000 my.cnf.restore


my.cnf.restore contains the innodb definitions and the location of the backup files.

Once restored INNODB will actually print out the last slave position it was at.

Start the mySQL instance

Perform a change master to, to that last slave position.
VERY IMPORTANT DO NOT and I REPEAT DO NOT set the Active master's replication on the 1st position of the recovered box.

Why?

If you’re using an Active Active, Master Master set up then that other box will log slave updates to its binary log and send all events back to the master that is still up. Additionally mySQL will not append the server-id of the last master to process the event, so there is a way to produce infinite loops of SQL statements.

Below is how to do so.


#060922 9:29:31 server id 14221485 log_pos 263789 Query thread_id=101364349 exec_time=0 error_code=0


Above is a dump of some header information that is put into a binary log tracking the server id of the original event. Now let’s say the server with the server id above died, and you brought up a new server with a new server id. If this event is put into a replication ring, prior to all server-id 14221485 finishing on the new Master, then ALL these events are sent back to the master that did not go down, the active Master. Then the active master sends this same event with the same server-id of the original event back to the passive master producing an infinite loop.

Really mySQL should append all server ids that processed the event to this header to prevent loops. But, this does not exist so... just follow the procedure below and you don't have to worry about it.
The procedure

  1. Recover from backup

  2. Turn the recovered box into a slave of the master that is still up

  3. Let it catch up

  4. Once catched up on the passive active master, FLUSH MASTER LOGS, then PURGE MASTER LOGS TO the new file

  5. On the active master CHANGE MASTER TO the new MASTER HOST, and the NEW MASTER Log, and the NEW MASTER Position

  6. Perform an event that will replicate on the active master, make sure the event took and is logged to the passive masters binary log.

  7. Perform an event that will replicate on the passive master, make sure the event took and is logged in the active masters binary log.

  8. Put the new MASTER in production.

1 comment:

Anonymous said...

Hi,
Please post the steps for doing a Master-Master replication. And the precaution to be taken so that a recovery can be done, when one of the DB fails critically.

PS: The link is broken

http://www.testingtheweb.com/archives/1001

Thanks