In a previous post I explained BCP. I have just finished my latest and nearly final test, and everything worked as expected. For about an hour, certain front end servers were hitting a database shard in one datacenter while the rest of the front ends hit the same shard in another datacenter.
What makes this incredible is that data in MySQL can now live close to the geographic location of the end user, without any changes to the front end application. So failover is silent from a database perspective if an entire datacenter is down. Actions outside of the application are also replicating seamlessly and in order. Cross-datacenter latency is high, but the goal is to never have a WWW server in one datacenter talk to a shard in another datacenter. So latency is in effect not an issue, especially if you use Akamai DNS to geographically load balance your user base.
This is a great, simple solution that scales, and it took only one week to implement from the ground up.
Other solutions that I have seen put a proxy layer in front of the actual database, which writes data to one datacenter and synchronously writes the same data to another datacenter. So your entire transaction time is the sum of the transaction time on each server plus the latency of talking to the other datacenter. Also, if that "proxy layer" goes down, or the stunnel goes down, the application goes down. This is not ideal for me.
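To put rough numbers on that cost, here is a back-of-the-envelope sketch. The function names and the millisecond figures are made up for illustration; they are not measurements from my test.

```python
def sync_proxy_write_ms(local_write_ms, remote_write_ms, cross_dc_rtt_ms):
    """Write time the application sees when a proxy commits to both
    datacenters before acknowledging the write."""
    return local_write_ms + remote_write_ms + cross_dc_rtt_ms


def async_replication_write_ms(local_write_ms):
    """Write time the application sees when the copy to the other
    datacenter happens outside the request path."""
    return local_write_ms


# Hypothetical figures: 5 ms per write on each server, 70 ms round trip
# between datacenters.
local_ms, remote_ms, rtt_ms = 5, 5, 70

sync_ms = sync_proxy_write_ms(local_ms, remote_ms, rtt_ms)
async_ms = async_replication_write_ms(local_ms)

print(f"synchronous proxy: {sync_ms} ms per write")
print(f"independent replication: {async_ms} ms per write")
```

With these assumed figures the application waits 80 ms per write behind a synchronous proxy, versus 5 ms when the cross-datacenter copy is not in the request path.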
The solution, which I designed and which our master Java engineer implemented and tweaked, removes these layers and makes the data transfer independent of the application. So if that layer dies, the application does not die; the layer just gets restarted and catches up to the latest events.
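The restart-and-catch-up behavior can be sketched roughly like this. This is not our actual implementation (that one is in Java); it is a minimal Python illustration, and `Replicator`, `run_once`, and the in-memory lists and dict standing in for the event stream, the remote shard, and the saved position are all hypothetical.

```python
class Replicator:
    """Copies events from a source log to a target, remembering how far it got."""

    def __init__(self, source_log, target, checkpoint):
        self.source_log = source_log  # ordered events (stand-in for the real event stream)
        self.target = target          # stand-in for the shard in the other datacenter
        self.checkpoint = checkpoint  # last applied position; would be stored durably

    def run_once(self, max_events=None):
        """Apply pending events in order and return how many were applied."""
        position = self.checkpoint.get("position", 0)
        pending = self.source_log[position:]
        if max_events is not None:
            pending = pending[:max_events]
        for event in pending:
            self.target.append(event)
            position += 1
            self.checkpoint["position"] = position
        return len(pending)


source_log = ["event-1", "event-2", "event-3"]
target = []
checkpoint = {}

# The first replicator applies two events and then "dies".
first = Replicator(source_log, target, checkpoint)
first.run_once(max_events=2)
del first

# The application never noticed and kept writing.
source_log.extend(["event-4", "event-5"])

# A restarted replicator resumes from the saved position and catches up.
second = Replicator(source_log, target, checkpoint)
caught_up = second.run_once()
print(caught_up, target == source_log)
```

The point of the design is that the application only ever writes to its local shard; the replicator is a separate process whose only state is its position. In a real system, applying an event and saving the position would need to be atomic, or the applies would need to be idempotent, so that a crash between the two steps does not replay an event twice.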
I love when things work.