I have finalized the design for Federating Connections, and part of it has already been implemented, with amazing results. Overnight, the dependency on replication was reduced, feed updates stopped lagging, and query load doubled without the need for new hardware.
At Flickr, social connections (not MySQL connections) are directly responsible for permission levels: they determine how far one member can see into another member's photostream. Because this requirement is global, every logged-in page view on Flickr requires a database read unless the page is the member's own. So if the cluster is down, access to all photostreams defaults to the most restrictive state, i.e. public photos only.
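To make the read path concrete, here is a minimal sketch of a per-page permission check, assuming a hypothetical `fetch_relationship(owner_id, viewer_id)` call into the contact cluster that returns labels such as `"friend"` or `"family"` and raises `ContactClusterDown` when the cluster is unreachable. The `Visibility` levels and relationship labels are illustrative assumptions, not Flickr's actual schema. The two properties it shows come straight from the paragraph above: a member's own page needs no lookup, and a failed lookup falls back to public photos only.

```python
from enum import Flag, auto
from typing import Callable, Set


class Visibility(Flag):
    """Hypothetical visibility levels a viewer can be granted."""
    PUBLIC = auto()
    FRIENDS = auto()
    FAMILY = auto()
    PRIVATE = auto()  # visible to the owner only


class ContactClusterDown(Exception):
    """Raised by the hypothetical contact store when it cannot be reached."""


def allowed_visibility(
    viewer_id: int,
    owner_id: int,
    fetch_relationship: Callable[[int, int], Set[str]],
) -> Visibility:
    """Return what viewer_id may see on owner_id's photostream."""
    if viewer_id == owner_id:
        # A member's own page needs no read from the contact cluster.
        return Visibility.PUBLIC | Visibility.FRIENDS | Visibility.FAMILY | Visibility.PRIVATE
    try:
        # One realtime read per page view of someone else's photostream.
        relationship = fetch_relationship(owner_id, viewer_id)
    except ContactClusterDown:
        # Fail closed: the most restrictive state is public photos only.
        return Visibility.PUBLIC
    allowed = Visibility.PUBLIC
    if "friend" in relationship:
        allowed |= Visibility.FRIENDS
    if "family" in relationship:
        allowed |= Visibility.FAMILY
    return allowed
```

Failing closed is what makes the contact cluster's availability so visible to members: when it is down, nobody loses access to public photos, but every non-public photo disappears from view.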
As a result, the service needs to be extremely responsive on reads, since potentially every page view on Flickr produces a realtime query on the contact cluster.
Next, the data has to be redundant and always available. This is very hard to do when you have no spare servers and only two servers to carry out the entire procedure, yet we did it. We recorded all photo permission change events, implemented the new method, backfilled the existing data into the new schema layout, and then applied all of the recorded change events to it.
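The record, backfill, and replay sequence can be sketched as follows. Everything here is a hypothetical stand-in: `ChangeLog`, `ChangeEvent`, `backfill`, `apply_event`, and `migrate` are illustrative names, the old layout is assumed to be keyed by `(owner_id, contact_id)`, and the new layout is assumed to group each owner's connections together. The key assumption is that every event carries the full new state of a connection (or `None` for a removal), so replaying an event that the backfill already picked up is harmless.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # hypothetical old layout key: (owner_id, contact_id)
NewStore = Dict[int, Dict[int, str]]  # hypothetical new layout: owner_id -> {contact_id: level}


@dataclass(frozen=True)
class ChangeEvent:
    seq: int              # position in the change log
    key: Key
    level: Optional[str]  # new permission level, or None if the connection was removed


class ChangeLog:
    """Append-only log; recording must start before the backfill begins."""

    def __init__(self) -> None:
        self.events: List[ChangeEvent] = []

    def record(self, key: Key, level: Optional[str]) -> None:
        self.events.append(ChangeEvent(len(self.events), key, level))


def backfill(old_rows: Dict[Key, str]) -> NewStore:
    # Copy existing rows into the new per-owner layout.
    new_store: NewStore = {}
    for (owner_id, contact_id), level in old_rows.items():
        new_store.setdefault(owner_id, {})[contact_id] = level
    return new_store


def apply_event(new_store: NewStore, event: ChangeEvent) -> None:
    # Events carry absolute state, so applying one twice gives the same result.
    owner_id, contact_id = event.key
    row = new_store.setdefault(owner_id, {})
    if event.level is None:
        row.pop(contact_id, None)
    else:
        row[contact_id] = event.level


def migrate(old_rows: Dict[Key, str], log: ChangeLog) -> NewStore:
    # Backfill first, then replay every recorded change in log order.
    new_store = backfill(old_rows)
    for event in sorted(log.events, key=lambda e: e.seq):
        apply_event(new_store, event)
    return new_store
```

Because replay is idempotent, it does not matter whether a change made during the backfill was captured by the copy or not; the final state converges either way, which is what lets the cutover happen without spare servers to hold a frozen snapshot.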
Finally, the new design allows for more features and more requests to the system, with the ability to spread data across N servers.
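One way to picture spreading data across N servers is a member-to-shard directory. This is a sketch under assumptions, not the design described here: `ShardDirectory`, its methods, and the host names are hypothetical, and the in-memory dictionary stands in for whatever durable mapping a real deployment would need. It keys everything by owner, so all of one member's connections live on a single shard and a permission check touches one server.

```python
from typing import Dict, List


class ShardDirectory:
    """Hypothetical member -> shard directory for connection data.

    New members go to the least-populated shard; existing members stay put
    until explicitly moved, so adding a server does not reshuffle every row.
    """

    def __init__(self, shard_hosts: List[str]) -> None:
        self.shard_hosts = list(shard_hosts)
        self.assignments: Dict[int, int] = {}  # member_id -> shard index
        self.counts: List[int] = [0] * len(self.shard_hosts)

    def shard_for(self, member_id: int) -> str:
        if member_id not in self.assignments:
            # Place a new member on the shard with the fewest members (lowest index on ties).
            index = min(range(len(self.counts)), key=lambda i: self.counts[i])
            self.assignments[member_id] = index
            self.counts[index] += 1
        return self.shard_hosts[self.assignments[member_id]]

    def add_shard(self, host: str) -> None:
        # A new, empty shard; new members flow to it first.
        self.shard_hosts.append(host)
        self.counts.append(0)

    def move(self, member_id: int, new_index: int) -> None:
        # Re-home a member after its rows have been copied to the target shard.
        old_index = self.assignments[member_id]
        self.counts[old_index] -= 1
        self.counts[new_index] += 1
        self.assignments[member_id] = new_index
```

A plain `member_id % N` mapping would be shorter, but every member whose shard changes when N grows would have to be migrated at once; a directory trades an extra lookup for the ability to add servers and move members incrementally.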
For now, two servers are all that is needed; the next phase will spread the data across more servers, with NO memcache or cache layer at all.