Monday, June 29, 2009

mysql UC 2009 Talk

Scribe is a bit buggy with displaying this presentation:

Scaling a Widget Company

Thursday, June 11, 2009

PHP mysql connect and using APC to recover

So, when you connect on the fly to a database your subject to a variety of issues, like when the db is not available and when the db does not have a route.

One of the main reasons why a dev may want to connect on the fly is because they have too many front ends to hold a persistent connection on the backends. Since mySQL does not use libevent, holding open threads to mySQL is much more costly. Threads == Memory.

But, that's here nor there. The main purpose of this post is to talk about how to recover from failed connections that block apache threads.

Common Failures:
No route to Host
Flapping NIC
Locked Tables
Recovering from a Crash
more of the same.


My Environment:
I have a bunch of webservers (200+) that all have 300 possible threads (60000 possible connections to a single DB) behind a load balancer that uses the LB least connections protocol to distribute load across all 200+ webservers.

Since PHP is used in a stateless mode (no guarentee that the same user will hit the same server), I have to have some way of telling all other apache procs for said box that a server is down. I use stateless on the fly connections, so each apache proc will test the connection.

Here in lies my most common problem. If a box in the backend dies, all apache threads will block for a predefined time-out.

In /etc/php.ini (it could be in /etc/php.d/mysql.ini) I set

mysql.connect_timeout = 5 // the default is 60


In my common_db class when connecting to a database, the connect routine returns a database handle object

if (PEAR::isError($dbh) && $delta_to_connect >= 5){

#
# mark ip as dead for 15 min
#

apc_store($ip, array('DEAD'), 900);
return false;
}

return $dbh;




Now PRIOR to calling the database connect code I check to see if the IP is up.

$status = apc_fetch($ip);
if ($status[0] === 'DEAD'){
return false;
}

... do connect ...


But, there could be a variety of issues that can call false positives, like network flaps, someone blocking the db for some time etc. So, I'll allow one request .1% of the time bypass the status check to try again.


if ($status[0] === 'DEAD' && (mt_rand(0, 1000) != 1)){
return false;
}



But, if your app can't connect to the DB aren't you down anyway?

Yes, although I don't have to restart all the httpds.
Also each DB has a redundant pair to when returning false, the app code will try the other set of servers.

I have a variety of methods to deal with these scenarios but this is the quickest to implement.