Friday, June 25, 2010

dsh and TABLE CHECKSUM

So running through some various tasks, I'm finally on the section of work where I can resurrect a script that finds inconsistent data between master-master pairs.

Let's get a quick summary to find our problems to target the script at.


dsh -ef 20 -w dbfacebook9b,dbfacebook9a -s ~/bin/execute_query.sh "FacebookShard \"CHECKSUM TABLE RollingUserLog\""
dbfacebook9b: *************************** 1. row ***************************
dbfacebook9b: Table: FacebookShard.RollingUserLog
dbfacebook9b: Checksum: 538386033
dbfacebook9a: *************************** 1. row ***************************
dbfacebook9a: Table: FacebookShard.RollingUserLog
dbfacebook9a: Checksum: 538386033


not bad.

CHECKSUM TABLE uses a ACCUM algorithm to determine the table checksum. This gives a good fast snapshot in a shared lock mode so writes do not stop while doing a checksum (this is in the context of INNODB). RollingUserLog gets nearly 700 writes per sec per box (No alerts).

But what if there is a problem

# dsh -ef 20 -w dbfacebook38b,dbfacebook38a -s ~/bin/execute_query.sh "FacebookShard \"CHECKSUM TABLE SimulatedProblem\""
dbfacebook38b: *************************** 1. row ***************************
dbfacebook38b: Table: FacebookShard.SimulatedProblem
dbfacebook38b: Checksum: 660032421
dbfacebook38a: *************************** 1. row ***************************
dbfacebook38a: Table: FacebookShard.SimulatedProblem
dbfacebook38a: Checksum: 2533654621


Well let's see if there is a row count mismatch


# dsh -ef 20 -w dbfacebook38b,dbfacebook38a -s ~/bin/execute_query.sh "FacebookShard \"SELECT COUNT(*) FROM SimulatedProblem\""
dbfacebook38b: *************************** 1. row ***************************
dbfacebook38b: COUNT(*): 64358
dbfacebook38a: *************************** 1. row ***************************
dbfacebook38a: COUNT(*): 64358



Ah so the data is inconsistent, we are not missing rows but 1 or more rows have different values. Now time to find it.

To resurrect my script now that I know what table is messed up:

Algorithm is this:

open connection to both servers - fork
compare the data by scanning the table via a index walk
crc32/md5/hash the rows
compare columns
if (dbfacebook38b.hash != dbfacebook38a.hash)
mark row and record primary key to track position

print report

but wait why polish up my script when I can use a formal one that does the job great!

@see

mk-table-checksum
mk-table-sync

# fixes the issue

mk-table-sync h=dbfacebook38a,u=$USER,p=$PASS,D=FacebookShard,t=SimulatedProblem h=dbfacebook38b --execute --no-check-slave


maatkit is awesome. Nearly every problem that I run into and built a script for has been formalized. I'm going to muster up some time and contribute to this toolkit like add my binary log rotate which makes sure that the slaves are caught up to the binary file that is being purged.

Monday, June 07, 2010

Review of High Availability MySQL Cookbook by Packt Publishing


A few months ago, I reviewed MySQL Admin Cookbook. Today I am reviewing High Availability MySQL Cookbook from Packt Publishing by Alex Davies. Overall, I found the book to contain some good hidden Gems.

The book is a mixture of MySQL Cluster (NDB), Replication schemes, some performance tuning, some minor kernel tweaking, and some more exotic approaches to common High Availability problems. Overall, I found this book very informative and a good read.
Now the specifics, the book starts out on NDB and stays focused on this fact for about 60% of the book. The next 20% is on mySQL replication then about 10% of the book is on tweaking kernel, mysql, network settings to get the most out of the system. The last 10% is a mixture of uses of exotic systems such as GFS, Conga, ISCSI and how to use these shared storage techs with mySQL.

High Availability MySQL Cookbook, is a good read, and jogged my memory on NDB (since I do not use it on a day-to-day basis). If you are looking for a good reference on how to get an HA system up then this is a good book for you. If you are looking for why to use NDB over mySQL multi-master replication, this is not a book for you. The “why-to-use-this-over-that” is not the scope of the book. The drawbacks and concerns of what technology to use are not a focus of this book, and should not be. It is assumed that the reader knows which direction to go into for the most part or gives the reader enough information to set up and environment to see which way to go.

In conclusion, I like this book. I think it is rather concise, and right to the point, which either gets you started into building HA systems or gives you a good reference for an existing HA environment.