Friday, July 29, 2011

Automated InnoDB Hot Backup to S3 from EC2 with a simple bash script, innobackupex and JetS3t

Backing up data is always necessary, especially for Disaster Recovery and Business Continuity Planning (BCP). One rule of thumb for me is to let the computer do the work by automating repetitive tasks. Running things by hand over and over sucks, so if I do something more than once I typically automate the process. Backing up InnoDB data is a good example of a task that calls for automation.

When deciding to automate something, I always start with the requirements and limitations. My requirement was simple: every night at 2am, back up the database and upload it to S3 without downtime. The limitation is that, to avoid downtime, the database cannot go down and the tables cannot be locked. For that I use XtraBackup (innobackupex) provided by Percona.

The second part of the requirement is to upload the data to S3 automatically. This is not hard, but there is a limitation: uploading files larger than 5GB in a single request is not possible, and in my experience anything over 200MB has an increased error rate. The good news is that the folks behind JetS3t, a Java-based upload tool, have solved this for me.

JetS3t works on *NIX and Windows; there is a shell script and a bat script for your OS flavor. It supports huge file uploads via the Amazon multipart upload API, and it can upload data in parallel.
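The parallelism is tunable through JetS3t's properties files. A minimal sketch is below; I'm quoting the property names from memory of the JetS3t configuration guide, so treat them as assumptions and verify them against the sample jets3t.properties shipped with your install:

# jets3t.properties (in the JetS3t config directory)
# cap the number of threads JetS3t uses for parallel transfers
s3service.max-thread-count=4
threaded-service.max-thread-count=4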

Ideally I would stream the backup directly to S3 in a single step, but I don't have the time to code that right now; maybe in the future. The next best thing is to do it in roughly two steps. Below is a bash script that backs up the MySQL database and then uploads the backup to S3.

#!/bin/bash

# Percona XtraBackup (innobackupex) binary and its options: 4 parallel copy threads
INNOBACKUP="/usr/bin/innobackupex"
INNOBACKUP_OPTIONS="--parallel=4 --user=backup --password=****"
BACKUPDIR="/sqldata/backups"

# JetS3t synchronize command: UP(load) to the S3 bucket
S3BUCKET="dbprodbackups"
JETS3="/usr/local/jets3t/bin/synchronize.sh UP $S3BUCKET $BACKUPDIR"

# Purge local backup directories that have not been accessed in 3+ days
echo "Removing old local backups"
cd "$BACKUPDIR" || exit 1
find . -type d -name "." -prune -o -type d -atime +3 -exec rm -rf {} \; -print

echo "Starting INNOBACKUP:"

# innobackupex writes the hot backup into a timestamped subdirectory of $BACKUPDIR
echo "$INNOBACKUP $INNOBACKUP_OPTIONS $BACKUPDIR"
$INNOBACKUP $INNOBACKUP_OPTIONS "$BACKUPDIR"

# Give the new backup directory a moment to settle before picking it up
echo "Sleeping for 1 min 10 seconds"
sleep 70

# Collect the names of backup directories last changed more than a minute ago
S3_DIR_UPLOAD=$(find . -maxdepth 1 -type d -cmin +1 -print | grep ./ | cut -d. -f2 | cut -d/ -f2 | xargs)

echo "Dirs to synchronize are: $S3_DIR_UPLOAD"

# Upload each backup directory to S3 with JetS3t
for dir in $S3_DIR_UPLOAD; do
   echo "Executing $JETS3/$dir"
   $JETS3/$dir
done
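To meet the "every night at 2am" part of the requirement, the script just needs a cron entry. A minimal sketch, assuming the script above is saved as /usr/local/bin/innodb_s3_backup.sh (that path and the log file are placeholders of mine, not something defined in the script):

# /etc/cron.d/innodb-backup -- run the backup nightly at 2am as root
0 2 * * * root /usr/local/bin/innodb_s3_backup.sh >> /var/log/innodb_s3_backup.log 2>&1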

Configuring JetS3t is simple; it's a matter of modifying a properties file.

Inside the JetS3t directory (in my case /usr/local/jets3t/) there is a config directory that contains all the configuration files. In synchronize.properties, define the accesskey and secretkey for your S3 account.
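A minimal sketch of those two lines (the values are placeholders, obviously; never put real keys anywhere public):

# synchronize.properties (in the JetS3t config directory)
accesskey=AKIAXXXXXXXXXXXXXXXX
secretkey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx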


The JETS3 line in the script says: synchronize the backup folder and push it to the S3 bucket dbprodbackups. JetS3t handles the bucket's region automatically (the Amazon tools do not).

Running the script, you'll see something like this for the InnoDB backup:

[02] Copying ./ShardLookup/MainLookup.ibd
to /sqldata/backups/2011-07-29_20-13-38/ShardLookup/MainLookup.ibd
>> log scanned up to (99395152227)
>> log scanned up to (99395249621)
>> log scanned up to (99395366299)
>> log scanned up to (99395373016)
>> log scanned up to (99395382850)


and something like this for the JetS3t upload:

...

N 2011-07-29_20-13-38/xtrabackup_binary
N 2011-07-29_20-13-38/xtrabackup_binlog_info
N 2011-07-29_20-13-38/xtrabackup_checkpoints
N 2011-07-29_20-13-38/xtrabackup_logfile
Large upload parts: 32/43 - 76% of 38.88 GB (15.89 MB/s - ETA: 8 1/2 minutes)

In about an hour I am able to back up 40GB and upload it to S3.
