I always start by asking what the requirements and limitations are when deciding to automate something. My requirement was simple: every night at 2am, back up the database and upload it to S3 without downtime. The limitation: to avoid downtime, the database cannot go down and the tables cannot be locked. Thus I use XtraBackup, provided by Percona.
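For the "every night at 2am" part, a crontab entry does the scheduling. A minimal sketch (the script path and log file are hypothetical, stand-ins for wherever you install the backup script shown below):

```
# m h dom mon dow command
0 2 * * * /usr/local/bin/db_backup_s3.sh >> /var/log/db_backup.log 2>&1
```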
The second part of the requirement is to upload the data to S3 automatically. This is not hard, but there is a limitation: uploading a file larger than 5GB in a single PUT is not possible, and in my experience anything over 200MB has an increased error rate. The good news is that the folks who make JetS3t, a Java-based uploader tool, solved this for me.
JetS3t works on *NIX and Windows; there is a shell or bat script for your OS flavor. It supports huge file uploads via the Amazon multipart upload API and can upload data in parallel.
Ideally I would stream the backup directly to S3 in one step, but I don't have the time to code that. Maybe in the future. So the next best thing is to do it in roughly two steps. Below is a bash script that backs up the MySQL database and uploads the backup to S3.
#!/bin/bash
INNOBACKUP="/usr/bin/innobackupex"
INNOBACKUP_OPTIONS="--parallel=4 --user=backup --password=****"
BACKUPDIR="/sqldata/backups"
S3BUCKET="dbprodbackups"
JETS3="/usr/local/jets3t/bin/synchronize.sh UP $S3BUCKET $BACKUPDIR"

# Prune local backup directories not accessed in the last 3 days
echo "Removing old local backups"
cd $BACKUPDIR
find . -type d -name "." -prune -o -type d -atime +3 -exec rm -rf {} \; -print

# Take a non-blocking backup with innobackupex (XtraBackup)
echo "Starting INNOBACKUP:"
echo "$INNOBACKUP $INNOBACKUP_OPTIONS $BACKUPDIR"
$INNOBACKUP $INNOBACKUP_OPTIONS $BACKUPDIR

echo "Sleeping for 1 min 10 seconds"
sleep 70

# Collect the timestamped backup directories (older than a minute)
# and push each one to S3 with JetS3t
S3_DIR_UPLOAD=`find . -maxdepth 1 -type d -cmin +1 -print | grep ./ | cut -d. -f2 | cut -d/ -f2 | xargs`
echo "Dirs to synchronize are $S3_DIR_UPLOAD"
for dir in $S3_DIR_UPLOAD; do
    echo "Executing $JETS3/$dir"
    $JETS3/$dir
done
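A quick way to see what that find/grep/cut/xargs chain produces is to run it against a throwaway directory. This is a minimal sketch (the directory names are made up, and the -cmin +1 age filter is dropped because freshly created test directories would be excluded by it):

```shell
#!/bin/sh
# Create a scratch directory with two fake backup dirs
tmp=$(mktemp -d)
cd "$tmp"
mkdir 2011-07-29_20-13-38 2011-07-30_02-00-01

# Same chain as the script, minus -cmin +1: drop the "." entry, strip
# the leading "./" from each subdirectory, and join the names into one
# space-separated list
S3_DIR_UPLOAD=$(find . -maxdepth 1 -type d -print | grep ./ | cut -d. -f2 | cut -d/ -f2 | xargs)
echo "$S3_DIR_UPLOAD"

cd / && rm -rf "$tmp"
```

The result is the space-separated list of backup directory names that the for loop then iterates over.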
Configuring JetS3t is simple; it's a matter of modifying a properties file.
Inside the JetS3t directory (in my case /usr/local/jets3t/), the config directory contains all the configs. In synchronize.properties, define the accesskey and secretkey for your S3 account.
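For reference, the relevant part of synchronize.properties looks something like this (the two key names are the ones mentioned above; the values are placeholders):

```
accesskey=YOUR_AWS_ACCESS_KEY
secretkey=YOUR_AWS_SECRET_KEY
```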
The JETS3 line in the script says: synchronize the given folder and push it to the S3 bucket dbprodbackups. JetS3t handles the bucket's region automatically (the Amazon tools do not).
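Putting the pieces together, each iteration of the script's for loop ends up invoking a command along these lines (bucket name from the script, timestamp from the sample run below):

```
/usr/local/jets3t/bin/synchronize.sh UP dbprodbackups /sqldata/backups/2011-07-29_20-13-38
```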
Running the script, you'll see something like this for the XtraBackup phase:
[02] Copying ./ShardLookup/MainLookup.ibd
to /sqldata/backups/2011-07-29_20-13-38/ShardLookup/MainLookup.ibd
>> log scanned up to (99395152227)
>> log scanned up to (99395249621)
>> log scanned up to (99395366299)
>> log scanned up to (99395373016)
>> log scanned up to (99395382850)
and something like this for the JetS3t upload:
...
N 2011-07-29_20-13-38/xtrabackup_binary
N 2011-07-29_20-13-38/xtrabackup_binlog_info
N 2011-07-29_20-13-38/xtrabackup_checkpoints
N 2011-07-29_20-13-38/xtrabackup_logfile
Large upload parts: 32/43 - 76% of 38.88 GB (15.89 MB/s - ETA: 8 1/2 minutes)
In about an hour I am able to back up 40GB and upload it to S3.