Thursday, March 15, 2007

Looking for DBA's

Send me your resume, and I'll hook you up with a job at Yahoo!

Specifically I'm looking for a mySQL DBA, with 64-bit Linux Experience.
If you have more sys-admin / development skills then actually DBA skills no worries, if your good I'll give you a crash course in how to tune / improve uses of mysql.


Send your inquires to

mysqldba1@yahoo.com

or comment on this blog.

Tuesday, March 13, 2007

So, why use mySQL 5.0

Every database that houses user generated content is a master in at least 1 pair within my environment. To make sure that id's don't collide we engineered around the auto-increment functionality or lack there of in mySQL-4.1. For the most part the auto-increment is externally generated.



mySQL 5.0 introduces a cool feature that increments auto increments different per mysql instance.

In my my.cnf file I define this:

auto_increment_increment = 5
auto_increment_offset = 1


This means that increments will be produced by 5 so for example starting from 1
1,6,11,16, etc.

On another server I define

auto_increment_increment = 5
auto_increment_offset = 2

This functionality is cool, but there are limitations:

I can only have 5 masters to generate this sequence before I need to do some tricks to generate new auto-increments that do not collide, this will require a restart of the server or servers involved in this increment.


So, to get around this and not replicate ids all over the place, I decided to get a few old servers that are not good enough to be production servers, but is good enough to handle very specific traffic, like id generation.


As a result I made 2 servers. Then I changed the application to get a ticket or id before inserting into the database.

Getting a ticket

CREATE TABLE `TicketsGeneric` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`stub` char(1) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `stub` (`stub`)
) ENGINE=InnoDB


REPLACE INTO TicketsGeneric (stub) VALUES ('a');
SELECT LAST_INSERT_ID();



Generate the ticket:

With 1 server that can do more then a few thousand queries per second at a high currency of at least 10 threads-this is more then enough for the scope of the project. Two are used for redundancy and up to 5 servers for growth / redundancy.


Now use the ticket as the id for the row for user generated content.



But what about preserving sequence?

Sequence is used for ranges and order bys that can easily be replaced with performing a range or a sort on the time the row was added to the database and possibly having a subsort on ticket to get around same second adds. In nearly all cases this will work, especially when 98% of the queries involve more then 1 column, the other 2% is used to reference the id from different tables, so overall this is a good solution for me :)

Friday, March 09, 2007

Putting mySQL 5.0.33 into production

So, I'm playing with 5.0.33 and there are a few gotcha:

If you use innodb and like to backup your databases using ibbackup, make sure your license is for ibackup-3.0. 2.0 will report that the iblog files are corrupt.


Otherwise it's a straight forward upgrade for me, so far.

I have a mysql 4.0 database that was about 200GB in size, most of that data was removed yet the ibdata file is still 200GB. So, what I did was

mysqldump --single-transaction -uroot -p --all-databases --master-data=1 > all_databases.sql


once backed up I trashed the old datafile and rebuilt it into an empty 5.0 datafile.

Next I added the option innodb_file_per_table so, I can shrink data on a table basis in the future.

mysql -uroot < all_databases.sql

wala done.


In a few hours I turned that 200 GB datafile into 27 GB data, and reclaimed a bunch of resources.



So, why upgrade? Well I wanted to try the innodb scalability patch in production, and 4.1.23 is not in bitkeeper yet (last time I checked). Also there is a cool feature in mysql 5, that turns simple SQL-OR statements internally into UNIONS to get the most out of indexes. I'd like to see how this will help me from day to day uses.