Tuesday, March 13, 2007

So, why use MySQL 5.0?

Every database that houses user-generated content is a master in at least one pair within my environment. To make sure that ids don't collide, we engineered around the auto-increment functionality, or lack thereof, in MySQL 4.1. For the most part the auto-increment value is generated externally.



MySQL 5.0 introduces a cool feature that lets each MySQL instance step its auto-increment values differently.

In my my.cnf file I define this:

auto_increment_increment = 5
auto_increment_offset = 1


This means that ids will be generated in increments of 5, so starting from 1 the sequence is:
1, 6, 11, 16, etc.

On another server I define

auto_increment_increment = 5
auto_increment_offset = 2
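
On this server the same settings produce 2, 7, 12, 17, and so on. A quick sketch of the behavior (the SequenceDemo table is just a throwaway example, not part of my schema):

CREATE TABLE `SequenceDemo` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `val` char(1) NOT NULL default '',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

-- on the server with auto_increment_offset = 1
INSERT INTO SequenceDemo (val) VALUES ('a'), ('b'), ('c');
SELECT id FROM SequenceDemo;
-- returns 1, 6, 11

-- the same three inserts into an empty copy of this table on the
-- auto_increment_offset = 2 server would return 2, 7, 12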

This functionality is cool, but there are limitations:

I can only have 5 masters generating this sequence before I need to do some tricks to produce new auto-increments that do not collide, and that requires a restart of the server or servers involved in the increment.


So, to get around this and not replicate ids all over the place, I decided to get a few old servers that are not good enough to be production servers but are good enough to handle very specific traffic, like id generation.


As a result I set up 2 servers, then changed the application to fetch a ticket (an id) before inserting into the database.

Getting a ticket
CREATE TABLE `TicketsGeneric` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;

-- It's very important to use MyISAM: table locks are needed here, and throughput is
-- better because only a single row is ever being operated on.


Generate the ticket:

REPLACE INTO TicketsGeneric (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

With one server that can do more than a few thousand queries per second at a concurrency of at least 10 threads, this is more than enough for the scope of the project. Two servers are used for redundancy, with up to 5 servers for growth / redundancy.
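
One way to keep the two ticket boxes from handing out the same id is to reuse the increment / offset trick on the ticket servers themselves; just a sketch, but it would look something like this in each box's my.cnf:

# ticket box 1
auto_increment_increment = 2
auto_increment_offset = 1

# ticket box 2
auto_increment_increment = 2
auto_increment_offset = 2

Wired like that, one box hands out odd tickets and the other hands out even ones, so the application can hit either (or fail over between them) without a collision.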


Now use the ticket as the id for the row of user-generated content.
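
Roughly, the application flow looks like this (the UserContent table and its columns are just an example for illustration, not my actual schema):

-- 1. grab a ticket from one of the ticket boxes
REPLACE INTO TicketsGeneric (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
-- say this returns 42 to the application

-- 2. insert the user generated content on the appropriate master,
--    passing the ticket in as the id instead of relying on auto_increment
INSERT INTO UserContent (id, user_id, body, created)
VALUES (42, 1001, 'hello world', NOW());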



But what about preserving sequence?

Sequence is used for ranges and ORDER BYs, and those can easily be replaced by ranging or sorting on the time the row was added to the database, possibly with a subsort on the ticket to get around same-second adds. In nearly all cases this will work: roughly 98% of the queries involve more than one column, and the other 2% just reference the id from different tables, so overall this is a good solution for me :)
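
For example, against the same hypothetical UserContent table from above:

-- range on the insert time, subsort on the ticket id to break ties
-- between rows added in the same second
SELECT id, body
  FROM UserContent
 WHERE created >= '2007-03-01 00:00:00'
 ORDER BY created, id;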

3 comments:

Anonymous said...

Probably missing something, but why not use randomly generated numbers as primary keys?

Dathan Pattishall said...

To ensure that a random number is unique, a list must be kept so that the same number is not generated again. This would require a table with 100s of millions of rows.

Incrementing a number requires a table with only 1 row.

Dathan Pattishall said...

Also, in a failure scenario (say servers died, or user error caused a deletion of the table) it's easy to recover the last increment and get back up in a matter of minutes, while the random-number approach would be disastrous.