Tuesday, March 13, 2007

So, why use mySQL 5.0

Every database that houses user generated content is a master in at least 1 pair within my environment. To make sure that id's don't collide we engineered around the auto-increment functionality or lack there of in mySQL-4.1. For the most part the auto-increment is externally generated.

mySQL 5.0 introduces a cool feature that increments auto increments different per mysql instance.

In my my.cnf file I define this:

auto_increment_increment = 5
auto_increment_offset = 1

This means that increments will be produced by 5 so for example starting from 1
1,6,11,16, etc.

On another server I define

auto_increment_increment = 5
auto_increment_offset = 2

This functionality is cool, but there are limitations:

I can only have 5 masters to generate this sequence before I need to do some tricks to generate new auto-increments that do not collide, this will require a restart of the server or servers involved in this increment.

So, to get around this and not replicate ids all over the place, I decided to get a few old servers that are not good enough to be production servers, but is good enough to handle very specific traffic, like id generation.

As a result I made 2 servers. Then I changed the application to get a ticket or id before inserting into the database.

Getting a ticket
CREATE TABLE `TicketsGeneric` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `stub` (`stub`)

-- It's very important to use MYISAM because table locks are NEEDED and you get better throughput due to having a single row which is being operated on.

REPLACE INTO TicketsGeneric (stub) VALUES ('a');

Generate the ticket:

With 1 server that can do more then a few thousand queries per second at a high currency of at least 10 threads-this is more then enough for the scope of the project. Two are used for redundancy and up to 5 servers for growth / redundancy.

Now use the ticket as the id for the row for user generated content.

But what about preserving sequence?

Sequence is used for ranges and order bys that can easily be replaced with performing a range or a sort on the time the row was added to the database and possibly having a subsort on ticket to get around same second adds. In nearly all cases this will work, especially when 98% of the queries involve more then 1 column, the other 2% is used to reference the id from different tables, so overall this is a good solution for me :)


Jonathan said...

Probably missing something, but why not use randomly generated numbers as primary keys?

Dathan said...

To ensure that the random number is unique a list must be kept to ensure that the random number is not generated again. This would require a table with 100s of millions of rows.

Incrementing a number requires a table with only 1 row.

Dathan said...

Also, in a failure scenario like servers died or user error caused a deletion of the table, it's easy to recover the last increment and get up in a matter of minutes, while the random number approach would be disastrous.

Anonymous said...



A片,色情,成人,做愛,情色文學,A片下載,色情遊戲,色情影片,色情聊天室,情色電影,免費視訊,免費視訊聊天,免費視訊聊天室,一葉情貼圖片區,情色,情色視訊,免費成人影片,視訊交友,視訊聊天,視訊聊天室,言情小說,愛情小說,AIO,AV片,A漫,av dvd,聊天室,自拍,情色論壇,視訊美女,AV成人網,色情A片,SEX