One solution is to code some logic into the application to switch SQL tables based on the create date of the row and map it to a table. This works but its not very clean.
Another solution is to walk the table in question and delete the old rows with DELETE statements. This is not good. The reason is DELETE IS VERY slow. In fact, the rule of thumb is this. You can do 10000s of SELECTS per seconds 10000s of Updates per second 1000s of Inserts per second, 10s of deletes a second. (I am hand waving and being general here). Additionally for INNODB (from this point just assume I am putting everything in the context of using INNODB) DELETES undo references are stored in the master ibdata file for innodb growing said data file unbounded over time. Ever notice that it grows even with innodb_file_per_table on? That's from deletes actions and is useless data. Additionally the only way to shrink that file is to do a full export to text and import after blowing away the master ibdata file(s).
Another solution is a mySQL partition table, think of it as a layer that sits in-front of the storage engine and relies heavily on the optimizer. The partition table maps statements based on the partition setup to the correct underlying tables. So, if you have 12 partitions for a table called FacebookInviteHistory, there will be 12 STORAGE ENGINE TABLES for the partition table FacebookInviteHistory in the format of
FacebookInviteHistory#P#<PartitionName>.ibd
Where FacebookInviteHistory is a pointer table in the format of
FacebookInviteHistory.par
FacebookInviteHistory.frm
The benefits of using partition table is the following:
- Underlying tables can be assigned to specific disk media.
- Aggregate functions such as sum and count can easily be parallelized across all partitions providing very quick access to results.
- Data that loses it usefulness can be easily removed by dropping the partition containing only that data.
- Finally queries can be greatly optimized by the fact that data satisfying a given where clause can be stored only on 1 or more partitions which automatically excludes looking at other partitions-this is called pruning.
This being said it still looks like a good feature to use but I have some questions which Google could not answer for me.
Question: Does adding or dropping partitions lock other partitions?
Short Answer: No
Long Answer:
For this table:
CREATE TABLE PartitionTest ( senderId bigint(20) unsigned NOT NULL, recipientId bigint(20) unsigned NOT NULL, createDate datetime NOT NULL DEFAULT 0, PRIMARY KEY (senderId, recipientId, createDate) ) PARTITION BY RANGE ( TO_DAYS(createDate) ) ( PARTITION Jan2012 VALUES LESS THAN (TO_DAYS('2012-02-01')), PARTITION Feb2012 VALUES LESS THAN (TO_DAYS('2012-03-01')), PARTITION Mar2012 VALUES LESS THAN (TO_DAYS('2012-04-01')), PARTITION Apr2012 VALUES LESS THAN (TO_DAYS('2012-05-01')), PARTITION May2012 VALUES LESS THAN (TO_DAYS('2012-06-01')), PARTITION Jun2012 VALUES LESS THAN (TO_DAYS('2012-07-01')), PARTITION Jul2012 VALUES LESS THAN (TO_DAYS('2012-08-01')) );
I wrote a benchmarking tool that simulates my type of concurrency writing at double the rate. Think of this tool as mysqlslap but specific for my table structures and load. Next I ran an administration command to add a partition drop it and add it again.
mysql> ALTER TABLE PartitionTest ADD PARTITION (PARTITION Aug2012 VALUES LESS THAN (TO_DAYS('2012-09-01'))); Query OK, 0 rows affected (0.03 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE PartitionTest DROP PARTITION Aug2012; Query OK, 0 rows affected (0.02 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE PartitionTest ADD PARTITION (PARTITION Aug2012 VALUES LESS THAN (TO_DAYS('2012-09-01'))); Query OK, 0 rows affected (0.04 sec) Records: 0 Duplicates: 0 Warnings: 0
QPS from the tool did not drop.
Question: Can you create a circular queue of Partition tables?
Short Answer: No in 5.1, Yes in 5.6 sort of.
Long Answer:
Lets imagine you wanted to create 12 partitions with a RANGE ON MONTH(createDate) each month would map to one of the 12 partitions. The goal is to drop data that is over two months old, so on June 1st-Jan to April's data can be dropped. This does not work in 5.1. In 5.1 you need to add ranges that is greater then the previous range. In 5.6 you can truncate a partition. But in both mysql versions pruning just does not work on a range on MONTH of createDate so its just not an efficient partition for reads. For date and datetime fields TO_DAYS and YEAR, are pruning friendly. In 5.6 TO_SECONDS is pruning friendly. For some good reading check out Mikael's blog on 5.1 and 5.6 date columns and pruning. Try not to use timestamp columns when partitioning data. Its been very buggy in the past and does not work well with pruning. Even though you may save 4 bytes per row with timestamp the work is just not in place to partition TIMESTAMP COLUMNS.
Question: Can you force reading from a partition?
Short Answer: Yes with some work in 5.1 but in 5.6 the feature is better.
Long Answer: If you know how the data is partitioned you can construct SQL to read the entire partition in 5.1. In 5.6 there is a command to read from a partition.
Question: For a partition of months why not use a LIST Type Partition? *From a comment below but made pretty*
Short Answer: datetime doesn't support MONTH for pruning.
Long Answer: Pruning on Lists for datetime is only optimized for TO_DAYS, YEAR according to the documentation. Every partition would be queried.
DROP TABLE IF EXISTS PartitionTest; CREATE TABLE PartitionTest ( senderId bigint(20) unsigned NOT NULL, recipientId bigint(20) unsigned NOT NULL, createDate datetime NOT NULL DEFAULT 0, PRIMARY KEY (senderId, recipientId, createDate) ) PARTITION BY LIST ( MONTH(createDate) ) ( PARTITION Jan VALUES IN (1), PARTITION Feb VALUES IN (2), PARTITION Mar VALUES IN (3), PARTITION Apr VALUES IN (4), PARTITION May VALUES IN (5), PARTITION Jun VALUES IN (6), PARTITION July VALUES IN (7) ); mysql> SELECT PARTITION_NAME, TABLE_ROWS, PARTITION_EXPRESSION, PARTITION_DESCRIPTION FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME = 'PartitionTest'; +----------------+------------+----------------------+-----------------------+ | PARTITION_NAME | TABLE_ROWS | PARTITION_EXPRESSION | PARTITION_DESCRIPTION | +----------------+------------+----------------------+-----------------------+ | Jan | 101 | MONTH(createDate) | 1 | | Feb | 101 | MONTH(createDate) | 2 | | Mar | 101 | MONTH(createDate) | 3 | | Apr | 0 | MONTH(createDate) | 4 | | May | 0 | MONTH(createDate) | 5 | | Jun | 0 | MONTH(createDate) | 6 | | July | 0 | MONTH(createDate) | 7 | +----------------+------------+----------------------+-----------------------+ 7 rows in set (0.01 sec) mysql> DESCRIBE PARTITIONS SELECT * FROM PartitionTest WHERE createDate > '2012-01-01' AND createDate < '2012-02-01'; +----+-------------+---------------+------------------------------+-------+---------------+---------+---------+------+------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+------------------------------+-------+---------------+---------+---------+------+------+--------------------------+ | 1 | SIMPLE | PartitionTest | Jan,Feb,Mar,Apr,May,Jun,July | index | NULL | PRIMARY | 24 | NULL | 307 | Using where; Using index | +----+-------------+---------------+------------------------------+-------+---------------+---------+---------+------+------+--------------------------+ 1 row in set (0.00 sec)In summary I am rolling out partition tables to over 1.6 TB of data across 10 shards. So far so good.
No comments:
Post a Comment