Wednesday, July 30, 2008

What should I use MYISAM or INNODB?

I get this question allot, and my response is "it depends", people rarely like this general response. To pick a storage engine, the user must understand the gains and losses of using the storage engine, basically understand what INNODB and MYISAM do, and how they behave under different load patterns.

When picking a storage engine I go through a checklist-below is a quick list to get an idea of the thought process.

Do you require transactions?
If yes use INNODB else you may still want to use INNODB?


Are you doing a lot of big queries that Scan 20-30% of the rows?

If yes use MYISAM. It's better at doing large queries where the query requires a full table lock. INNODB will lock each row as it scans through it which hurts query throughput.


Are you building an app to store 1 row and access said row really fast and at a high concurrency?

See Tickets explanation of the example of an application that does this. Innodb hits a lock bottle neck when operating on the same row from many different threads, so it may not be advisable to use INNODB.


Are you building an app that stores a lot of blob data?

Now this is a grey area. With INNODB native zlib compression in 5.1+ INNODB may be a good choice over myISAM, while myISAM historically uses less diskpace then its INNODB counter-part. So, don't be afraid to create a single INNODB table and 100s of MYISAM tables that hold the blob data for INNODB.

Why do this? Because a single table in MYISAM will likely lock slowing down blob insertion / removal.


Do you require a lot of reads and writes where the ratio is not 90% reads 20% writes but more like 60/40?

Use INNODB over myISAM. If you don't believe me take a look at

show global status like 'table_locks_waited';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| Table_locks_waited | 82721 |
+--------------------+-------+


These are generic common questions that personally help me find out whats best to use. In reality to pick the correct storage engine, you should experiment and find out whats best for your app. Understand the storage engine-do some testing, then its easy to pick which is best.

Monday, July 14, 2008

ALTER TABLE, SELECT AND INNODB

Let's assume you have a 512MB table, and you decide to alter the table to add an index to make queries faster.

How long would you expect this alter to take? Hours? Days?

Even with 7200 RPM-slow disks the alter should of finished in less then 1/2 hour.

I ran across an alter that was running for 4 days-on 512MB datasize. The reason why it ran so long is because there was a SELECT that was running preventing mySQL from performing "rename table", the last leg of the ALTER TABLE process.

Killing that SELECT released the shared lock allowing the alter to finish.


Do not KILL THE ALTER when stuck in this SHARED LOCK STATE; Do NOT then remove the temporary tablespace file "#sql-320f_106f99a2.*".

What will happen if you do remove the #sql* file by hand?

Well for one INNODB will crash the mysql instance saying it could not find the temporary table space. It failed to open it. Then on recovery the original table gets unlinked from the filesystem and you just lost all data for that tablespace.

Why?

Here are roughly the order of events for an alter:

Lock all writes from said table
Make a temporary table #sql - file
Copy all data from the old file to the new file
Do a quick consistency check between the two files
unlink the old file
rename the temp file into the old file name

Each step operates on the data dictionary pointers for the two tables. Issuing a filesystem rm command for the step right before unlink, will cause INNODB to crash and on recovery unlink the old file and of course fail on the rename.

Monday, July 07, 2008

Group by ORDER by Optimization part II

In my previous blog post I talk about GROUP BY and ORDER BY optimizations. A member asked a great question that I'd like to share with everyone.


But what if the query was:

SELECT c1, c2, c3, SUM(c4) FROM T WHERE c1 = ? GROUP BY c2 ORDER BY c3 DESC LIMIT 10;




That query would produce a temp table and a filesort.
explain SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2 ORDER BY c3 DESC LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: column_test
type: ref
possible_keys: c1
key: c1
key_len: 5
ref: const
rows: 1
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)




The reason the index is c1,c2,c3,c4

So where c1=? and the group by of c2 would use that index, but to order the data properly you would need to do

WHERE c1 = ? GROUP BY c2, c3 ORDER BY c1 DESC, c2 DESC, c3 DESC

explain SELECT c1, c2, c3, SUM(c4) FROM column_test WHERE c1 = 1 GROUP BY c2,c3 ORDER BY c1, c2\G

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: column_test
type: ref
possible_keys: c1
key: c1
key_len: 5
ref: const
rows: 1
Extra: Using where; Using index
1 row in set (0.00 sec)

to get rid of the temp table or filesort. Filesorts and temp tables takes about 50% of the query time-so avoid these when the query is requested at a huge rate.

Also note that to get rid of the temporary table and filesort, the query changed and does not answer your question without post processing the data in PHP or some other layer.




The reason: the data is ordered by the index. For innodb the entire table is ordered by the PRIMARY key, each index has a reference to. The mysql implementation of group by does not know how to traverse and sort the data in the 1st pass of part of the key. The optimizer needs a lot of work. So, what is done from the mysql level is to automatically create a temp table and sort on that instead of using the index which it already traversed.

I believe in 5.1 that this case is being worked on in the optimizer level to get rid of this common slowdown.