Tuesday, January 17, 2012

mySQL Column Types and Why it Matters.

MySQL is awesome at converting strings to integers when comparing column lvalues with converted rvalues. So much so that many of us take this fact for granted. When does this assumption break down? When does passing in the wrong value cause problems in mySQL? Let's take a table EmailLookup for example.
CREATE TABLE `EmailLookup` (
  `userId` bigint(20) unsigned NOT NULL,
  `email` varchar(128) NOT NULL,
  `emailCrc32` int(11) unsigned NOT NULL,
  `createDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`emailCrc32`,`userId`),
  KEY `createDate` (`createDate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The primary key is emailCrc32, userId for a key size of 12 bytes (4 bytes for int, 8 bytes for bigint). Since this is a compound key (key with two columns), following the rules of Left Most Prefix I get two index lookup types for one. emailCrc32 and userId is a unique index lookup while emailCrc32 is also an index lookup. Thus I can do
SELECT email FROM EmailLookup WHERE emailCrc32 = ? and userId =?
OR
SELECT email FROM EmailLookup WHERE emailCrc32 = ?

What happens if I pass emailCrc32 a numeric string. i.e.
mysql> SELECT email FROM EmailLookup WHERE emailCrc32 = '1';

Empty set (0.00 sec)
So cool, works and comes back super quick. What happens if I pass emailCrc32 a real string. i.e.
mysql> SELECT email FROM EmailLookup WHERE emailCrc32 = 'a';
Empty set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+---------------------------------------------------------------+
| Level   | Code Message                                                         |
+---------+------+---------------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'a' for column 'emailCrc32' at row 1 |
+---------+------+---------------------------------------------------------------+
1 row in set (0.00 sec)
Comes back quick. What happens if I pass the column a big int
mysql> SELECT email FROM EmailLookup WHERE emailCrc32 = 100003256490710;
Empty set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------------------+
| Level   | Code | Message                                             |
+---------+------+-----------------------------------------------------+
| Warning | 1264 | Out of range value for column 'emailCrc32' at row 1 |
+---------+------+-----------------------------------------------------+
1 row in set (0.00 sec)
Comes back quick. But, what if I do a DELETE on a 64-bit server?
mysql> DELETE FROM EmailLookup WHERE emailCrc32 =  '100003256490710';

 Query OK, 0 rows affected (2 min 14.16 sec)
WHAT? A 2 min query for an impossible DELETE? Notice that '100003256490710' is a string. What is happening to INNODB
mySQL thread id 90620794, query id 3745441316 10.170.22.169 schoolfeed updating
DELETE FROM EmailLookup WHERE emailCrc32 = '100003256490710'
TABLE LOCK table `Shard1`.`EmailLookup` trx id 2AB524B6D lock mode IX
RECORD LOCKS space id 160 page no 5626 n bits 248 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 7580 n bits 240 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 6039 n bits 280 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 455 n bits 352 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 3174 n bits 288 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 5997 n bits 304 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 1486 n bits 296 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 5607 n bits 280 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
RECORD LOCKS space id 160 page no 2729 n bits 312 index `PRIMARY` of table `Shard1`.`EmailLookup` trx id 2AB524B6D lock_mode X locks rec but not gap
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
Yikes this is bad. Is this a bug? Maybe but is also a condition that should not happen if types are respected. The moral of the story is if your application respects column types mySQL will respect you :) This is from Server version: 5.1.57-rel12.8-log Percona Server (GPL), 12.8, Revision 233

2 comments:

shlominoach said...

Disturbing indeed.
My immediate guess is that the string is cast to FLOAT, due to its length. See also my post on such weird casting.

Can you attach the entry from your slow query log, so that we can see the number of rows scanned? (or also print out the difference in status variables)?

Rick James said...

Well, Duh! Garbage in, garbage out! Yeah, it happens a lot.


INT = 'a' — the 'a' converts to 0; there were no rows with 0.


INT = bigint-constant — ok it silently (until SHOW WARNINGS) downgraded the BIGINT and found nothing.


INT = 'long, numeric, string' — I guess it decided that VARCHAR was the dominant type and converted the INTs to strings, necessitating a table scan.