Tuesday, August 01, 2006

What does MySQL 4.1 do with utf8 and collation?

I ran into a problem. I converted my latin1 table to utf8 with the utf8_unicode_ci collation, as described in my previous post. The table in question has a UNIQUE index on the utf8_unicode_ci column. When reimporting the data, I get a duplicate-entry error: a value containing an accented e (é) is reported as a duplicate of one containing a plain e. Why?

e = UTF-8 0x65, U+0065
accented e (é) = UTF-8 0xC3 0xA9, U+00E9
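A quick way to double-check these byte values is to encode both characters in Python. This only confirms the numbers above; it says nothing about how MySQL compares them. The helper name `utf8_bytes` is just for illustration.

```python
def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of ch as uppercase hex bytes, e.g. 'C3 A9'."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


for ch in ("e", "\u00e9"):  # plain e, then accented e (U+00E9)
    print(f"U+{ord(ch):04X} -> {utf8_bytes(ch)}")
```

This prints `U+0065 -> 65` and `U+00E9 -> C3 A9`, so the two characters really are different at the byte level.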

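The bytes differ, but the rules of the Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/) treat accents as secondary differences, which are ignored at the primary comparison strength that utf8_unicode_ci uses. So é and e compare as equal, and the UNIQUE index sees them as duplicates. To get around this, I now know to define my tables as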

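utf8 with the utf8_bin collation, a binary collation in which é and e are distinct values (note that this also makes comparisons case-sensitive).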
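To see why the reimport fails under one collation and not the other, here is a minimal Python sketch of a UNIQUE index with a pluggable comparison key. It is a toy model, not MySQL's implementation: `primary_key` is an assumed stand-in for utf8_unicode_ci that decomposes text with Unicode NFD, drops combining accent marks and case-folds, which roughly approximates a primary-strength UCA comparison. `binary_key` just compares the UTF-8 bytes, in the spirit of utf8_bin. The names `UniqueIndex`, `DuplicateEntry`, `primary_key` and `binary_key` are hypothetical.

```python
import unicodedata


def primary_key(s: str) -> str:
    """Approximate an accent- and case-insensitive (primary strength) key.

    Rough stand-in for utf8_unicode_ci, NOT MySQL's real weight tables:
    decompose to NFD, drop combining marks (accents), then case-fold.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def binary_key(s: str) -> bytes:
    """Compare raw UTF-8 bytes, in the spirit of utf8_bin."""
    return s.encode("utf-8")


class DuplicateEntry(Exception):
    """Raised when a value's key is already present in the index."""


class UniqueIndex:
    """Hypothetical toy UNIQUE index: rejects values whose key already exists."""

    def __init__(self, key):
        self.key = key
        self.seen = {}  # maps comparison key -> first value stored with it

    def insert(self, value: str) -> None:
        k = self.key(value)
        if k in self.seen:
            raise DuplicateEntry(f"{value!r} collides with {self.seen[k]!r}")
        self.seen[k] = value


rows = ["cafe", "caf\u00e9"]  # plain e, then accented e (U+00E9)

for name, key in (("primary (like utf8_unicode_ci)", primary_key),
                  ("binary (like utf8_bin)", binary_key)):
    index = UniqueIndex(key)
    try:
        for row in rows:
            index.insert(row)
        print(f"{name}: all rows inserted")
    except DuplicateEntry as err:
        print(f"{name}: duplicate entry, {err}")
```

Under the accent-insensitive key the second row collides with the first, while the byte-wise key accepts both. The NFD-and-strip shortcut is only an approximation of the real UCA weight tables, so it should not be relied on to predict every collision MySQL will report.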


This bug report at mysql.com was the indicator that the accented e itself is not the problem.
