So, what is a good way to convert to true utf8 character set with a unicode collation.
Rebuild the data! This is the way that I'm doing it to also shrink the innodb datafile. There is a lot of free space.
Now remember we have utf8 data in a latin1 container column, so we need to make sure that we get a dump of the raw data. MySQL will double pad the data with default options converting the latin1 charset to utf8, which is garbage.
mysqldump -hHOST -uroot -p DB TABLE --no-create-info --compatible=mysql323
This statement will dump the raw data, the option that tells mysqldump to do so is mysql323 compatible.
My personal preference is to
mysqldump -hHOST -uroot -p DB TABLE -T /tmp/
What this does is SELECT * INTO OUTFILE '/tmp/TABLE.txt'
This to will get raw data.
Since all of my tables are INNODB, dumping it out sorted by the PRIMARY index is good for re-import via LOAD DATA.
Next we dump the table structure and remove all occurances of latin1 on each of the column definitions.
Finally reimport the data
Start the slave.
So here is a good procedure.
- Stop the Slave
- Stop MySQL
- Put in your my.cnf skip-slave-start
- Start mySQL (we do the above steps to make sure everything is flushed to disk just to be sure)
- Next check that the dump point has enough space
- Dump the data via the method described above for each table
- Dump the table definition
- Stop mySQL
- Destroy the INNODB datafile and log files rm /var/lib/mysql/ib*
- Remove Database directory
- Edit my.cnf and define default startup for the server
[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci
character-set-server=utf8
collation-server=utf8_unicode_ci
#
# on every connection we tell the server that transfer data as utf8 so it doesn't translate data comming in for us. SET NAMES is equivlent to
# SET character_set_client = utf8; SET character_set_connection = utf8;SET character_set_results = utf8
#
init-connect="SET NAMES = utf8" - Start mySQL
- CREATE DATABASE DATABASENAME DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci
- Remove all references to latin1 from the column definition of each table
- At the end of each table definition add ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
- Add the tables to the database
- IMPORT THE DATA
Let me know if you have any questions.
5 comments:
only 1 question: what's "colliation"?
that's a bad spelling mistake. I didn't think anyone was reading this yet.
ah ha! thank you.
(i came here via cal's linklog- i assumed he was a mate of yours)
Hey so is Flickr currently migrating to 4.1? Just curious. Why not 5.x?
Nice blog BTW
Frank
We migrated to 4.1 a while ago but now I want to rebuild the data into pure utf8, since I have to reduce the size of the datafile.
I didn't want to use 5.0 because 4.1 is nice and stable and we don't need the features in 5.0.
Post a Comment