I wrote the script below in literally 2 mins. It's really basic and basically just CROSS JOINs two versions of the table one in utf8 the other in latin1 and reports back which string column is not correct. I haven't cleaned the script up, it's ugly and for my purposes will only exist for a short period of time.
test.$TABLE = is the original table latin1 for instance
- use mysqlimport this forces data to not get converted to latin1 from utf8 by setting the character set to binary (i.e. do no convert)
$DB.$TABLE = is the new utf8 table.
#!/usr/bin/php -q
$dbh = mysql_connect('localhost','root','',1);
$DB = 'your db';
$TABLE = 'your table';
if (!$dbh) {
die ("Failed to connect\n");
}
if (!mysql_select_db($DB,$dbh)) {
die ("Failed to changed dbs: " . mysql_error());
}
$query = "SELECT * FROM $TABLE LIMIT 1";
$result = mysql_query($query,$dbh);
$fields = mysql_num_fields($result);
print "Number of fields : $fields\n";
$names = array();
for($i = 0; $i < $fields; $i++) {
$type = mysql_field_type($result, $i);
$name = mysql_field_name($result, $i);
if ($type == 'string') {
$names[] = $name;
}
}
mysql_free_result($result);
foreach($names as $name) {
$query = "SELECT a1.id FROM $DB.$TABLE a1, test.$TABLE a2 WHERE a1.id=a2.id AND BINARY(a1.$name) != BINARY(a2.name)";
$result = mysql_query($query,$dbh);
while($row = mysql_fetch_assoc($result)) {
print "Invalid UTF8 Chars for column $name: ROWID: $row[id]\n";
}
mysql_free_result($result);
}
?>
1 comment:
Just a quick note you no longer need to pass the "-q" option. PHP CLI doesn't send HTTP headers...
Post a Comment