Monday, August 14, 2006

Quick php script to figure out bad utf8 characters.

When I wrote this post about invalid utf8 characters I needed a way to convert the mysql message into a real identifier for me to take a look at. Below is a quick and dirty script to figure out the bad rows.

I wrote the script below in literally 2 mins. It's really basic and basically just CROSS JOINs two versions of the table one in utf8 the other in latin1 and reports back which string column is not correct. I haven't cleaned the script up, it's ugly and for my purposes will only exist for a short period of time.

test.$TABLE = is the original table latin1 for instance
- use mysqlimport this forces data to not get converted to latin1 from utf8 by setting the character set to binary (i.e. do no convert)

$DB.$TABLE = is the new utf8 table.

#!/usr/bin/php -q

$dbh = mysql_connect('localhost','root','',1);
$DB = 'your db';
$TABLE = 'your table';

if (!$dbh) {
die ("Failed to connect\n");

if (!mysql_select_db($DB,$dbh)) {
die ("Failed to changed dbs: " . mysql_error());

$query = "SELECT * FROM $TABLE LIMIT 1";
$result = mysql_query($query,$dbh);

$fields = mysql_num_fields($result);

print "Number of fields : $fields\n";

$names = array();

for($i = 0; $i < $fields; $i++) {
$type = mysql_field_type($result, $i);
$name = mysql_field_name($result, $i);

if ($type == 'string') {
$names[] = $name;

foreach($names as $name) {

$query = "SELECT FROM $DB.$TABLE a1, test.$TABLE a2 WHERE AND BINARY(a1.$name) != BINARY(";

$result = mysql_query($query,$dbh);
while($row = mysql_fetch_assoc($result)) {
print "Invalid UTF8 Chars for column $name: ROWID: $row[id]\n";




Anonymous said...

Just a quick note you no longer need to pass the "-q" option. PHP CLI doesn't send HTTP headers...

Anonymous said...



A片,色情,成人,做愛,情色文學,A片下載,色情遊戲,色情影片,色情聊天室,情色電影,免費視訊,免費視訊聊天,免費視訊聊天室,一葉情貼圖片區,情色,情色視訊,免費成人影片,視訊交友,視訊聊天,視訊聊天室,言情小說,愛情小說,AIO,AV片,A漫,av dvd,聊天室,自拍,情色論壇,視訊美女,AV成人網,色情A片,SEX