Andrew Walker crafted this handy little PHP function which can convert a UTF-16 encoded string into a more PHP-friendly UTF-8 encoded string.
The function first checks to see if the string passed to it is prefixed with a Byte Order Mark (BOM), and if the necessary BOM exists, the function continues to convert the rest of the string to its more compact UTF-8 format.
Obviously if no BOM is present, the function leaves the input string unchanged.
function utf16_to_utf8($str) { $c0 = ord($str[0]); $c1 = ord($str[1]); if ($c0 == 0xFE && $c1 == 0xFF) { $be = true; } else if ($c0 == 0xFF && $c1 == 0xFE) { $be = false; } else { return $str; } $str = substr($str, 2); $len = strlen($str); $dec = ''; for ($i = 0; $i < $len; $i += 2) { $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) : ord($str[$i + 1]) << 8 | ord($str[$i]); if ($c >= 0x0001 && $c <= 0x007F) { $dec .= chr($c); } else if ($c > 0x07FF) { $dec .= chr(0xE0 | (($c >> 12) & 0x0F)); $dec .= chr(0x80 | (($c >> 6) & 0x3F)); $dec .= chr(0x80 | (($c >> 0) & 0x3F)); } else { $dec .= chr(0xC0 | (($c >> 6) & 0x1F)); $dec .= chr(0x80 | (($c >> 0) & 0x3F)); } } return $dec; }
Thanks Andrew, this was exactly what I was looking for! :)
Related Link: http://www.moddular.org/log/utf16-to-utf8