PHP: Convert a UTF-16 String to a UTF-8 String

Andrew Walker crafted this handy little PHP function which can convert a UTF-16 encoded string into a more PHP-friendly UTF-8 encoded string.

The function first checks whether the string passed to it starts with a UTF-16 Byte Order Mark (BOM): the bytes FE FF for big-endian or FF FE for little-endian. If a BOM is found, the function strips it and converts the rest of the string to UTF-8, which is usually the more compact encoding for mostly-ASCII text.

If no BOM is present, the function returns the input string unchanged.
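The BOM check on its own can be sketched roughly like this. Note that detect_utf16_bom is a hypothetical helper name used only for illustration; it is not part of Andrew's function.

```php
<?php
// Hypothetical helper (illustration only): reports which UTF-16 byte
// order the leading BOM announces, or null when there is no BOM.
function detect_utf16_bom($str) {
    if (strlen($str) < 2) {
        return null; // too short to carry a two-byte BOM
    }
    $bom = substr($str, 0, 2);
    if ($bom === "\xFE\xFF") {
        return 'BE'; // big-endian: high byte comes first
    }
    if ($bom === "\xFF\xFE") {
        return 'LE'; // little-endian: low byte comes first
    }
    return null; // no BOM found
}

var_dump(detect_utf16_bom("\xFF\xFEH\x00i\x00")); // string(2) "LE"
var_dump(detect_utf16_bom("Hi"));                 // NULL
```

The byte order matters because every 16-bit unit after the BOM has to be reassembled from its two bytes in that same order.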

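function utf16_to_utf8($str) {
    // A BOM needs two bytes; anything shorter is returned untouched.
    if (strlen($str) < 2) {
        return $str;
    }
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);

    // FE FF marks big-endian, FF FE little-endian; no BOM means no conversion.
    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;
    } else {
        return $str;
    }

    $str = substr($str, 2);
    $len = strlen($str);
    $dec = '';
    // Read one 16-bit unit per pass; a dangling odd byte at the end is ignored.
    for ($i = 0; $i + 1 < $len; $i += 2) {
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
                ord($str[$i + 1]) << 8 | ord($str[$i]);
        // Combine a high surrogate with the low surrogate that follows it,
        // so characters above U+FFFF become a single code point.
        if ($c >= 0xD800 && $c <= 0xDBFF && $i + 3 < $len) {
            $lo = ($be) ? ord($str[$i + 2]) << 8 | ord($str[$i + 3]) :
                    ord($str[$i + 3]) << 8 | ord($str[$i + 2]);
            if ($lo >= 0xDC00 && $lo <= 0xDFFF) {
                $c = 0x10000 + (($c - 0xD800) << 10) + ($lo - 0xDC00);
                $i += 2;
            }
        }
        if ($c <= 0x007F) {
            // One byte (U+0000 included, so it is not written as C0 80).
            $dec .= chr($c);
        } else if ($c <= 0x07FF) {
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else if ($c <= 0xFFFF) {
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            $dec .= chr(0xF0 | (($c >> 18) & 0x07));
            $dec .= chr(0x80 | (($c >> 12) & 0x3F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}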
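To make the shift-and-mask steps in the loop easier to follow, here is a small sketch that encodes a single code point and prints the resulting bytes as hex. The helper name code_point_to_utf8 is hypothetical, and the sketch assumes a code point in the range U+0000 to U+FFFF that is not a surrogate.

```php
<?php
// Hypothetical helper (illustration only): encodes one code point in the
// range U+0000..U+FFFF as UTF-8. Surrogates are assumed not to occur.
function code_point_to_utf8($c) {
    if ($c <= 0x7F) {
        return chr($c);                        // 0xxxxxxx
    }
    if ($c <= 0x7FF) {
        return chr(0xC0 | ($c >> 6))           // 110xxxxx
             . chr(0x80 | ($c & 0x3F));        // 10xxxxxx
    }
    return chr(0xE0 | ($c >> 12))              // 1110xxxx
         . chr(0x80 | (($c >> 6) & 0x3F))      // 10xxxxxx
         . chr(0x80 | ($c & 0x3F));            // 10xxxxxx
}

echo bin2hex(code_point_to_utf8(0x41)), "\n";   // 41     (A)
echo bin2hex(code_point_to_utf8(0xE9)), "\n";   // c3a9   (é)
echo bin2hex(code_point_to_utf8(0x20AC)), "\n"; // e282ac (€)
```

Taking the euro sign as the worked case: 0x20AC shifted right by 12 leaves 0x2, which gives the lead byte E2, and the two remaining groups of six bits (0x02 and 0x2C) are each OR-ed with 0x80 to give 82 and AC.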

Thanks Andrew, this was exactly what I was looking for! :)

Related Link: http://www.moddular.org/log/utf16-to-utf8

About Craig Lotter

Craig Lotter is a 29-ish year old software and web developer by trade (currently working for Touchwork), who also just happens to never have been able to shake off that pesky inner child within. Call him a fanboy, geek, nerd or whatever you want, just so long as you enjoy what he writes. His main personal site can be found at http://www.craiglotter.co.za and his webcomic, House of C can be found at http://www.houseofc.codeunit.co.za/