Multibyte strings

In PHP, the standard string functions such as strlen() and substr() assume that one character is one byte. This works for ASCII, but it gives wrong results when the string uses a multibyte encoding such as UTF-8, where a single character can take up to four bytes.

We can get around this by using the functions from the mbstring extension (prefixed with mb_), which work on characters rather than bytes.
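
As a minimal sketch of the difference, the snippet below compares the byte-oriented functions with their mbstring counterparts. The sample word is an illustrative choice, not taken from the text above, and the sketch assumes the source file is saved as UTF-8 and that the mbstring extension is loaded.

```php
<?php

// Illustrative sample: 'é' occupies 2 bytes in UTF-8, the other letters 1 byte each.
$word = 'café';

// Byte-oriented functions count and cut bytes.
$byteLength = strlen($word);        // number of bytes
$byteCut    = substr($word, 0, 4);  // first 4 bytes: cuts 'é' in half

// mbstring functions count and cut characters.
$charLength = mb_strlen($word, 'UTF-8');        // number of characters
$charCut    = mb_substr($word, 0, 4, 'UTF-8');  // first 4 characters

echo $byteLength . ' bytes, ' . $charLength . ' characters' . PHP_EOL;
echo 'substr() result is valid UTF-8: '
    . var_export(mb_check_encoding($byteCut, 'UTF-8'), true) . PHP_EOL;
echo 'mb_substr() result: ' . $charCut . PHP_EOL;
```

Passing the encoding explicitly avoids depending on the configured internal encoding, which may differ between environments.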

Demonstration

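<?php

// In UTF-8, '£' takes 2 bytes, '😮' takes 4, and the other characters take 1 byte each.
$string = '$£%wow!😮';

// strlen() counts bytes; mb_strlen() counts characters.
echo 'No mb: ' . strlen($string) . PHP_EOL;
echo 'With mb: ' . mb_strlen($string, 'UTF-8') . PHP_EOL;

// The /u modifier makes preg_split() treat the string as UTF-8,
// so the empty pattern splits between characters rather than bytes.
$chrArray = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
foreach ($chrArray as $chr) {
    // Print each character followed by its length in bytes.
    echo $chr . ' ' . strlen($chr) . PHP_EOL;
}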
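
For comparison, the same per-character split can be sketched with mb_str_split(), which is available from PHP 7.4. The variable $byteLengths is a name introduced here for illustration, and the snippet again assumes a UTF-8 source file and the mbstring extension.

```php
<?php

$string = '$£%wow!😮';

// mb_str_split() splits the string into chunks of one character each.
$chars = mb_str_split($string, 1, 'UTF-8');

// Byte length of each character, in order.
$byteLengths = array_map('strlen', $chars);

foreach ($chars as $i => $chr) {
    echo $chr . ' ' . $byteLengths[$i] . PHP_EOL;
}

// The per-character byte lengths add up to what strlen() reports for the whole string.
echo 'Total bytes: ' . array_sum($byteLengths) . PHP_EOL;
```

This avoids the regular expression entirely, at the cost of requiring a newer PHP version than the preg_split() approach.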