Unicode and Strings
Learn about the support for Unicode in Perl.
Unicode is a system used to represent the characters of the world’s written languages. Most English text uses a character set of only 128 characters (which requires 7 bits of storage and fits nicely into 8-bit bytes), but it’s naïve to believe that we won’t someday need an umlaut.
Perl strings
Perl strings can represent either of two separate but related data types:
Sequences of Unicode characters
Each character has a codepoint, a unique number that identifies it in the Unicode character set.
Sequences of octets
Binary data as a sequence of octets: 8-bit numbers, each of which can hold a value between 0 and 255.
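To make the two kinds of string concrete, here is a minimal sketch that builds one character string and then converts it to octets. It assumes UTF-8 as the encoding and uses the core Encode module; the variable names are illustrative only and not part of any API.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: five Unicode characters.
# \x{EF} is the codepoint escape for LATIN SMALL LETTER I WITH DIAERESIS.
my $chars = "na\x{EF}ve";

# ord() returns a character's codepoint; chr() goes the other way.
my $codepoint = ord(substr($chars, 2, 1));   # 0xEF, or 239 in decimal
my $same_char = chr(0xEF);

# Encoding turns characters into octets (UTF-8 is an assumption here).
my $octets = encode('UTF-8', $chars);

# length() counts characters in $chars but octets in $octets.
my $char_count  = length $chars;     # 5 characters
my $octet_count = length $octets;    # 6 octets: U+00EF needs two in UTF-8

# Decoding turns the octets back into the original character string.
my $round_trip = decode('UTF-8', $octets);

printf "U+%04X chars=%d octets=%d\n", $codepoint, $char_count, $octet_count;
# prints "U+00EF chars=5 octets=6"
```

The same text has two different lengths depending on whether Perl is looking at characters or at octets, which is why it matters to know which kind of string a variable holds.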
...