Unicode and Strings

Learn about the support for Unicode in Perl.

Unicode is a system used to represent the characters of the world’s written languages. Most English text uses a character set of only 128 characters (ASCII, which requires 7 bits of storage and fits nicely into 8-bit bytes), but it’s naïve to believe that we won’t someday need an umlaut.

Perl strings

Perl strings can represent either of two separate but related data types:

Sequences of Unicode characters

Each character has a codepoint, a unique number that identifies it in the Unicode character set.

Sequences of octets

Binary data stored as a sequence of octets: 8-bit numbers, each of which can hold a value between 0 and 255.
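
The difference between the two kinds of string can be seen by counting their elements. The following is a minimal sketch, assuming the core Encode module and UTF-8 as the encoding; the variable names are illustrative only. It builds a five-character string, encodes it to octets, and compares the lengths.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: each element is one Unicode codepoint.
# \x{EF} is U+00EF, LATIN SMALL LETTER I WITH DIAERESIS.
my $chars = "na\x{EF}ve";

# An octet string: the same text encoded as UTF-8 bytes (assumed encoding).
my $octets = encode('UTF-8', $chars);

# length counts codepoints in the first case and octets in the second.
printf "characters: %d\n", length $chars;
printf "octets:     %d\n", length $octets;

# ord returns the codepoint of a single character.
printf "third character: U+%04X\n", ord substr($chars, 2, 1);

# Decoding the octets gives back the original character string.
my $roundtrip = decode('UTF-8', $octets);
print "round trip ok\n" if $roundtrip eq $chars;
```

The character string has five elements, while its UTF-8 form has six, because U+00EF needs two octets in that encoding. Another encoding would give a different octet count for the same characters.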

...