Contents

Literals and escape sequences

Introduction

The escape sequences found in string, ucs and cset literals are shown below. Each of these types interprets the sequences slightly differently, as described in the following sections.

Escape sequences

Escape sequence  Character                         ASCII
\b               Backspace                         8
\d               Delete                            127
\e               Escape                            27
\f               Formfeed                          12
\l               Linefeed                          10
\n               Newline                           10
\r               Return                            13
\t               Horizontal tab                    9
\v               Vertical tab                      11
\'               Single quote                      39
\"               Double quote                      34
\\               Backslash                         92
\ddd             Octal code, 1-3 digits
\xdd             Hexadecimal code, 1 or 2 digits
\^c              Control code
\N               Platform line ending (not csets)  10 or 13,10
\udddd           Unicode, 1-4 hexadecimal digits
\Udddddd         Unicode, 1-6 hexadecimal digits

Ucs literals

Each escape sequence produces one character, except \N, which produces two (13,10) on platforms that use that line ending. In particular, each of the two Unicode escape sequences produces a single character corresponding to one Unicode code point. The other characters in the literal must form valid UTF-8, so for example

  u"\xFF"

is not allowed. Instead, to get a string containing the single character 255, use

  u"\u00FF"
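
The UTF-8 rule behind this restriction can be checked outside Object Icon. The following is a small Python sketch, not Object Icon code; the helper name `is_valid_utf8` is made up for this illustration. It shows that a lone byte 255 is never valid UTF-8, whereas code point 255 is a perfectly good single character.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A lone byte 0xFF can never appear in well-formed UTF-8,
# which models why the raw escape \xFF is rejected in a ucs literal.
print(is_valid_utf8(b"\xff"))        # False

# Code point U+00FF, by contrast, is one character with number 255.
print(len("\u00ff"), ord("\u00ff"))  # 1 255
```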

String literals

Plain string literals are interpreted in the same way as ucs literals, with two exceptions. Firstly, there is no requirement that the contents form valid UTF-8. Secondly, the Unicode escapes expand to the UTF-8 byte sequence for the particular code point. Thus

  "\u00FF"

is identical to

  "\xc3\xbf"

Cset literals

As in ucs literals, each escape sequence represents a single character. The \N sequence is not recognised (and just represents “N”). Since a cset is really just a set of character numbers (and knows nothing of UTF-8),

  '\xFF'

is identical to

  '\u00FF'

Both contain a single character number 255.
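
The three treatments of character 255 described so far can be compared side by side. This is a Python model of the behaviour, not Object Icon code, and the variable names are invented for the illustration: a ucs literal is modelled as a Python `str`, a plain string as `bytes`, and a cset as a set of integers.

```python
code_point = 0xFF

# ucs literal u"\u00FF": one character.
ucs_model = chr(code_point)

# String literal "\u00FF": the UTF-8 bytes of that code point.
string_model = ucs_model.encode("utf-8")

# Cset literals '\xFF' and '\u00FF': both name character number 255.
cset_from_hex = {0xFF}
cset_from_unicode = {ord("\u00ff")}

print(len(ucs_model))                      # 1
print(string_model.hex())                  # c3bf
print(cset_from_hex == cset_from_unicode)  # True
```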

The hyphen character has a special meaning in cset literals. It is used to specify ranges of characters. For example

  'a-z'

is the same as the predefined value &lcase. A literal hyphen must therefore be escaped with a backslash:

  punctuation := '.,;!\-:'

Without the backslash, the above cset would also contain all 24 ASCII characters between “!” and “:”.
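
The figure of 24 can be confirmed by listing the character numbers strictly between "!" (33) and ":" (58). The snippet below is a Python sketch of that arithmetic, not Object Icon code.

```python
lo, hi = ord("!"), ord(":")                    # 33 and 58

# Characters strictly between the two endpoints of the range.
between = [chr(n) for n in range(lo + 1, hi)]

print(len(between))       # 24
print("".join(between))   # "#$%&'()*+,-./0123456789
```

The list includes the digits and several punctuation marks, so an unescaped hyphen would change the cset considerably.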
