Ambiguity in escape sequences

Christopher Bazley, March 2010

An escape character changes the interpretation of the character(s) directly following it. Typically this is used to represent control codes using a sequence of two or three printable characters (digraphs or trigraphs). However, it creates a new problem: how to represent the escape character itself without changing the interpretation of subsequent characters!

The escape character "\" (0x5C) can be represented by repetition in C and Java string literals: "\" would be encoded as the escape sequence "\\" (0x5C 0x5C), "\\" as "\\\\", "\\\" as "\\\\\\", etc. Because "\\" does not incorporate any other escape sequence, it unambiguously represents the escape character.
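The following is a minimal sketch of why doubling works, written in Python rather than C or Java. The table ESCAPES and the functions encode_backslash and decode_backslash are hypothetical names invented for this illustration, and only three escape sequences are modelled.

```python
# Hypothetical, deliberately tiny escape table: second character -> meaning.
ESCAPES = {"\\": "\\", "n": "\n", "t": "\t"}


def encode_backslash(text):
    """Replace each special character with its two-character escape sequence."""
    reverse = {meaning: "\\" + key for key, meaning in ESCAPES.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


def decode_backslash(text):
    """Scan left to right, consuming one escape sequence at a time."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# A backslash followed by the letter n survives the round trip as two
# characters; it is not mistaken for a newline.
sample = "\\" + "n"
assert decode_backslash(encode_backslash(sample)) == sample
```

Because the doubled backslash is consumed as a unit before the decoder looks at the next character, a run of any number of backslashes decodes to exactly half as many.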

However, the escape character "^" (0x5E) cannot be represented in the 'caret notation' used for control key sequences because the digraph "^^" (0x5E 0x5E) is designated as representing a record separator (0x1E). It cannot be encoded as the trigraph "^^^" (0x5E 0x5E 0x5E) either because that leads to ambiguity: For example, should "^^^J" (0x5E 0x5E 0x5E 0x4A) be evaluated as "^^^" "J" (0x5E 0x4A) or "^^" "^J" (0x1E 0x0A)?
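To make the ambiguity concrete, here is a small Python sketch that enumerates every possible reading of a string under a hypothetical rule set in which the trigraph "^^^" is added to ordinary caret notation. The function name caret_decode_all is invented for this example, and input is assumed to be plain ASCII.

```python
def caret_decode_all(text):
    """Return the set of all decodings of text under the hypothetical rules."""
    if not text:
        return {b""}
    results = set()
    # Hypothetical trigraph: "^^^" stands for a literal caret.
    if text.startswith("^^^"):
        results |= {b"^" + rest for rest in caret_decode_all(text[3:])}
    # Ordinary digraph: "^X" stands for the control code 0x40 below X.
    if len(text) >= 2 and text[0] == "^" and 0x40 <= ord(text[1]) <= 0x5F:
        code = bytes([ord(text[1]) - 0x40])
        results |= {code + rest for rest in caret_decode_all(text[2:])}
    # Any character other than the caret stands for itself.
    if text[0] != "^":
        literal = text[0].encode("ascii")
        results |= {literal + rest for rest in caret_decode_all(text[1:])}
    return results


# Two distinct readings: caret followed by "J", or 0x1E followed by 0x0A.
print(caret_decode_all("^^^J"))
```

A well-designed escape scheme would yield exactly one decoding for every valid input; here "^^^J" yields two.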

Looking at the ASCII table, it's obvious why "^^" (0x5E 0x5E) encodes character 0x1E rather than the escape character: Starting with "^@" (0x5E 0x40) for NUL (0x00), every digraph encodes the control character 0x40 earlier in the chart. But isn't "^" a stupid choice of escape character given that it lies within the range of digraphs used to represent control characters? There are 32 characters to be represented by digraphs, which leaves 128 - (32*2) = 64 other characters (63 of them printable, DEL being the exception) that they could have chosen.
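The arithmetic relationship can be sketched in a few lines of Python. The function name caret_digraph is made up for this illustration; it covers only the 32 control codes 0x00 to 0x1F.

```python
def caret_digraph(code):
    """Caret-notation digraph for a control code in the range 0x00 to 0x1F."""
    if not 0x00 <= code <= 0x1F:
        raise ValueError("not a control code in the range 0x00 to 0x1F")
    # The second character sits 0x40 later in the ASCII chart.
    return "^" + chr(code + 0x40)


for code in (0x00, 0x0A, 0x1B, 0x1E):
    print(f"0x{code:02X} -> {caret_digraph(code)}")

# The 32 second characters span 0x40 to 0x5F, which includes "^" itself.
second_chars = {caret_digraph(code)[1] for code in range(0x20)}
print("^" in second_chars)
```

The last line prints True, which is the root of the problem: the escape character is also one of the characters it needs to be followed by.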

D'OH!

Addendum: One could invent a new digraph to represent "^" but not without breaking the arithmetic relationship between the second character of the digraph and the character to be encoded. "^>" (0x5E 0x3E) is probably the best candidate because it is contiguous with "^?" (0x5E 0x3F) for DEL (0x7F). By analogy with "^?", the digraph "^>" would denote "~" (0x7E), but there is no need to encode "~", so the potential clash shouldn't be problematic. However, it would have been more sensible to choose "~" as the escape character in the first place!
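As a rough sketch of how the "^>" proposal might behave, the Python below encodes and decodes 7-bit ASCII using caret notation plus that one extra digraph. The function names caret_encode and caret_decode are hypothetical, and this is not an existing standard or library behaviour, just an illustration of the idea.

```python
def caret_encode(data):
    """Encode 7-bit ASCII bytes, using "^>" for a literal caret (0x5E)."""
    out = []
    for b in data:
        if b == 0x5E:
            out.append("^>")  # the proposed special case
        elif b < 0x20 or b == 0x7F:
            out.append("^" + chr(b ^ 0x40))  # "^@" to "^_", and "^?" for DEL
        else:
            out.append(chr(b))
    return "".join(out)


def caret_decode(text):
    """Decode left to right; every "^" starts exactly one digraph."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "^":
            out.append(ord(text[i]))
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling escape character")
        second = text[i + 1]
        if second == ">":
            out.append(0x5E)
        elif 0x3F <= ord(second) <= 0x5F:
            out.append(ord(second) ^ 0x40)
        else:
            raise ValueError("unknown digraph ^" + second)
        i += 2
    return bytes(out)


# Every 7-bit value survives the round trip, including the caret itself.
all_ascii = bytes(range(128))
assert caret_decode(caret_encode(all_ascii)) == all_ascii
```

With this rule the string "^^^J" has only one reading (0x1E followed by 0x0A), while a literal caret followed by "J" is written "^>J".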