2. Overview
UTF-9 encodes [UNICODE] codepoints in the low order 8 bits of a nonet, using the high order bit to indicate continuation. Surrogates are not used. [UNICODE] codepoints in the range U+0000 - U+00FF ([US-ASCII] and Latin 1) are represented by a single nonet; codepoints in the range U+0100 - U+FFFF (the remainder of the BMP) are represented by two nonets; and codepoints in the range U+1000 - U+10FFFF (remainder of [UNICODE]) are represented by three nonets. Non-[UNICODE] codepoints in [ISO-10646] (that is, codepoints in the range 0x110000 - 0x7fffffff) can also be represented in UTF-9 by obvious extension, but this is not discussed further as these codepoints have been removed from [ISO-10646] by ISO. UTF-18 encodes [UNICODE] codepoints in the Basic Multilingual Plane (BMP, plane 0), Supplementary Multilingual Plane (SMP, plane 1), Supplementary Ideographic Plane (SIP, plane 2), and Supplementary Special-purpose Plane (SSP, plane 14) in a single 18-bit value. It does not encode planes 3 though 13, which are currently unused; nor planes 15 or 16, which are private spaces. Normally, UTF-9 and UTF-18 should only be used in the context of 9 bit storage and transport. Although some protocols, e.g., [FTP], support transport of nonets, the current IETF protocol suite is quite deficient in this area. The IETF is urged to take action to improve IETF protocol support for nonets.
