RFC - 2130
The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996
| Original: | ftp://ftp.isi.edu/in-notes/rfc2130.txt |
|---|---|
| Authors: | C. Weider [Microsoft], C. Preston [Preston & Lynch], K. Simonsen [DKUUG], H. Alvestrand [UNINETT], R. Atkinson [Cisco Systems], M. Crispin [University of Washington], P. Svanberg [KTH] |
| Date: | April 1997 |
| Category: | Informational |
| Referred by: | 17 RFC |
| Refers to: | 31 RFC |
Status
This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Acknowledgments
The authors would like to sincerely thank the Information Sciences Institute (ISI), and in particular Joyce K. Reynolds, for graciously hosting this event; Joe Kemp and Jeanine Yamazaki of ISI made sure the facilities met our needs. We also wish to thank the Internet Society, which underwrote travel for participants who might not otherwise have been able to attend. Of course, we also wish to thank the many experts who participated in the workshop and on the mailing list; a complete list of these people can be found in Appendix D. Bunyip Information Systems was kind enough to provide mailing list facilities for this work.
Abstract
This report details the conclusions of an IAB-sponsored invitational workshop held 29 February - 1 March, 1996, to discuss the use of character sets on the Internet. It motivates the need to have character set handling in Internet protocols which transmit text, provides a conceptual framework for specifying character sets, recommends the use of MIME tagging for transmitted text, recommends a default character set *without* stating that there is no need for other character sets, and makes a series of recommendations to the IAB, IANA, and the IESG for furthering the integration of the character set framework into text transmission protocols.
0: Executive summary
The term 'Character Set' means many things to many people. Even the MIME registry of character sets registers items that have great differences in semantics and applicability. This workshop provides guidance to the IAB and IETF about the use of character sets on the Internet and provides a common framework for interoperability between the many character sets in use there.
The framework consists of four components: an architecture model, which specifies components necessary for on-the-wire transmission of text; recommendations for tagging transmitted (and stored) text; recommended defaults for each level of the model; and a set of recommendations to the IAB, IANA, and the IESG for furthering the integration of this framework into text transmission protocols.
The architectural model specifies seven layers, of which only three are required for on-the-wire transmission. The Coded Character Set is a mapping from a set of abstract characters to a set of integers. The Character Encoding Scheme is a mapping from a Coded Character Set (or several) to a set of octets. The Transfer Encoding Syntax is a transformation applied to data which has been encoded using a Character Encoding Scheme to allow it to be transmitted. These layers should be specified in a transmitted text stream by using the MIME encoding mechanisms.
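The three on-the-wire layers can be illustrated with a minimal Python sketch. It is not taken from the report: the function names are hypothetical, and the choice of base64 as the Transfer Encoding Syntax is an assumption made for illustration. ISO 10646 code points stand in for the Coded Character Set and UTF-8 for the Character Encoding Scheme, and the resulting labels are carried in the standard MIME header fields.

```python
import base64


def coded_character_set(text: str) -> list[int]:
    """CCS layer: map each abstract character to an integer (its ISO 10646 code point)."""
    return [ord(ch) for ch in text]


def character_encoding_scheme(code_points: list[int]) -> bytes:
    """CES layer: map the integers to a sequence of octets, here using UTF-8."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def transfer_encoding_syntax(octets: bytes) -> bytes:
    """TES layer: transform the octets for transmission (base64 is an assumed choice)."""
    return base64.b64encode(octets)


# Sample text: LATIN SMALL LETTER E WITH ACUTE followed by EURO SIGN.
text = "\u00e9\u20ac"

code_points = coded_character_set(text)
octets = character_encoding_scheme(code_points)
wire = transfer_encoding_syntax(octets)

# MIME tagging: the header fields tell the receiver which layers were applied.
headers = {
    "Content-Type": "text/plain; charset=UTF-8",
    "Content-Transfer-Encoding": "base64",
}

# A receiver reverses the layers in the opposite order.
received = base64.b64decode(wire).decode("utf-8")
```

Keeping each layer as a separate function mirrors the model's point that the layers are independent: a different Character Encoding Scheme or Transfer Encoding Syntax could be substituted without touching the others, provided the MIME labels are updated to match.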
This report recommends the use of ISO 10646 as the default Coded Character Set, and UTF-8 as the default Character Encoding Scheme in the creation of new protocols or new versions of old protocols which transmit text. These defaults do not deprecate the use of other character sets when and where they are needed; they are simply intended to provide guidance and a specification for interoperability.
Prepared by Miloslav Nic and Pavel Srb.
