RFC 4646: Tags for Identifying Languages

language tag



... This document specifies a particular identifier mechanism (the language tag) and a registration function for values to be used to form tags ...


... The Language Tag ...
... Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of ...
... The language tag is composed of one or more parts, known as "subtags". Each subtag consists of a sequence of alphanumeric characters. Subtags are distinguished and separated from one another ...
... by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a "primary language" subtag and a (possibly empty) series of subsequent ...
... searching and matching operations. The syntax of the language tag in ABNF [RFC4234] is: ...
... DIGIT) ; letters and numbers Figure 1: Language Tag ABNF ...
... All subtags have a maximum length of eight characters and whitespace is not permitted in a language tag. For examples of language tags, see Appendix B. ...
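
The shape described in the excerpts above (subtags made of alphanumeric characters, separated by hyphens, at most eight characters each, no whitespace) can be sketched as a quick check. This is a minimal Python sketch under those stated constraints only, not the full ABNF of Figure 1. The names `split_subtags` and `has_basic_shape` are hypothetical, and the check says nothing about subtag order or registration.

```python
import re

# One subtag: 1 to 8 ASCII letters or digits (the length limit quoted above).
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_subtags(tag):
    """Split a tag into its subtags at each hyphen (%x2D)."""
    return tag.split("-")


def has_basic_shape(tag):
    """Return True if every subtag is 1-8 ASCII alphanumerics.

    Whitespace, empty subtags (from "--" or a trailing "-"), and
    over-long subtags all fail the per-subtag pattern.
    """
    return all(_SUBTAG.fullmatch(s) is not None for s in split_subtags(tag))
```

For example, `has_basic_shape("fr-Latn-CA")` passes, while a tag containing a space or a nine-character subtag does not.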
... Note that although [RFC4234] refers to octets, the language tags described in this document are sequences of characters from the US-ASCII ...
... US-ASCII [ISO646] repertoire. Language tags MAY be used in documents and applications that use other encodings, so long as these encompass ...
... Cyrillic script as used in Mongolia. Although case distinctions do not carry meaning in language tags, consistent formatting and presentation of the tags will aid users. ...
... The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA ...
... o Tag or tags refers to a complete language tag, such as "fr-Latn-CA". Examples of tags ...
... is an [ISO15924] script code that was used to define the 'Latn' script subtag for use in a language tag. Examples of codes in this document are enclosed in single quotes ('en', 'Latn'). ...
... The definitions in this section apply to the various subtags within the language tags defined by this document, excepting those "grandfathered" tags defined in Section 2.2.8. ...
... tags defined in Section 2.2.8. Language tags are designed so that each subtag type has unique length and content restrictions. These make identification of the subtag's type possible, even if the content of the subtag itself is ...
... The primary language subtag is the first subtag in a language tag (with the exception of private use and certain grandfathered tags ...
... range 'qaa' through 'qtz' are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 639-2 for private use ...
... 6. The single-character subtag 'x' as the primary subtag indicates that the language tag consists solely of subtags whose meaning is defined by private agreement. For example, in the tag ...
... 4. Extended language subtags MUST NOT be registered or used to form language tags. Their syntax is described here so that implementations can be compatible with any future revision of this document that does provide for their registration ...
... 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 15924 for private use ...
... registration for that purpose. 5. There MUST be at most one script subtag in a language tag, and the script subtag SHOULD be omitted when it adds no distinguishing value to the tag ...
... groupings' MUST NOT be registered in the IANA registry and MUST NOT be used to form language tags. C. UN numeric codes for countries or areas with ambiguous ISO ...
... registry, MUST be defined according to the rules in Section 3.4 and MUST be used to form language tags that represent the country or region for which they are defined. ...
... entered into the registry and MUST NOT be used to form language tags. Note that the ISO 3166-based subtag in the registry ...
... not presently registered MAY be entered into the IANA registry via the process described in Section 3.5. Once registered, these codes MAY be used to form language tags. F. All other UN numeric codes for countries or areas that do not ...
... ISO 3166 alpha-2 code MUST NOT be entered into the registry and MUST NOT be used to form language tags. For more information about these codes, see Section 3.4. ...
... MUST NOT be entered into the registry and MUST NOT be used to form language tags. (At the time this document was created, these values matched the ISO ...
... ISO 3166 alpha-2 codes.) 5. There MUST be at most one region subtag in a language tag and the region subtag MAY be omitted, as when it adds no distinguishing value to the tag ...
... AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 3166 for private use ...
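
The private use ranges quoted above for language ('qaa' through 'qtz'), script ('Qaaa' through 'Qabx'), and region ('AA', 'QM'-'QZ', 'XA'-'XZ', 'ZZ') subtags can be tested with simple range comparisons. The following is a minimal Python sketch; the function names are hypothetical, and the comparison is done case-insensitively on the assumption that case carries no meaning in tags.

```python
def is_private_use_language(subtag):
    """True for three-letter subtags in the range 'qaa' through 'qtz'."""
    s = subtag.lower()
    return len(s) == 3 and s.isascii() and s.isalpha() and "qaa" <= s <= "qtz"


def is_private_use_script(subtag):
    """True for four-letter subtags in the range 'Qaaa' through 'Qabx'."""
    s = subtag.lower()
    return len(s) == 4 and s.isascii() and s.isalpha() and "qaaa" <= s <= "qabx"


def is_private_use_region(subtag):
    """True for 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ'."""
    s = subtag.upper()
    if len(s) != 2 or not (s.isascii() and s.isalpha()):
        return False
    return s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ"
```

Because all candidates have a fixed length and a single case after normalization, plain string comparison is enough to express each range.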
... private use subtag sequences. 3. More than one variant MAY be used to form the language tag. 4. Variant subtags MUST be registered with IANA ...
... IANA according to the rules in Section 3.5 of this document before being used to form language tags. In order to distinguish variants from other types of subtags, registrations MUST meet the following length and ...
... registry MAY include one or more 'Prefix' fields, which indicate the language tag or tags that would make a suitable prefix ...
... that would make a suitable prefix (with other subtags, as appropriate) in forming a language tag with the variant. For example, the subtag 'nedis' has a Prefix of "sl", making it suitable ...
... example, the subtag 'nedis' has a Prefix of "sl", making it suitable to form language tags such as "sl-nedis" and "sl-IT-nedis", but not suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". ...
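
The 'nedis' example can be sketched as a check that the subtags preceding a variant begin with one of its registered prefixes. This is an illustrative Python sketch only: the function name `variant_suits_tag` is hypothetical, the prefix list is passed in by the caller rather than read from the registry, and matching is assumed to be subtag-by-subtag and case-insensitive.

```python
def variant_suits_tag(tag, variant, prefixes):
    """Return True if the part of `tag` before `variant` starts with a prefix.

    `prefixes` is the list of 'Prefix' field-values for the variant,
    for example ["sl"] for 'nedis'.
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    before = subtags[:subtags.index(variant)]
    for prefix in prefixes:
        parts = prefix.lower().split("-")
        # Compare whole subtags, so "sl" does not match a tag starting "slx".
        if before[:len(parts)] == parts:
            return True
    return False
```

With `prefixes=["sl"]`, the tags "sl-nedis" and "sl-IT-nedis" are accepted and "zh-nedis" and "it-IT-nedis" are rejected, matching the excerpt.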
... Extensions provide a mechanism for extending language tags for use in various applications. See Section 3.7. The following rules apply to extensions: ...
... 3. An extension MUST follow at least a primary language subtag. That is, a language tag cannot begin with an extension. Extensions extend language tags, they do not override or replace ...
... That is, a language tag cannot begin with an extension. Extensions extend language tags, they do not override or replace them. For example, "a-value" is not a well-formed language tag ...
... language tags, they do not override or replace them. For example, "a-value" is not a well-formed language tag, while "de-a-value" is. ...
... Existing IANA-registered language tags from RFC 1766 and/or RFC 3066 ...
... see Section 3.8. It is important to note that all language tags formed under the guidelines in this document were either legal, well-formed tags ...
... An implementation that claims to check for well-formed language tags MUST: ...
... language, script, region, and variant subtags consist of valid codes for use in language tags according to the IANA registry as of the particular date specified by the implementation. ...


... update procedures associated with it, as well as a registry for extensions to language tags (Section 3.7). The Language ...
... Registry contains a comprehensive list of all of the subtags valid in language tags. This allows implementers a straightforward and reliable way to validate ...
... implementers a straightforward and reliable way to validate language tags. The Language Subtag Registry ...
... extension subtags, it is possible to validate all of the subtags that appear in a language tag under the provisions of this document or its revisions or successors. In addition, the meaning of the various subtags will be unambiguous and stable over time. (The meaning of ...
... * Tag's field-value contains a complete language tag. This field MUST only appear in records whose 'Type' has one of these values: "grandfathered" or "redundant". Note that the field- ...
... replace the content of the source standard itself. The descriptions are not intended to be the English localized names for the subtags. Localization or translation of language tag and subtag descriptions is out of scope of this document. ...
... language', 'extlang', 'script', 'region', and 'variant', 'Preferred-Value' contains the subtag of the same 'Type' that is preferred for forming the language tag. * For fields of type 'grandfathered' and 'redundant', a canonical ...
... * For fields of type 'grandfathered' and 'redundant', a canonical mapping to a complete language tag. o Deprecated ...
... * Prefix's field-value contains a language tag with which this subtag MAY be used to form a new language tag, perhaps with ...
... Prefix's field-value contains a language tag with which this subtag MAY be used to form a new language tag, perhaps with other subtags as well. This field MUST only appear in records whose 'Type' field-value is 'variant' or 'extlang'. For ...
... deemed appropriate for understanding the registry and implementing language tags using the subtag or tag. ...
... * Suppress-Script contains a script subtag that SHOULD NOT be used to form language tags with the associated primary language subtag. This field MUST only appear in records whose 'Type' ...
... registry. Although valid in language tags, subtags and tags with a 'Deprecated' field are deprecated and validating processors ...
... tag or subtag. The value in this field is STRONGLY RECOMMENDED as the best choice to represent the value of this record when selecting a language tag. These values form three groups: ...
... Records that contain a 'Preferred-Value' field MUST also have a 'Deprecated' field. This field contains a date of deprecation. Thus, a language tag processor can use the registry to construct the ...
... processor can construct the set of valid language tags that correspond to that tag for all dates up to the date of the registry ...
... beneficial to applications that are matching, selecting, or filtering content based on its language tags. Note that 'Preferred-Value' mappings in records of type 'region' ...
... value. There are many reasons for a country code to be changed, and the effect this has on the formation of language tags will depend on the nature of the change in question. ...
... The 'Preferred-Value' field in records of type "grandfathered" and "redundant" contains whole language tags that are strongly RECOMMENDED for use in place of the record's value. In many cases, the mappings were created ...
... The field-value of the 'Prefix' field consists of a language tag whose subtags are appropriate to use with this subtag. For example, the variant subtag '1996' has a 'Prefix ...
... overwhelming majority of documents for the given language and that therefore adds no distinguishing information to a language tag. It helps ensure greater compatibility between the language tags ...
... language tag. It helps ensure greater compatibility between the language tags generated according to the rules in this document and language tags ...
... compatibility between the language tags generated according to the rules in this document and language tags and tag processors ...
... registry is critical to the long-term stability of language tags. The rules in this section guarantee that a specific language tag's meaning is ...
... critical to the long-term stability of language tags. The rules in this section guarantee that a specific language tag's meaning is stable over time and will not change. ...
... withdrawn by their respective maintenance or registration authority remain valid in language tags. A 'Deprecated' field containing the date of withdrawal is added to the record. If a ...
... Timor' when it was under administration by Portugal). The subtag 'TP' remains valid in language tags, but its record contains a 'Preferred-Value' of 'TL' and its field 'Deprecated' contains the date the new code was assigned ...
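
The 'TP' to 'TL' mapping can be sketched as a lookup applied to the region subtag of a tag. This is a hedged Python sketch, not a registry implementation: `PREFERRED_REGION` is a hypothetical one-entry excerpt holding only the mapping quoted above, `apply_preferred_region` is a hypothetical name, and the region subtag is assumed to be the first two-letter subtag after the primary language and before any single-character subtag.

```python
# Hypothetical excerpt of 'Preferred-Value' data for records of type 'region'.
PREFERRED_REGION = {"TP": "TL"}


def apply_preferred_region(tag, preferred=PREFERRED_REGION):
    """Replace a deprecated region subtag with its preferred value, if any."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        # Private use ('x') and similar tags are left untouched.
        return tag
    for i in range(1, len(subtags)):
        s = subtags[i]
        if len(s) == 1:
            # A singleton starts an extension or private use sequence.
            break
        if len(s) == 2 and s.isalpha():
            subtags[i] = preferred.get(s.upper(), s)
            break  # at most one region subtag per tag
    return "-".join(subtags)
```

The original tag stays valid; the lookup only produces the form that the registry recommends.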
... IANA registry, then the field 'Type' in that record is changed from 'grandfathered' to 'redundant'. Note that this will not affect language tags that match the grandfathered tag, since these tags ...
... valid generative subtag sequences. For example, if the subtag 'gan' in the language tag "zh-gan" were to be registered as an extended language subtag, then the grandfathered tag ...
... Variant subtags are usually registered for use with a particular range of language tags. For example, the subtag 'rozaj' is intended for use with language tags that start ...
... range of language tags. For example, the subtag 'rozaj' is intended for use with language tags that start with the primary language ...
... identifiers that contain a language component and are compatible with applications that understand language tags. The structure and form of extensions are defined by this document so ...
... the extension MUST maintain the accuracy of the record by sending an updated full copy of the record to iana@iana.org with the subject line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only the 'Comments', 'Contact_Email ...
... %% Figure 6: Format of Records in the Language Tag Extensions Registry ...
... truncated subtags. When a language tag is to be used in a specific, known, protocol, it is RECOMMENDED that the language tag not contain extensions not ...
... When a language tag is to be used in a specific, known, protocol, it is RECOMMENDED that the language tag not contain extensions not supported by that protocol. In addition, note that some protocols MAY impose upper limits on the length of the strings used to store or ...
... MAY impose upper limits on the length of the strings used to store or transport the language tag. ...
... Registry containing the various subtags initially valid in a language tag is necessary. This collection of subtags, along with a description of the process used to create it, is described by ...
... [RFC3066] when this document is adopted MAY be completed under the former rules, at the discretion of the Language Tag Reviewer (as described in [RFC3066]). Until the IESG ...
... IESG officially appoints a Language Subtag Reviewer, the existing Language Tag Reviewer SHALL serve as the Language Subtag Reviewer. ...
... An initial version of the Language Tag Extensions Registry described in Section 3.7 is also needed. The Language Tag ...
... Language Tag Extensions Registry described in Section 3.7 is also needed. The Language Tag Extensions Registry SHALL be initialized with a single record containing a single field ...


... Formation and Processing of Language Tags ...
... registry with the tag syntax to choose, form, and process language tags. ...
... Choice of Language Tag ...
... Interoperability is best served when all users use the same language tag in order to represent the same language. If an application has requirements ...
... application risks damaging interoperability. It is strongly RECOMMENDED that users not define their own rules for language tag choice. ...
... Subtags SHOULD only be used where they add useful distinguishing information; extraneous subtags interfere with the meaning, understanding, and processing of language tags. In particular, users and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' ...
... registry (defined in Section 3.1): these fields provide guidance on when specific additional subtags SHOULD (and SHOULD NOT) be used in a language tag. Of particular note, many applications can benefit from the use of ...
... Of particular note, many applications can benefit from the use of script subtags in language tags, as long as the use is consistent for a given context. Script subtags were not formally defined in RFC ...
... registry, the Suppress-Script field helps ensure greater compatibility between the language tags generated according to the rules in this document and language tags and tag ...
... compatibility between the language tags generated according to the rules in this document and language tags and tag processors or ...
... language and region subtags and are reserved for future standardization. Applications might benefit from their judicious use in forming language tags in the future. Similar recommendations are expected to apply to their use as apply to script subtags. ...
... here. The choice of subtags used to form a language tag SHOULD be guided by the following rules: ...
... precise for such a task. 2. The script subtag SHOULD NOT be used to form language tags unless the script adds some distinguishing information to the tag. The ...
... registry entry, then the value of that field SHOULD be used to form the language tag in preference to the tag or subtag in which the preferred value appears. ...
... used to label content, even if the language is unknown. Omitting the language tag altogether is preferred to using a tag with a primary language ...
... primary language subtag of 'und'. The 'und' subtag MAY be useful for protocols that require a language tag to be provided. The 'und' subtag MAY also be useful when matching language tags in ...
... for protocols that require a language tag to be provided. The 'und' subtag MAY also be useful when matching language tags in certain situations. ...
... 6. The same variant subtag SHOULD NOT be used more than once within a language tag. * For example, do not use "de-DE-1901-1901". ...
... backward compatibility, this document contains several provisions to account for potential instability in the standards used to define the subtags that make up language tags. These provisions mean that no language tag created ...
... standards used to define the subtags that make up language tags. These provisions mean that no language tag created under the rules in this document will become obsolete. ...
... Meaning of the Language Tag ...
... section gives only possible examples of its usage. o For a single information object, the associated language tags might be interpreted as the set of languages that is necessary for ...
... o For an aggregation of information objects, the associated language tags could be taken as the set of languages used inside components of that aggregation ...
... o For information objects whose purpose is to provide alternatives, the associated language tags could be regarded as a hint that the content is provided in several languages ...
... inappropriate Norwegian rules. Language tags are related when they contain a similar sequence of subtags. For example, if a language tag B contains language tag ...
... Language tags are related when they contain a similar sequence of subtags. For example, if a language tag B contains language tag A as a prefix ...
... Language tags are related when they contain a similar sequence of subtags. For example, if a language tag B contains language tag A as a prefix, then B is typically "narrower" or "more specific" than A. ...
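
The prefix relationship between two tags can be sketched as a comparison of their leading subtags. This is a minimal Python sketch with a hypothetical function name, `is_narrower_than`; it assumes the comparison is case-insensitive and works on whole subtags, and it only reports the structural relationship, which the text describes as "typically" narrower rather than always so.

```python
def is_narrower_than(tag_b, tag_a):
    """Return True if tag_a is a proper subtag-wise prefix of tag_b."""
    a = tag_a.lower().split("-")
    b = tag_b.lower().split("-")
    # Whole subtags are compared, so "deu" does not count as narrower than "de".
    return len(b) > len(a) and b[:len(a)] == a
```

For example, "zh-Hans-CN" has "zh-Hans" as a prefix, but not the other way around.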
... [RFC3066] did not provide an upper limit on the size of language tags. While RFC 3066 did define the semantics of particular subtags ...
... 3066 did define the semantics of particular subtags in such a way that most language tags consisted of language and region subtags with a combined total length of up to six characters, ...
... registered. Neither the language tag syntax nor other requirements in this document impose a fixed upper limit on the number of subtags in a ...
... requirements in this document impose a fixed upper limit on the number of subtags in a language tag (and thus an upper bound on the size of a tag). The language tag ...
... language tag (and thus an upper bound on the size of a tag). The language tag syntax suggests that, depending on the specific language, more subtags (and thus a longer tag ...
... Some applications and protocols are forced to allocate fixed buffer sizes or otherwise limit the length of a language tag. A conformant implementation or specification MAY refuse to support the storage of language tags ...
... language tag. A conformant implementation or specification MAY refuse to support the storage of language tags that exceed a specified length. Any such limitation SHOULD be clearly documented, and such documentation SHOULD include what happens to longer tags ...
... tags (for example, whether an error value is generated or the language tag is truncated). A protocol that allows tags to be truncated at an arbitrary limit, without giving any ...
... tags in substantial ways. In practice, most language tags do not require more than a few subtags and will not approach reasonably sized buffer limitations; ...
... have a fixed length limitation. For example, [RFC2231] has no explicit length limitation: the length available for the language tag is constrained by the length of other header components (such as the ...
... buffer limit are: Implementations SHOULD NOT truncate language tags unless the meaning of the tag is purposefully being changed, or unless the ...
... Protocols or specifications that specify limited buffer sizes for language tags MUST allow for language tags of up to 33 characters. ...
... buffer sizes for language tags MUST allow for language tags of up to 33 characters. Protocols or specifications that specify limited buffer sizes ...
... Protocols or specifications that specify limited buffer sizes for language tags SHOULD allow for language tags of at least 42 characters. ...
... buffer sizes for language tags SHOULD allow for language tags of at least 42 characters. ...
... Truncation of Language Tags ...
... Truncation of a language tag alters the meaning of the tag, and thus SHOULD be avoided. However, truncation of language tags ...
... language tag alters the meaning of the tag, and thus SHOULD be avoided. However, truncation of language tags is sometimes necessary due to limited buffer sizes. Such truncation MUST NOT ...
... so by progressively removing subtags along with their preceding "-" from the right side of the language tag until the tag is short enough for the given buffer ...
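
The truncation procedure described above (remove subtags, each with its preceding "-", from the right until the tag fits) can be sketched as follows. This is an illustrative Python sketch with a hypothetical function name, `truncate_tag`. It also drops a trailing single-character subtag, which the full text of RFC 4646 requires but which is not visible in the excerpts here. If the primary subtag alone exceeds the limit, the sketch returns it unchanged and leaves the error handling to the caller.

```python
def truncate_tag(tag, max_len):
    """Shorten `tag` to at most `max_len` characters by dropping subtags."""
    subtags = tag.split("-")
    while len(subtags) > 1 and len("-".join(subtags)) > max_len:
        subtags.pop()  # remove the rightmost subtag and its hyphen
        # A tag must not end in a single-character subtag, so drop it too.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)
```

Working on the list of subtags, rather than slicing the string, guarantees that no subtag is cut in the middle and that the result never ends with "-".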
... Canonicalization of Language Tags ...
... Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created ...
... Since a particular language tag is sometimes used by many processes, language tags SHOULD always be created or generated in a canonical form. ...
... canonical form. A language tag is in canonical form when: ...
... singleton subtag. Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in ...
... canonical form. Example: The language tag "en-BU" (English as used in Burma) is not canonical ...
... Canonicalization of language tags does not imply anything about the use of upper or lowercase letters when processing or comparing subtags (and as described in Section 2.1). All comparisons MUST be ...
... When performing canonicalization of language tags, processors MAY regularize the case of the subtags (that is, this process is ...
... Implementers SHOULD specify a locale-neutral casing operation to ensure that case folding of subtags does not produce this value, which is illegal in language tags. For example, if one were to uppercase the region subtag 'in' using Turkish locale rules, the sequence U+0130 U+004E would result instead of the expected 'IN ...
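
A locale-neutral casing operation can be sketched by mapping only the letters A-Z and a-z and leaving every other character untouched. This is a minimal Python sketch; the names `ascii_lower`, `ascii_upper`, and `same_tag` are hypothetical, and the comparison assumes that case distinctions carry no meaning in tags.

```python
import string

# Translation tables that touch only the 26 ASCII letters.
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_lower(s):
    """Lowercase A-Z only; all other characters pass through unchanged."""
    return s.translate(_TO_LOWER)


def ascii_upper(s):
    """Uppercase a-z only; 'in' always becomes 'IN', never U+0130 U+004E."""
    return s.translate(_TO_UPPER)


def same_tag(a, b):
    """Compare two tags without regard to ASCII letter case."""
    return ascii_lower(a) == ascii_lower(b)
```

Restricting the mapping to an explicit table makes the behavior independent of any locale setting, at the cost of doing nothing for characters that are not permitted in tags anyway.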
... tags that include these values, although the values are canonical when they appear in a language tag. An extension MUST define any relationships that exist between the ...
... no meaning outside the private agreement between the parties that intend to use or exchange language tags that employ them. The same subtags MAY be used with a different meaning under a separate private agreement ...
... semantic meaning of private use tags and of the subtags used within such a language tag are not defined by this document. ...
... ISO 3166 private use codes) MAY be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a great deal of public, interchangeable information about the language ...


... The new registry MUST be listed under "Language Tags" at <http://www.iana.org/numbers.html>, replacing the existing registrations ...
... forms and RFC 3066 registrations MUST be relabeled as "Language Tags (Obsolete)" and maintained (but not added to or modified). ...
... The Language Tag Extensions Registry will also be generated and sent to IANA ...
... Future work by IANA on the Language Tag Extensions Registry is limited to two cases. First, the IESG ...


... Language tags used in content negotiation, like any other information exchanged on the Internet ...
... defenses). The language tag associated with a particular information item is of no consequence whatsoever in determining whether that content might contain possible homographs. The fact that a text is tagged as being ...
... language or using a particular script subtag provides no assurance whatsoever that it does not contain characters from scripts other than the one(s) associated with or specified by that language tag. Since there is no limit to the number of variant, private use ...
... buffer overflow attacks. See Section 4.3 for details on language tag truncation, which can occur as a consequence of defenses against buffer overflow. ...


... The syntax in this document requires that language tags use only the characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most character sets ...
... characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most character sets, so the composition of language tags should not have any character set issues. ...
... character set issues. Rendering of characters based on the content of a language tag is not addressed in this memo. Historically, some languages have relied on ...
... applies to language- and culture-specific variations of Han ideographs as used in Japanese, Chinese, and Korean). When language tags are applied to spans of text, rendering engines sometimes use that information in deciding which font to use in the absence of other information, particularly where languages ...


... The main goals for this revision of language tags were the following: *Compatibility ...
... *Compatibility.* All RFC 3066 language tags (including those in the IANA registry) remain valid ...
... valid in this specification. The changes in this document represent additional constraints on language tags. That is, in no case is the syntax more permissive and processors ...
... XMLSchema]) will be able to process the tags described by this document. In addition, this document defines language tags in such a way as to ensure future compatibility. ...
... standards, a valid RFC 3066 language tag could become invalid or have its meaning change. This has the potential of invalidating content that may have an extensive shelf-life. In this specification, once a ...
... its meaning change. This has the potential of invalidating content that may have an extensive shelf-life. In this specification, once a language tag is valid, it remains valid forever. ...
... *Validity.* The structure of language tags defined by this document makes it possible to determine if a particular tag is well-formed ...
... registry with specific versioning information, the validity of language tags at any point in time can be precisely determined (instead of interpolating values from many separate sources). ...
... without resorting to the registration process. The addition of UN M.49 codes provides for the generation of language tags with regional scope, which is also required by some applications. ...
... The recast of the registry from containing whole language tags to subtags is a key part of this. An important feature of RFC 3066 was ...
... subtags. *Extensibility.* Because of the widespread use of language tags, it is disruptive to have periodic revisions of the core specification, even in the face of demonstrated need. The extension mechanism ...
... even in the face of demonstrated need. The extension mechanism provides for a way for independent RFCs to define extensions to language tags. These extensions have a very constrained, well-defined structure that prevents extensions from interfering with implementations of language tags ...
... language tags. These extensions have a very constrained, well-defined structure that prevents extensions from interfering with implementations of language tags defined in this document. The document also anticipates features of ISO ...
... language subtags, as well as the possibility of other ISO 639 parts becoming useful for the formation of language tags in the future. ...
... o Replaces the IANA language tag registry with a language subtag ...
... registry becomes the canonical source for forming language tags. o Provides a process that guarantees stability of language tags ...
... language tags. o Provides a process that guarantees stability of language tags, by handling reuse of values by ISO 639, ISO ...
... method for indicating in the registry when script subtags are necessary for a given language tag. o Adds the concept of a variant subtag and allows variants to be ...


... Phillips, A., Ed. and M. Davis, Ed., "Matching of Language Tags", BCP 47, RFC 4647, September 2006. ...


... 1766, the precursors of this document, made enormous contributions directly or indirectly to this document and are generally responsible for the success of language tags. The following people (in alphabetical order) contributed to this ...
... originated RFCs 1766 and 3066, and without whom this document would not have been possible. Special thanks must go to Michael Everson, who has served as Language Tag Reviewer for almost the complete period since the publication of RFC 1766. Special thanks to Doug ...
... Ewell, for his production of the first complete subtag registry, and his work in producing a test parser for verifying language tags. ...


... Appendix B. Examples of Language Tags (Informative) ...


