language tag
... This document specifies a particular identifier mechanism (the
language tag) and a registration function for values to be used to
form tags ...
... The Language Tag ...
...
Language tags are used to help identify languages, whether spoken,
written, signed, or otherwise signaled, for the purpose of
...
...
The language tag is composed of one or more parts, known as
"subtags". Each subtag consists of a sequence of alphanumeric
characters. Subtags are distinguished and separated from one another
by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a
"primary language" subtag and a (possibly empty) series of subsequent
...
...
All subtags have a maximum length of eight characters and whitespace
is not permitted in a language tag. For examples of language tags,
see Appendix B.
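
The constraints quoted above (hyphen-separated subtags, alphanumeric
characters only, at most eight characters each, no whitespace) can be
sketched as a simple syntactic check. This is a minimal illustration,
not the full grammar of the specification; the name split_subtags is
hypothetical and does not come from the source.

```python
import re

# One to eight ASCII letters or digits, per the subtag constraints above.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_subtags(tag: str) -> list[str]:
    """Split a tag on hyphens; raise ValueError if any subtag is malformed.

    Assumption: this checks only length and character repertoire. It does
    not check subtag order or registry validity.
    """
    subtags = tag.split("-")
    for subtag in subtags:
        if not _SUBTAG.fullmatch(subtag):
            raise ValueError(f"malformed subtag {subtag!r} in {tag!r}")
    return subtags
```

A tag containing whitespace, an empty subtag (such as a trailing
hyphen), or a subtag longer than eight characters is rejected by this
sketch.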
...
...
Note that although [RFC4234] refers to octets, the language tags
described in this document are sequences of characters from the
US-ASCII [ISO646] repertoire. Language tags MAY be used in documents
and applications that use other encodings, so long as these encompass
...
... Cyrillic script as used in Mongolia.
Although case distinctions do not carry meaning in language tags,
consistent formatting and presentation of the tags will aid users.
...
...
The namespace of language tags and their subtags is administered by
the Internet Assigned Numbers Authority (IANA ...
... is an [ISO15924] script code that was used to define the 'Latn'
script subtag for use in a language tag. Examples of codes in
this document are enclosed in single quotes ('en', 'Latn').
...
...
The definitions in this section apply to the various subtags within
the language tags defined by this document, excepting those
"grandfathered" tags defined in Section 2.2.8.
Language tags are designed so that each subtag type has unique length
and content restrictions. These make identification of the subtag's
type possible, even if the content of the subtag itself is
...
...
The primary language subtag is the first subtag in a language tag
(with the exception of private use and certain grandfathered tags ...
... range 'qaa' through 'qtz' are reserved for
private use in language tags. These subtags correspond to codes
reserved by ISO 639-2 for private use ...
...
6. The single-character subtag 'x' as the primary subtag indicates
that the language tag consists solely of subtags whose meaning is
defined by private agreement. For example, in the tag ...
... 4. Extended language subtags MUST NOT be registered or used to form
language tags. Their syntax is described here so that
implementations can be compatible with any future revision of
this document that does provide for their registration ...
...
3. The script subtags 'Qaaa' through 'Qabx' are reserved for private
use in language tags. These subtags correspond to codes reserved
by ISO 15924 for private use ...
... registration for that purpose.
5. There MUST be at most one script subtag in a language tag, and
the script subtag SHOULD be omitted when it adds no
distinguishing value to the tag ...
... groupings' MUST NOT be registered in the IANA registry and
MUST NOT be used to form language tags.
C. UN numeric codes for countries or areas with ambiguous ISO ...
... registry, MUST be
defined according to the rules in Section 3.4 and MUST be
used to form language tags that represent the country or
region for which they are defined.
...
... entered into the registry and MUST NOT be used to form
language tags. Note that the ISO 3166-based subtag in the
registry ...
... not presently registered MAY be entered into the IANA
registry via the process described in Section 3.5. Once
registered, these codes MAY be used to form language tags.
F. All other UN numeric codes for countries or areas that do not
...
... ISO 3166 alpha-2 code MUST NOT be entered
into the registry and MUST NOT be used to form language tags.
For more information about these codes, see Section 3.4.
...
... MUST NOT be entered into the registry and MUST NOT be used to
form language tags. (At the time this document was created,
these values matched the ISO ...
... ISO 3166 alpha-2 codes.)
5. There MUST be at most one region subtag in a language tag and the
region subtag MAY be omitted, as when it adds no distinguishing
value to the tag ...
... AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
reserved for private use in language tags. These subtags
correspond to codes reserved by ISO 3166 for private use ...
... private use subtag sequences.
3. More than one variant MAY be used to form the language tag.
4. Variant subtags MUST be registered with IANA according to the
rules in Section 3.5 of this document before being used to form
language tags. In order to distinguish variants from other types
of subtags, registrations MUST meet the following length and
...
... registry MAY include
one or more 'Prefix' fields, which indicate the language tag or tags
that would make a suitable prefix (with other subtags, as
appropriate) in forming a language tag with the variant. For
example, the subtag 'nedis' has a Prefix of "sl", making it suitable
to form language tags such as "sl-nedis" and "sl-IT-nedis", but not
suitable for use in a tag such as "zh-nedis" or "it-IT-nedis".
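
The 'nedis' example can be sketched as a prefix comparison on whole
subtags. The function names and the prefixes argument are hypothetical;
a real implementation would read the 'Prefix' fields from the registry.

```python
def has_prefix(tag: str, prefix: str) -> bool:
    """True if `tag` begins with `prefix` on a subtag boundary."""
    tag_parts = tag.lower().split("-")
    prefix_parts = prefix.lower().split("-")
    return tag_parts[: len(prefix_parts)] == prefix_parts


def suits_variant(tag: str, variant: str, prefixes: list[str]) -> bool:
    """Check the part of `tag` before `variant` against its Prefix values.

    Assumption: `prefixes` holds the 'Prefix' field-values recorded for
    the variant; the case of a variant with no Prefix is not modelled.
    """
    parts = tag.lower().split("-")
    variant = variant.lower()
    if variant not in parts:
        return False
    head = "-".join(parts[: parts.index(variant)])
    return any(has_prefix(head, p) for p in prefixes)
```

With prefixes set to ["sl"], this accepts "sl-nedis" and "sl-IT-nedis"
and rejects "zh-nedis" and "it-IT-nedis", matching the example.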
...
...
Extensions provide a mechanism for extending language tags for use in
various applications. See Section 3.7. The following rules apply to
extensions:
...
... 3. An extension MUST follow at least a primary language subtag.
That is, a language tag cannot begin with an extension.
Extensions extend language tags; they do not override or replace
them. For example, "a-value" is not a well-formed language tag,
while "de-a-value" is.
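
A minimal sketch of this rule follows. The function name is
hypothetical, and the treatment of 'i' as a grandfathered prefix rather
than an extension singleton is an assumption not stated in the excerpt
above.

```python
def begins_with_extension(tag: str) -> bool:
    """True if the first subtag looks like an extension singleton."""
    first = tag.split("-", 1)[0].lower()
    # 'x' as the primary subtag marks a private use tag. 'i' is assumed
    # here to introduce certain grandfathered tags. Neither is treated
    # as an extension singleton.
    return len(first) == 1 and first not in ("x", "i")
```

Under these assumptions the sketch flags "a-value" and accepts
"de-a-value".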
...
...
Existing IANA-registered language tags from RFC 1766 and/or RFC 3066
...
... see Section 3.8.
It is important to note that all language tags formed under the
guidelines in this document were either legal, well-formed tags ...
...
An implementation that claims to check for well-formed language tags
MUST:
...
... language, script, region, and variant subtags consist of valid
codes for use in language tags according to the IANA registry as
of the particular date specified by the implementation.
...
... update procedures associated with it, as well as a registry for
extensions to language tags (Section 3.7).
The Language ...
... Registry contains a comprehensive list of all of
the subtags valid in language tags. This allows implementers a
straightforward and reliable way to validate language tags. The
Language Subtag Registry ...
... extension subtags, it is possible to validate all of the subtags that
appear in a language tag under the provisions of this document or its
revisions or successors. In addition, the meaning of the various
subtags will be unambiguous and stable over time. (The meaning of
...
...
* Tag's field-value contains a complete language tag. This field
MUST only appear in records whose 'Type' has one of these
values: "grandfathered" or "redundant". Note that the field-
...
... replace the content of the source standard itself. The descriptions
are not intended to be the English localized names for the subtags.
Localization or translation of language tag and subtag descriptions
is out of scope of this document.
...
... language', 'extlang', 'script', 'region',
and 'variant', 'Preferred-Value' contains the subtag of the
same 'Type' that is preferred for forming the language tag.
* For fields of type 'grandfathered' and 'redundant', a canonical
mapping to a complete language tag.
o Deprecated
...
...
* Prefix's field-value contains a language tag with which this
subtag MAY be used to form a new language tag, perhaps with
other subtags as well. This field MUST only appear in records
whose 'Type' field-value is 'variant' or 'extlang'. For
...
... deemed appropriate for understanding the registry and
implementing language tags using the subtag or tag.
...
...
* Suppress-Script contains a script subtag that SHOULD NOT be
used to form language tags with the associated primary language
subtag. This field MUST only appear in records whose 'Type'
...
... registry. Although
valid in language tags, subtags and tags with a 'Deprecated' field
are deprecated and validating processors ...
... tag or subtag. The value in this field
is STRONGLY RECOMMENDED as the best choice to represent the value of
this record when selecting a language tag. These values form three
groups:
...
... Records that contain a 'Preferred-Value' field MUST also have a
'Deprecated' field. This field contains a date of deprecation.
Thus, a language tag processor can use the registry to construct the
...
... processor can construct the set of valid
language tags that correspond to that tag for all dates up to the
date of the registry ...
... beneficial to applications that are matching, selecting, or
filtering content based on its language tags.
Note that 'Preferred-Value' mappings in records of type 'region'
...
... value. There are many reasons for a country code to be changed, and
the effect this has on the formation of language tags will depend on
the nature of the change in question.
...
...
The 'Preferred-Value' field in records of type "grandfathered" and
"redundant" contains whole language tags that are strongly
RECOMMENDED for use in place of the record's value. In many cases,
the mappings were created ...
...
The field-value of the 'Prefix' field consists of a language tag
whose subtags are appropriate to use with this subtag. For example,
the variant subtag '1996' has a 'Prefix ...
... overwhelming majority of documents for the given language and that
therefore adds no distinguishing information to a language tag. It
helps ensure greater compatibility between the language tags
generated according to the rules in this document and language tags
and tag processors ...
... registry is
critical to the long-term stability of language tags. The rules in
this section guarantee that a specific language tag's meaning is
stable over time and will not change.
...
... withdrawn by their respective maintenance or registration
authority remain valid in language tags. A 'Deprecated' field
containing the date of withdrawal is added to the record. If a
...
... Timor' when it was under administration by Portugal). The
subtag 'TP' remains valid in language tags, but its record
contains a 'Preferred-Value' of 'TL' and its field
'Deprecated' contains the date the new code was assigned
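
The 'TP' to 'TL' mapping can be sketched as a lookup against registry
data. The one-entry table below is a hypothetical excerpt, and the rule
used to locate the region subtag is a simplifying assumption that
ignores numeric region codes.

```python
# Hypothetical excerpt: deprecated region subtag -> its Preferred-Value.
REGION_PREFERRED = {"TP": "TL"}


def prefer_region(tag: str) -> str:
    """Replace a deprecated two-letter region with its preferred value."""
    parts = tag.split("-")
    for i, part in enumerate(parts[1:], start=1):
        if len(part) == 1:
            # Assumption: stop at the first singleton (extension or
            # private use); subtags after it are left untouched.
            break
        if len(part) == 2 and part.isascii() and part.isalpha():
            # Assumption: the first two-letter subtag after the primary
            # language is the region.
            parts[i] = REGION_PREFERRED.get(part.upper(), part)
            break
    return "-".join(parts)
```

The deprecated subtag stays valid as input. The lookup only selects the
preferred form when a tag is being generated or normalized.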
...
... IANA registry, then the field 'Type'
in that record is changed from 'grandfathered' to 'redundant'.
Note that this will not affect language tags that match the
grandfathered tag, since these tags ...
... valid
generative subtag sequences. For example, if the subtag 'gan'
in the language tag "zh-gan" were to be registered as an
extended language subtag, then the grandfathered tag ...
... Variant subtags are usually registered for use with a particular
range of language tags. For example, the subtag 'rozaj' is intended
for use with language tags that start with the primary language
...
... identifiers that contain a language component and are compatible
with applications that understand language tags.
The structure and form of extensions are defined by this document so
...
... the extension MUST maintain the accuracy of the record by sending an
updated full copy of the record to iana@iana.org with the subject
line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only
the 'Comments', 'Contact_Email ...
... %%
Figure 6: Format of Records in the Language Tag Extensions Registry
...
... truncated subtags.
When a language tag is to be used in a specific, known protocol, it
is RECOMMENDED that the language tag not contain extensions not
supported by that protocol. In addition, note that some protocols
MAY impose upper limits on the length of the strings used to store or
transport the language tag.
...
... Registry containing the various subtags initially valid in a
language tag is necessary. This collection of subtags, along with a
description of the process used to create it, is described by
...
... [RFC3066] when this document is adopted MAY be completed under the
former rules, at the discretion of the Language Tag Reviewer (as
described in [RFC3066]). Until the IESG officially appoints a
Language Subtag Reviewer, the existing Language Tag Reviewer SHALL
serve as the Language Subtag Reviewer.
...
...
An initial version of the Language Tag Extensions Registry described
in Section 3.7 is also needed. The Language Tag Extensions Registry
SHALL be initialized with a single record containing a single field
...
... Formation and Processing of Language Tags ...
... Choice of Language Tag ...
...
Interoperability is best served when all users use the same language
tag in order to represent the same language. If an application has
requirements ...
... application risks damaging interoperability. It is strongly
RECOMMENDED that users not define their own rules for language tag
choice.
...
... Subtags SHOULD only be used where they add useful distinguishing
information; extraneous subtags interfere with the meaning,
understanding, and processing of language tags. In particular, users
and implementations SHOULD follow the 'Prefix' and 'Suppress-Script'
...
... registry (defined in Section 3.1): these fields provide
guidance on when specific additional subtags SHOULD (and SHOULD NOT)
be used in a language tag.
Of particular note, many applications can benefit from the use of
script subtags in language tags, as long as the use is consistent for
a given context. Script subtags were not formally defined in RFC
...
... registry, the Suppress-Script field helps ensure greater
compatibility between the language tags generated according to the
rules in this document and language tags and tag processors or
...
... language and region
subtags and are reserved for future standardization. Applications
might benefit from their judicious use in forming language tags in
the future. Similar recommendations are expected to apply to their
use as apply to script subtags.
...
... here.
The choice of subtags used to form a language tag SHOULD be guided by
the following rules:
...
... precise for such a task.
2. The script subtag SHOULD NOT be used to form language tags unless
the script adds some distinguishing information to the tag. The
...
... registry
entry, then the value of that field SHOULD be used to form the
language tag in preference to the tag or subtag in which the
preferred value appears.
...
... used to label content, even if the language is unknown. Omitting
the language tag altogether is preferred to using a tag with a
primary language ...
... primary language subtag of 'und'. The 'und' subtag MAY be useful
for protocols that require a language tag to be provided. The
'und' subtag MAY also be useful when matching language tags in
certain situations.
...
...
6. The same variant subtag SHOULD NOT be used more than once within
a language tag.
* For example, do not use "de-DE-1901-1901".
...
... backward compatibility, this document contains
several provisions to account for potential instability in the
standards used to define the subtags that make up language tags.
These provisions mean that no language tag created under the rules in
this document will become obsolete.
...
... Meaning of the Language Tag ...
... section gives only possible examples of its usage.
o For a single information object, the associated language tags
might be interpreted as the set of languages that is necessary for
...
...
o For an aggregation of information objects, the associated language
tags could be taken as the set of languages used inside components
of that aggregation ...
...
o For information objects whose purpose is to provide alternatives,
the associated language tags could be regarded as a hint that the
content is provided in several languages ...
... inappropriate Norwegian rules.
Language tags are related when they contain a similar sequence of
subtags. For example, if a language tag B contains language tag A as
a prefix, then B is typically "narrower" or "more specific" than A.
...
...
[RFC3066] did not provide an upper limit on the size of language
tags. While RFC 3066 did define the semantics of particular subtags
in such a way that most language tags consisted of language and
region subtags with a combined total length of up to six characters,
...
... registered.
Neither the language tag syntax nor other requirements in this
document impose a fixed upper limit on the number of subtags in a
language tag (and thus an upper bound on the size of a tag). The
language tag syntax suggests that, depending on the specific
language, more subtags (and thus a longer tag ...
...
Some applications and protocols are forced to allocate fixed buffer
sizes or otherwise limit the length of a language tag. A conformant
implementation or specification MAY refuse to support the storage of
language tags that exceed a specified length. Any such limitation
SHOULD be clearly documented, and such documentation SHOULD include
what happens to longer tags (for example, whether an error value is
generated or the language tag is truncated). A protocol that allows
tags to be truncated at an arbitrary limit, without giving any
...
... tags in substantial ways.
In practice, most language tags do not require more than a few
subtags and will not approach reasonably sized buffer limitations;
...
... have a fixed length limitation. For example, [RFC2231] has no
explicit length limitation: the length available for the language tag
is constrained by the length of other header components (such as the
...
... buffer limit are:
Implementations SHOULD NOT truncate language tags unless the
meaning of the tag is purposefully being changed, or unless the
...
... Protocols or specifications that specify limited buffer sizes for
language tags MUST allow for language tags of up to 33 characters.
Protocols or specifications that specify limited buffer sizes for
language tags SHOULD allow for language tags of at least 42
characters.
...
... Truncation of Language Tags ...
...
Truncation of a language tag alters the meaning of the tag, and thus
SHOULD be avoided. However, truncation of language tags is sometimes
necessary due to limited buffer sizes. Such truncation MUST NOT
...
... so by progressively removing subtags along with their preceding "-"
from the right side of the language tag until the tag is short enough
for the given buffer ...
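
The right-to-left removal described above might be sketched as follows.
The function name and the max_len parameter are hypothetical. Dropping a
trailing single-character subtag is an added assumption, made so that
the result does not end in a dangling singleton.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Remove whole subtags from the right until the tag fits max_len."""
    parts = tag.split("-")
    while len(parts) > 1 and len("-".join(parts)) > max_len:
        parts.pop()
        # Assumption: never leave a singleton (extension or private use
        # introducer) as the last subtag of the truncated tag.
        while len(parts) > 1 and len(parts[-1]) == 1:
            parts.pop()
    # If only the primary subtag remains, it is returned even when it is
    # still longer than max_len; subtags are never cut in the middle.
    return "-".join(parts)
```

For instance, truncating "sl-IT-nedis" to six characters removes
"-nedis" as a unit and leaves "sl-IT".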
... Canonicalization of Language Tags ...
...
Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical
form.
...
... singleton subtag.
Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical
form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in
...
... canonical form.
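
The ordering difference between the two example tags can be sketched by
grouping each extension with its singleton and sorting the groups
case-insensitively by singleton. The function name is hypothetical, the
sketch covers only this one aspect of canonical form, and it leaves the
case of each subtag unchanged.

```python
def order_extensions(tag: str) -> str:
    """Sort extension sequences by singleton, keeping private use last."""
    parts = tag.split("-")
    if parts[0].lower() == "x":
        return tag  # wholly private use tag: nothing to reorder
    head: list[str] = [parts[0]]
    extensions: list[list[str]] = []  # each entry: [singleton, subtag, ...]
    private: list[str] = []
    for i, part in enumerate(parts[1:], start=1):
        if len(part) == 1 and part.lower() == "x":
            private = parts[i:]  # private use runs to the end of the tag
            break
        if len(part) == 1:
            extensions.append([part])
        elif extensions:
            extensions[-1].append(part)
        else:
            head.append(part)
    extensions.sort(key=lambda ext: ext[0].lower())
    ordered = head + [p for ext in extensions for p in ext] + private
    return "-".join(ordered)
```

Applied to "en-B-ccc-bbb-A-aaa-X-xyz", the sketch moves the 'A'
extension ahead of the 'B' extension and keeps the private use sequence
at the end.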
Example: The language tag "en-BU" (English as used in Burma) is not
canonical ...
...
Canonicalization of language tags does not imply anything about the
use of upper or lowercase letters when processing or comparing
subtags (and as described in Section 2.1). All comparisons MUST be
...
...
When performing canonicalization of language tags, processors MAY
regularize the case of the subtags (that is, this process is
...
... Implementers SHOULD specify a locale-neutral casing operation to
ensure that case folding of subtags does not produce this value,
which is illegal in language tags. For example, if one were to
uppercase the region subtag 'in' using Turkish locale rules, the
sequence U+0130 U+004E would result instead of the expected 'IN ...
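
One way to keep case folding locale-neutral is to map only the 26 ASCII
letters and leave every other character untouched. The helper names
below are hypothetical; this is a sketch of the idea, not a prescribed
method.

```python
import string

# Translation tables that touch only the ASCII letters A-Z and a-z.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(subtag: str) -> str:
    """Uppercase ASCII letters only, independent of any locale."""
    return subtag.translate(_TO_UPPER)


def ascii_lower(subtag: str) -> str:
    """Lowercase ASCII letters only, independent of any locale."""
    return subtag.translate(_TO_LOWER)


def same_tag(a: str, b: str) -> bool:
    """Compare two tags without regard to ASCII letter case."""
    return ascii_lower(a) == ascii_lower(b)
```

Because the tables contain only ASCII letters, uppercasing 'in' always
yields 'IN', and a character such as U+0130 passes through unchanged
rather than being introduced by the folding step.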
... tags that include these values, although the values are
canonical when they appear in a language tag.
An extension MUST define any relationships that exist between the
...
... no meaning outside the private agreement between the parties that
intend to use or exchange language tags that employ them. The same
subtags MAY be used with a different meaning under a separate private
agreement ...
... semantic meaning of private
use tags and of the subtags used within such a language tag are not
defined by this document.
...
... ISO 3166 private use codes) MAY
be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a
great deal of public, interchangeable information about the language ...
...
The new registry MUST be listed under "Language Tags" at
<http://www.iana.org/numbers.html>, replacing the existing
registrations ...
... forms and RFC 3066 registrations MUST be relabeled as "Language Tags
(Obsolete)" and maintained (but not added to or modified).
...
...
Future work by IANA on the Language Tag Extensions Registry is
limited to two cases. First, the IESG ...
...
Language tags used in content negotiation, like any other information
exchanged on the Internet ...
... defenses).
The language tag associated with a particular information item is of
no consequence whatsoever in determining whether that content might
contain possible homographs. The fact that a text is tagged as being
...
... language or using a particular script subtag provides no
assurance whatsoever that it does not contain characters from scripts
other than the one(s) associated with or specified by that language
tag.
Since there is no limit to the number of variant, private use ...
... buffer overflow
attacks. See Section 4.3 for details on language tag truncation,
which can occur as a consequence of defenses against buffer overflow.
...
...
The syntax in this document requires that language tags use only the
characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
character sets, so the composition of language tags should not have
any character set issues.
Rendering of characters based on the content of a language tag is not
addressed in this memo. Historically, some languages have relied on
...
... applies to language- and culture-specific variations of Han
ideographs as used in Japanese, Chinese, and Korean). When language
tags are applied to spans of text, rendering engines sometimes use
that information in deciding which font to use in the absence of
other information, particularly where languages ...
...
The main goals for this revision of language tags were the following:
*Compatibility.* All RFC 3066 language tags (including those in the
IANA registry) remain valid in this specification. The changes in
this document represent additional constraints on language tags.
That is, in no case is the syntax more permissive and processors
...
... XMLSchema]) will be able to process the tags described
by this document. In addition, this document defines language tags
in such a way as to ensure future compatibility.
...
... standards, a valid RFC 3066 language tag could become invalid or have
its meaning change. This has the potential of invalidating content
that may have an extensive shelf-life. In this specification, once a
language tag is valid, it remains valid forever.
...
...
*Validity.* The structure of language tags defined by this document
makes it possible to determine if a particular tag is well-formed ...
... registry with specific
versioning information, the validity of language tags at any point in
time can be precisely determined (instead of interpolating values
from many separate sources).
...
... without resorting to the registration process. The addition of UN
M.49 codes provides for the generation of language tags with regional
scope, which is also required by some applications.
...
...
The recast of the registry from containing whole language tags to
subtags is a key part of this. An important feature of RFC 3066 was
...
... subtags.
*Extensibility.* Because of the widespread use of language tags, it
is disruptive to have periodic revisions of the core specification,
even in the face of demonstrated need. The extension mechanism
provides for a way for independent RFCs to define extensions to
language tags. These extensions have a very constrained, well-
defined structure that prevents extensions from interfering with
implementations of language tags defined in this document.
The document also anticipates features of ISO ...
... language subtags, as well as the possibility of other
ISO 639 parts becoming useful for the formation of language tags in
the future.
...
... registry becomes the canonical
source for forming language tags.
o Provides a process that guarantees stability of language tags, by
handling reuse of values by ISO 639, ISO ...
... method for indicating in the registry
when script subtags are necessary for a given language tag.
o Adds the concept of a variant subtag and allows variants to be
...
... Phillips, A., Ed. and M. Davis, Ed., "Matching of Language Tags", BCP 47, RFC 4647, September 2006. ...
... 1766, the precursors of this
document, made enormous contributions directly or indirectly to this
document and are generally responsible for the success of language
tags.
The following people (in alphabetical order) contributed to this
...
... originated RFCs 1766 and 3066, and without whom this document would
not have been possible. Special thanks must go to Michael Everson,
who has served as Language Tag Reviewer for almost the complete
period since the publication of RFC 1766. Special thanks to Doug
...
... Ewell, for his production of the first complete subtag registry, and
his work in producing a test parser for verifying language tags.
...
... Appendix B. Examples of Language Tags (Informative) ...
