The purpose of the Content-Type field is to describe the data
contained in the body fully enough that the receiving user agent can
pick an appropriate agent or mechanism to present the data to the
user, or otherwise deal with the data in an appropriate manner.
HISTORICAL NOTE: The Content-Type header field was first defined in
RFC 1049. RFC 1049 Content-types used a simpler and less powerful
syntax, but one that is largely compatible with the mechanism given
here.
The Content-Type header field is used to specify the nature of the
data in the body of an entity, by giving type and subtype
identifiers, and by providing auxiliary information that may be
required for certain types. After the type and subtype names, the
remainder of the header field is simply a set of parameters,
specified in an attribute/value notation. The set of meaningful
parameters differs for the different types. In particular, there are
NO globally-meaningful parameters that apply to all content-types.
Global mechanisms are best addressed, in the MIME model, by the
definition of additional Content-* header fields. The ordering of
parameters is not significant. Among the defined parameters is a
"charset" parameter by which the character set used in the body may
be declared. Comments are allowed in accordance with RFC 822 rules
for structured header fields.
In general, the top-level Content-Type is used to declare the general
type of data, while the subtype specifies a specific format for that
type of data. Thus, a Content-Type of "image/xyz" is enough to tell
a user agent that the data is an image, even if the user agent has no
knowledge of the specific image format "xyz". Such information can
be used, for example, to decide whether or not to show a user the raw
data from an unrecognized subtype -- such an action might be
reasonable for unrecognized subtypes of text, but not for
unrecognized subtypes of image or audio. For this reason, registered
subtypes of audio, image, text, and video should not contain
embedded information that is really of a different type. Such
compound types should be represented using the "multipart" or
"application" types.
Parameters are modifiers of the content-subtype, and do not
fundamentally affect the requirements of the host system. Although
most parameters make sense only with certain content-types, others
are "global" in the sense that they might apply to any subtype. For
example, the "boundary" parameter makes sense only for the
"multipart" content-type, but the "charset" parameter might make
sense with several content-types.
An initial set of seven Content-Types is defined by this document.
This set of top-level names is intended to be substantially complete.
It is expected that additions to the larger set of supported types
can generally be accomplished by the creation of new subtypes of
these initial types. In the future, more top-level types may be
defined only by an extension to this standard. If another primary
type is to be used for any reason, it must be given a name starting
with "X-" to indicate its non-standard status and to avoid a
potential conflict with a future official name.
In the Augmented BNF notation of RFC 822, a Content-Type header field
value is defined as follows:

     content := "Content-Type" ":" type "/" subtype *(";" parameter)
                ; case-insensitive matching of type and subtype

     type := "application" / "audio"
           / "image"       / "message"
           / "multipart"   / "text"
           / "video"       / extension-token
           ; All values case-insensitive

     extension-token := x-token / iana-token

     iana-token := <a publicly-defined extension token,
                   registered with IANA, as specified in
                   appendix E>

     x-token := <The two characters "X-" or "x-" followed, with
                no intervening white space, by any token>

     subtype := token ; case-insensitive

     parameter := attribute "=" value

     attribute := token ; case-insensitive

     value := token / quoted-string

     token := 1*<any (ASCII) CHAR except SPACE, CTLs,
              or tspecials>

     tspecials := "(" / ")" / "<" / ">" / "@"
                / "," / ";" / ":" / "\" / <">
                / "/" / "[" / "]" / "?" / "="
                ; Must be in quoted-string,
                ; to use within parameter values

Note that the definition of "tspecials" is the same as the RFC 822
definition of "specials" with the addition of the three characters
"/", "?", and "=", and the removal of ".".
Note also that a subtype specification is MANDATORY. There are no
default subtypes.
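
The "token" and "tspecials" rules determine when a parameter value
has to be written as a quoted-string. The sketch below shows one way
a sender might apply them. The function names are hypothetical, and
the escaping follows the RFC 822 quoted-pair convention; values
containing a bare CR are not handled here.

```python
# The fifteen tspecials characters from the grammar.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """True if s is 1 or more ASCII chars, none SPACE, CTL, or tspecial."""
    return bool(s) and all(
        32 < ord(ch) < 127 and ch not in TSPECIALS for ch in s
    )


def format_parameter(attribute: str, value: str) -> str:
    """Render attribute=value, quoting the value only when required."""
    if not is_token(attribute):
        raise ValueError(f"attribute is not a token: {attribute!r}")
    if is_token(value):
        return f"{attribute}={value}"
    # Backslash-escape the two characters that are special inside quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute}="{escaped}"'
```

Note that "." is not a tspecial, so a value such as "a.b" needs no
quoting, whereas a boundary containing "=" or "/" does.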
The type, subtype, and parameter names are not case sensitive. For
example, TEXT, Text, and TeXt are all equivalent. Parameter values
are normally case sensitive, but certain parameters are interpreted
to be case-insensitive, depending on the intended use. (For example,
multipart boundaries are case-sensitive, but the "access-type" for
message/External-body is not case-sensitive.)
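
As an illustration of the grammar and the case rules, here is a
minimal parser sketch for the value of a Content-Type field (the text
after "Content-Type:"). It is not a complete implementation: it
assumes the field has already been unfolded, it does not handle
RFC 822 comments, and the function name is hypothetical. Type,
subtype, and attribute names are lowercased; parameter values are
returned unchanged, since their case sensitivity depends on the
parameter.

```python
import re

# ASCII 33..126 minus the tspecials.
_TOKEN = r"[!#-'*+\-.0-9A-Z^-~]+"
# Quoted-string with backslash quoted-pairs.
_QUOTED = r'"(?:[^"\\\r]|\\.)*"'
_TYPE = re.compile(rf"\s*({_TOKEN})\s*/\s*({_TOKEN})\s*")
_PARAM = re.compile(rf";\s*({_TOKEN})\s*=\s*({_TOKEN}|{_QUOTED})\s*")


def parse_content_type(value: str):
    """Return (type, subtype, params) parsed from a Content-Type value."""
    m = _TYPE.match(value)
    if m is None:
        # A subtype is mandatory, so a bare "text" is rejected here.
        raise ValueError("expected type/subtype")
    main, sub = m.group(1).lower(), m.group(2).lower()
    params = {}
    pos = m.end()
    while pos < len(value):
        p = _PARAM.match(value, pos)
        if p is None:
            raise ValueError(f"malformed parameter at offset {pos}")
        name, raw = p.group(1).lower(), p.group(2)
        if raw.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw
        pos = p.end()
    return main, sub, params
```

Because parameter order is not significant, the parameters are
collected into a dictionary rather than a list.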
Beyond this syntax, the only constraint on the definition of subtype
names is the desire that their uses must not conflict. That is, it
would be undesirable to have two different communities using
"Content-Type: application/foobar" to mean two different things. The
process of defining new content-subtypes, then, is not intended to be
a mechanism for imposing restrictions, but simply a mechanism for
publicizing the usages. There are, therefore, two acceptable
mechanisms for defining new Content-Type subtypes:
1. Private values (starting with "X-") may be
defined bilaterally between two cooperating
agents without outside registration or
standardization.
2. New standard values must be documented,
registered with, and approved by IANA, as
described in Appendix E. Where intended for
public use, the formats they refer to must
also be defined by a published specification,
and possibly offered for standardization.
The seven standard initial predefined Content-Types are detailed in
the bulk of this document. They are:
text -- textual information. The primary subtype,
"plain", indicates plain (unformatted) text. No
special software is required to get the full
meaning of the text, aside from support for the
indicated character set. Subtypes are to be used
for enriched text in forms where application
software may enhance the appearance of the text,
but such software must not be required in order to
get the general idea of the content. Possible
subtypes thus include any readable word processor
format. A very simple and portable subtype,
richtext, was defined in RFC 1341, with a future
revision expected.
multipart -- data consisting of multiple parts of
independent data types. Four initial subtypes
are defined, including the primary "mixed"
subtype, "alternative" for representing the same
data in multiple formats, "parallel" for parts
intended to be viewed simultaneously, and "digest"
for multipart entities in which each part is of
type "message".
message -- an encapsulated message. A body of
Content-Type "message" is itself all or part of a
fully formatted RFC 822 conformant message which
may contain its own different Content-Type header
field. The primary subtype is "rfc822". The
"partial" subtype is defined for partial messages,
to permit the fragmented transmission of bodies
that are thought to be too large to be passed
through mail transport facilities. Another
subtype, "External-body", is defined for
specifying large bodies by reference to an
external data source.
image -- image data. Image requires a display device
(such as a graphical display, a printer, or a FAX
machine) to view the information. Initial
subtypes are defined for two widely-used image
formats, jpeg and gif.
audio -- audio data, with initial subtype "basic".
Audio requires an audio output device (such as a
speaker or a telephone) to "display" the contents.
video -- video data. Video requires the capability to
display moving images, typically including
specialized hardware and software. The initial
subtype is "mpeg".
application -- some other kind of data, typically
either uninterpreted binary data or information to
be processed by a mail-based application. The
primary subtype, "octet-stream", is to be used in
the case of uninterpreted binary data, in which
case the simplest recommended action is to offer
to write the information into a file for the user.
An additional subtype, "PostScript", is defined
for transporting PostScript documents in bodies.
Other expected uses for "application" include
spreadsheets, data for mail-based scheduling
systems, and languages for "active"
(computational) email. (Note that active email
and other application data may entail several
security considerations, which are discussed later
in this memo, particularly in the context of
application/PostScript.)
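
The per-type descriptions above suggest that a user agent can plan
its handling from the top-level type alone. The sketch below is one
possible arrangement, not a prescribed algorithm: the capability
labels, the action names, and the function name are all hypothetical.
It shows raw data only for unrecognized subtypes of text, and offers
to save anything it cannot present.

```python
# Hypothetical capability labels, chosen for illustration.
REQUIRED_CAPABILITY = {
    "image": "display",
    "audio": "audio-output",
    "video": "video-playback",
}


def plan_presentation(main_type, subtype_known, capabilities):
    """Choose an action from the top-level type and local capabilities."""
    main_type = main_type.lower()
    if main_type in ("multipart", "message"):
        # Composite and encapsulated bodies are unpacked, not displayed.
        return "unpack"
    if main_type == "text":
        # Raw display of an unrecognized text subtype may be reasonable.
        return "render" if subtype_known else "show-raw"
    needed = REQUIRED_CAPABILITY.get(main_type)
    if needed is not None and subtype_known and needed in capabilities:
        return "render"
    # Unknown image, audio, or video subtypes, and application data.
    return "offer-save-to-file"
```

With this arrangement, "image/xyz" on a machine with a display still
results in "offer-save-to-file", because the subtype is unknown.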
Default RFC 822 messages are typed by this protocol as plain text in
the US-ASCII character set, which can be explicitly specified as
"Content-type: text/plain; charset=us-ascii". If no Content-Type is
specified, this default is assumed. In the presence of a
MIME-Version header field, a receiving User Agent can also assume that
plain US-ASCII text was the sender's intent. In the absence of a
MIME-Version specification, plain US-ASCII text must still be
assumed, but the sender's intent might have been otherwise.
RATIONALE: In the absence of any Content-Type header field or
MIME-Version header field, it is impossible to be certain that a
message is actually text in the US-ASCII character set, since it
might well be a message that, using the conventions that predate
this document, includes text in another character set or
non-textual data in a manner that cannot be automatically recognized
(e.g., a uuencoded compressed UNIX tar file). Although there is
no fully acceptable alternative to treating such untyped messages
as "text/plain; charset=us-ascii", implementors should remain
aware that if a message lacks both the MIME-Version and the
Content-Type header fields, it may in practice contain almost
anything.
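
The defaulting rule can be sketched as follows. The function name and
the second return value are hypothetical; the flag records whether
the sender's intent can be assumed, which this sketch takes to be
true whenever a Content-Type is given explicitly (an assumption, not
something this document states).

```python
DEFAULT_CONTENT_TYPE = "text/plain; charset=us-ascii"


def effective_content_type(headers):
    """Return (content_type, intent_known) for a mapping of header fields."""
    # Header field names are matched without regard to case.
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-type")
    if declared is not None:
        return declared.strip(), True
    # No Content-Type: assume the default. Only a MIME-Version field
    # indicates that plain US-ASCII text was what the sender intended.
    return DEFAULT_CONTENT_TYPE, "mime-version" in lowered
```

A caller might use the flag to decide whether to warn the user that
an untyped message could contain something other than US-ASCII text.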
It should be noted that the list of Content-Type values given here
may be augmented in time, via the mechanisms described above, and
that the set of subtypes is expected to grow substantially.
When a mail reader encounters mail with an unknown Content-Type
value, it should generally treat it as equivalent to
"application/octet-stream", as described later in this document.