1. Introduction
This document defines support for the header-free and interleaved/
bundled packet formats for the EVRC-B codec, a new compact bundled
format for the EVRC and EVRC-B codecs, as well as discontinuous
transmission (DTX) support for EVRC and EVRC-B-encoded speech
transported via RTP. Voice over IP (VoIP) applications operating
over low-bandwidth dial-up and wireless networks require such EVRC
RTP payload capabilities for efficient use of the bandwidth.
EVRC-B [3] is an extension to EVRC [2] developed in the Third
Generation Partnership Project 2 (3GPP2). EVRC-B [3] compresses each
20 milliseconds of 8000 Hz, 16-bit sampled speech input into an output
frame of one of four different sizes: Rate 1 (171 bits), Rate
1/2 (80 bits), Rate 1/4 (40 bits), or Rate 1/8 (16 bits). In
addition, there are two zero-bit codec frame types: null frames and
erasure frames, similar to EVRC [2]. One significant enhancement in
EVRC-B is the use of 1/4-rate frames that were not used in EVRC.
This provides lower average data rates (ADRs) compared to EVRC, for a
given voice quality.
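
As a rough illustration of the frame sizes listed above, the following
Python sketch converts each frame size into a bit rate, given one frame
per 20 milliseconds, and averages the rate over a sequence of frames.
The names FRAME_BITS, bit_rate_bps, and average_bit_rate_bps are
hypothetical and exist only for this example; the frame mix in any real
call depends on the speech content and the encoder's operating point.

```python
# Frame sizes in bits, as listed in the text (one frame per 20 ms).
FRAME_BITS = {
    "1": 171,
    "1/2": 80,
    "1/4": 40,
    "1/8": 16,
}

FRAME_DURATION_MS = 20


def bit_rate_bps(rate):
    """Return the bit rate, in bits per second, of a single frame type."""
    return FRAME_BITS[rate] * 1000 / FRAME_DURATION_MS


def average_bit_rate_bps(rates):
    """Return the average bit rate over a non-empty sequence of frame types."""
    rates = list(rates)
    if not rates:
        raise ValueError("at least one frame is required")
    total_bits = sum(FRAME_BITS[r] for r in rates)
    return total_bits * 1000 / (FRAME_DURATION_MS * len(rates))
```

For example, bit_rate_bps("1") evaluates to 8550.0 and
bit_rate_bps("1/8") to 800.0, which shows why a larger share of
lower-rate frames reduces the average data rate.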
Since speech frames encoded by EVRC-B are different from those
encoded by EVRC, EVRC-B and EVRC codecs do not interoperate with each
other. At the initiation of an RTP session, the RTP sender and
receiver need to indicate (e.g., using MIME subtypes that are
separate from those of EVRC) that EVRC-B is to be used for the
ensuing session.
1.2. Compact (Header-free) Bundled Format
The current interleaved/bundled packet format defined in RFC 3558
allows bundling of multiple speech frames of different rates in a
single RTP packet, sending mode change requests, and interleaving.
To support these functions, a Table of Contents (ToC) is used in each
RTP packet, in addition to the standard RTP header. The size of the
ToC varies depending on the number of EVRC frames carried in the
packet [4].
The current header-free packet format defined in RFC 3558 is more
compact and optimized for use over wireless links. It eliminates the
need for a ToC by requiring that each RTP packet contain only one
speech frame (of any allowable rate), i.e., bundling is not allowed.
Moreover, interleaving and mode change requests are not supported in
the header-free format [4].
The compact bundled format described in this document offers an
alternative to the header-free format defined in RFC 3558.
This format allows bundling of multiple EVRC or EVRC-B frames without
the addition of extra headers, as would be the case with the
interleaved/bundled format. However, in order to use this compact
bundled format, only one EVRC/EVRC-B rate (full rate or 1/2 rate) can
be used in the session. Similar to the header-free format defined in
RFC 3558, interleaving and mode change requests are not supported in
the compact bundled format.
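
Because only one rate is used for the whole session, a receiver can in
principle recover frame boundaries from the payload length alone, which
is what makes a ToC unnecessary. The Python sketch below illustrates
the idea. It ASSUMES that each frame is padded to a whole number of
octets and that frames are concatenated back to back with nothing in
between; the normative payload layout is the one given in the payload
format definition, not this sketch. The names frame_octets and
split_compact_bundle are hypothetical.

```python
# Rates the text allows for the compact bundled format, in bits per frame.
FRAME_BITS = {"1": 171, "1/2": 80}


def frame_octets(rate):
    """Octets per frame, ASSUMING padding up to a whole number of octets."""
    return (FRAME_BITS[rate] + 7) // 8


def split_compact_bundle(payload, rate):
    """Split an RTP payload into equal-size frames for a fixed session rate.

    ASSUMPTION: frames are concatenated with no per-frame header or ToC.
    """
    size = frame_octets(rate)
    if len(payload) == 0 or len(payload) % size != 0:
        raise ValueError("payload length is not a multiple of the frame size")
    return [payload[i:i + size] for i in range(0, len(payload), size)]
```

Under these assumptions, a 30-octet payload in a 1/2-rate session
splits into three 10-octet frames, while a payload whose length is not
a multiple of the frame size is rejected.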
Information carried in frames of EVRC and EVRC-B codecs varies little
during periods of silence. The transmission of these frames across
the radio interface in a wireless system is expensive, in terms of
capacity; therefore, suppression of these frames is desirable. Such
an operation is called DTX, also known as silence suppression.
In general, when DTX/silence suppression is applied, the first few
frames of silence may be transmitted at the beginning of the period
of silence to establish background noise. Then, a portion of the
stream of subsequent silence frames is not transmitted, and is
discarded at the sender. At the receiver, background or comfort
noise may be generated by using the previously received silence
frames.
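
The sender-side behavior described above can be sketched as follows.
This is a simplified model, not the procedure defined by this document:
it ASSUMES that the sender transmits a fixed number of silence frames
at the start of each silence period and then transmits one silence
frame at a fixed interval as a background noise update. The function
frames_to_transmit and its parameters hangover_frames and
update_interval are hypothetical names chosen for this example.

```python
def frames_to_transmit(is_silence, hangover_frames, update_interval):
    """Return the indices of the frames a DTX-enabled sender would transmit.

    is_silence      -- sequence of booleans, True where a frame is silence
    hangover_frames -- silence frames sent at the start of a silence period
    update_interval -- after the hangover, send one silence frame every
                       update_interval frames (ASSUMED fixed, must be >= 1)
    """
    if update_interval < 1:
        raise ValueError("update_interval must be at least 1")
    sent = []
    silence_run = 0    # consecutive silence frames seen so far
    since_update = 0   # suppressed frames since the last transmitted one
    for index, silent in enumerate(is_silence):
        if not silent:
            # Speech frames are always transmitted and end the silence period.
            silence_run = 0
            sent.append(index)
            continue
        silence_run += 1
        if silence_run <= hangover_frames:
            # Leading silence frames establish the background noise.
            sent.append(index)
            since_update = 0
        else:
            since_update += 1
            if since_update >= update_interval:
                # Periodic update so the receiver can refresh comfort noise.
                sent.append(index)
                since_update = 0
    return sent
```

With two speech frames, seven silence frames, and one more speech
frame, a hangover of 2 and an update interval of 3 result in frames
0, 1, 2, 3, 6, and 9 being transmitted; the remaining silence frames
are discarded at the sender.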
Full details of DTX/silence suppression operation can be found in
DTX [8], in RFC 3551 [9], and in RFC 3558 [4]. This
document only defines the additional optional MIME parameters
(silencesupp, dtxmax, dtxmin, and hangover) for setting up a DTX/
silence suppression session: "silencesupp" indicates the capability
and willingness to use DTX/silence suppression; "dtxmax" and "dtxmin"
indicate the desired range of the DTX update interval; and "hangover"
indicates the desired number of silence frames sent at the beginning
of each silence period to establish background noise at the receiver
(see Section 6.1 for detailed definitions).
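
To make the four parameter names concrete, the Python sketch below
extracts them from an SDP "a=fmtp" attribute line. It ASSUMES the
conventional mapping of MIME parameters to SDP, that is,
semicolon-separated name=value pairs following the payload type, and it
ASSUMES that all four values are integers. The payload type 97 and the
values in the usage note are illustrative only. The helper names
parse_fmtp and dtx_settings are hypothetical.

```python
DTX_PARAMS = ("silencesupp", "dtxmax", "dtxmin", "hangover")


def parse_fmtp(line):
    """Parse 'a=fmtp:<payload type> name=value;name=value' into a dict."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute line")
    # Everything after the first space is the parameter list.
    _, _, params = line[len(prefix):].partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        result[name.strip().lower()] = value.strip()
    return result


def dtx_settings(line):
    """Return only the DTX-related parameters, as integers (ASSUMED type)."""
    params = parse_fmtp(line)
    return {name: int(params[name]) for name in DTX_PARAMS if name in params}
```

For instance, passing the line
"a=fmtp:97 silencesupp=1;dtxmax=32;dtxmin=12;hangover=1" to
dtx_settings yields a dictionary with those four names mapped to 1, 32,
12, and 1. Parameters absent from the line are simply omitted, leaving
the choice of defaults to the caller.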
The EVRC and EVRC-B codecs, in variable-rate operation mode, send
1/8-rate frames during periods of silence, while in single-rate
operation mode (see Section 4), silence is encoded and sent in frames
of the same rate as that of speech frames. The DTX parameters
defined in this document apply to 1/8-rate frames in the variable-
rate mode and to silence frames in the single-rate operation mode.
For simplicity, in the rest of this document the term "silence frame"
refers either to a 1/8-rate frame in variable-rate operation or to a
frame that contains only silence in single-rate operation.