RFC 4788: Enhancements to RTP Payload Formats for E...

1. Introduction


   This document defines support for the header-free and interleaved/
   bundled packet formats for the EVRC-B codec, a new compact bundled
   format for the EVRC and EVRC-B codecs, as well as discontinuous
   transmission (DTX) support for EVRC and EVRC-B-encoded speech
   transported via RTP.  Voice over IP (VoIP) applications operating
   over low-bandwidth dial-up and wireless networks require such EVRC
   RTP payload capabilities for efficient use of the bandwidth.


1.1. Support of EVRC-B Codec


   EVRC-B [3] is an extension to EVRC [2] developed in the Third
   Generation Partnership Project 2 (3GPP2).  EVRC-B [3] compresses each
   20 milliseconds of 8000 Hz, 16-bit sampled speech input into an output
   frame of one of four different sizes: Rate 1 (171 bits), Rate 1/2
   (80 bits), Rate 1/4 (40 bits), or Rate 1/8 (16 bits).  In addition,
   as in EVRC [2], there are two zero-bit codec frame types: null frames
   and erasure frames.  One significant enhancement in EVRC-B is the use
   of 1/4-rate frames, which EVRC does not use.  This yields a lower
   average data rate (ADR) than EVRC for a given voice quality.
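   The frame sizes above translate directly into bit rates, because one
   bit per millisecond equals one kbit/s.  The short Python sketch below
   computes the rate of each frame type and the average data rate for a
   given mix of frames.  The rate labels (such as "rate1/8") and function
   names are illustrative and are not defined by this document.  Zero-bit
   null and erasure frames are left out.

```python
# Frame sizes in bits for the four non-zero EVRC-B rates listed above.
FRAME_BITS = {"rate1": 171, "rate1/2": 80, "rate1/4": 40, "rate1/8": 16}
FRAME_MS = 20  # each frame covers 20 ms of speech


def bitrate_kbps(rate: str) -> float:
    """Bit rate in kbit/s if every frame is sent at `rate` (1 bit/ms == 1 kbit/s)."""
    return FRAME_BITS[rate] / FRAME_MS


def average_data_rate_kbps(frame_counts: dict) -> float:
    """Average data rate in kbit/s for a mix given as {rate: number_of_frames}."""
    total_frames = sum(frame_counts.values())
    total_bits = sum(FRAME_BITS[rate] * n for rate, n in frame_counts.items())
    return total_bits / (total_frames * FRAME_MS)


for rate in FRAME_BITS:
    print(f"{rate}: {bitrate_kbps(rate):.2f} kbit/s")
```

   For a hypothetical mix of 40 full-rate, 10 1/2-rate, 10 1/4-rate, and
   40 1/8-rate frames, average_data_rate_kbps() returns about 4.34 kbit/s.
   This shows how a larger share of low-rate frames pulls the ADR down.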

   Since speech frames encoded by EVRC-B are different from those
   encoded by EVRC, EVRC-B and EVRC codecs do not interoperate with each
   other.  At the initiation of an RTP session, the RTP sender and
   receiver need to indicate (e.g., using MIME subtypes that are
   separate from those of EVRC) that EVRC-B is to be used for the
   ensuing session.


1.2. Compact (Header-free) Bundled Format


   The current interleaved/bundled packet format defined in RFC 3558
   allows bundling of multiple speech frames of different rates in a
   single RTP packet, sending mode change requests, and interleaving.
   To support these functions, a Table of Contents (ToC) is used in each
   RTP packet, in addition to the standard RTP header.  The size of the
   ToC varies depending on the number of EVRC frames carried in the
   packet [4].
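   The following Python sketch makes the ToC overhead concrete by
   estimating the payload size of an interleaved/bundled packet as a
   function of the number of frames.  The 2-octet fixed header, the 4-bit
   ToC entry, and the octet padding of frames and of the ToC are
   assumptions made for illustration.  RFC 3558 [4] is authoritative for
   the actual layout.

```python
import math

FRAME_BITS = {"rate1": 171, "rate1/2": 80, "rate1/4": 40, "rate1/8": 16}

# Assumed layout values for illustration only; RFC 3558 gives the exact format.
FIXED_HEADER_OCTETS = 2  # assumed: interleaving and mode-request fields
TOC_ENTRY_BITS = 4       # assumed: one ToC entry per bundled frame


def frame_octets(rate: str) -> int:
    # Assumed: each frame is zero-padded to a whole number of octets.
    return math.ceil(FRAME_BITS[rate] / 8)


def toc_octets(frame_count: int) -> int:
    # Assumed: ToC entries are packed back to back and padded to an octet boundary.
    return math.ceil(frame_count * TOC_ENTRY_BITS / 8)


def bundled_payload_octets(rates: list) -> int:
    """RTP payload size (RTP header excluded) for frames bundled behind a ToC."""
    return FIXED_HEADER_OCTETS + toc_octets(len(rates)) + sum(frame_octets(r) for r in rates)


for n in (1, 2, 5, 10):
    print(n, "frames -> ToC of", toc_octets(n), "octet(s)")
```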

   The current header-free packet format defined in RFC 3558 is more
   compact and optimized for use over wireless links.  It eliminates the
   need for a ToC by requiring that each RTP packet contain only one
   speech frame (of any allowable rate), i.e., bundling is not allowed.
   Moreover, interleaving and mode change requests are not supported in
   the header-free format [4].
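   A header-free packet carries exactly one frame and no ToC, so the
   receiver has to learn the frame rate some other way.  Here is a
   minimal Python sketch.  It assumes that the receiver infers the rate
   from the RTP payload length and that each frame is zero-padded to a
   whole number of octets.  The function name is hypothetical.

```python
import math
from typing import Optional

FRAME_BITS = {"rate1": 171, "rate1/2": 80, "rate1/4": 40, "rate1/8": 16}

# Assumption: frames are zero-padded to whole octets, giving the four rates
# four distinct payload lengths (22, 10, 5 and 2 octets).
LENGTH_TO_RATE = {math.ceil(bits / 8): rate for rate, bits in FRAME_BITS.items()}


def rate_from_payload(payload: bytes) -> Optional[str]:
    """Infer the rate of the single frame in a header-free payload from its length.

    Returns None for lengths that match no rate (zero-bit null and erasure
    frames are not handled in this sketch).
    """
    return LENGTH_TO_RATE.get(len(payload))
```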

   The compact bundled format described in this document offers an
   alternative to the header-free format defined in RFC 3558.  This
   format allows bundling of multiple EVRC or EVRC-B frames without the
   extra headers that the interleaved/bundled format would require.
   However, a session using the compact bundled format is restricted to
   a single EVRC/EVRC-B rate (full rate or 1/2 rate).  As with the
   header-free format defined in RFC 3558, interleaving and mode change
   requests are not supported in the compact bundled format.
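   In the compact bundled format, the rate agreed for the session fixes
   the frame size, so the receiver can split a payload into frames
   without a ToC.  The Python sketch below packs and unpacks such a
   payload.  The function names are hypothetical.  The sketch assumes
   that frames are concatenated in time order and that each frame is
   zero-padded to whole octets.

```python
import math
from typing import List

# Only full rate and 1/2 rate may be used with the compact bundled format.
FRAME_BITS = {"rate1": 171, "rate1/2": 80}


def frame_octets(rate: str) -> int:
    # Assumption: each frame is zero-padded to a whole number of octets.
    return math.ceil(FRAME_BITS[rate] / 8)


def pack(frames: List[bytes], rate: str) -> bytes:
    """Concatenate same-rate frames into one payload with no ToC."""
    size = frame_octets(rate)
    for frame in frames:
        if len(frame) != size:
            raise ValueError(f"{rate} frames must be {size} octets, got {len(frame)}")
    return b"".join(frames)


def unpack(payload: bytes, rate: str) -> List[bytes]:
    """Split a payload back into frames using the session's fixed frame size."""
    size = frame_octets(rate)
    if len(payload) % size:
        raise ValueError("payload is not a whole number of frames")
    return [payload[i:i + size] for i in range(0, len(payload), size)]
```

   Consider three full-rate frames and the 12-octet RTP fixed header.
   Sent as header-free packets, they cost 3 x (12 + 22) = 102 octets.
   Sent as one compact bundled packet, they cost 12 + 66 = 78 octets.
   Both figures ignore IP and UDP headers and rely on the padding
   assumption above.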


1.3. Discontinuous Transmission (DTX)


   The information carried in EVRC and EVRC-B frames varies little
   during periods of silence.  Transmitting these frames across the
   radio interface of a wireless system is expensive in terms of
   capacity; therefore, suppressing them is desirable.  This operation
   is called DTX, also known as silence suppression.

   In general, when DTX/silence suppression is applied, the first few
   silence frames of each silence period may be transmitted to
   establish the background noise.  Some of the subsequent silence
   frames are then discarded at the sender instead of being
   transmitted.  At the receiver, background or comfort noise may be
   generated from the previously received silence frames.
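   The sender-side behavior described above can be sketched as a filter
   over the encoder output.  This Python example is illustrative only.
   The (frame, is_silence) input format and the function name are
   assumptions.  It models the initial silence frames and the discarding
   that follows, but not the periodic updates that a DTX sender may also
   send during a long silence period.

```python
from typing import Iterable, Iterator, Tuple


def dtx_filter(frames: Iterable[Tuple[bytes, bool]],
               hangover: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, frame) for each frame a DTX sender would transmit.

    frames:   (frame, is_silence) pairs in time order.
    hangover: silence frames sent at the start of each silence period.
    """
    silence_run = 0  # consecutive silence frames seen so far
    for index, (frame, is_silence) in enumerate(frames):
        if not is_silence:
            silence_run = 0
            yield index, frame  # speech is always sent
            continue
        silence_run += 1
        if silence_run <= hangover:
            yield index, frame  # lets the receiver estimate background noise
        # remaining silence frames in this period are discarded at the sender
```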

   Full details of DTX/silence suppression operation can be found in
   DTX [8], in RFC 3551 [9], and in RFC 3558 [4].  This document only
   defines the additional optional MIME parameters for setting up a
   DTX/silence suppression session (see Section 6.1 for detailed
   definitions).  "silencesupp" indicates the capability and
   willingness to use DTX/silence suppression.  "dtxmax" and "dtxmin"
   indicate the desired range of the DTX update interval.  "hangover"
   indicates the desired number of silence frames to send at the
   beginning of each silence period to establish background noise at
   the receiver.
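   The Python sketch below shows one way a receiver might handle these
   parameters.  It parses a semicolon-separated "name=value" list, such
   as the parameter portion of an SDP "a=fmtp" line, into the four DTX
   parameters.  It then rejects a dtxmin larger than dtxmax.  The
   integer values, the helper name, and the example values in the usage
   note are assumptions.  Section 6.1 gives the actual syntax and
   semantics.

```python
from dataclasses import dataclass
from typing import Optional

DTX_NAMES = ("silencesupp", "dtxmax", "dtxmin", "hangover")


@dataclass
class DtxParams:
    # None means the parameter was not present in the parameter list.
    silencesupp: Optional[int] = None
    dtxmax: Optional[int] = None
    dtxmin: Optional[int] = None
    hangover: Optional[int] = None


def parse_dtx_params(fmtp: str) -> DtxParams:
    """Parse "name=value; name=value" pairs, keeping only the DTX parameters."""
    params = DtxParams()
    for item in fmtp.split(";"):
        name, sep, value = item.strip().partition("=")
        name = name.strip()
        if sep and name in DTX_NAMES:
            setattr(params, name, int(value))  # values assumed to be integers
    if (params.dtxmin is not None and params.dtxmax is not None
            and params.dtxmin > params.dtxmax):
        raise ValueError("dtxmin must not be greater than dtxmax")
    return params
```

   For example, parse_dtx_params("silencesupp=1; dtxmax=32; dtxmin=12;
   hangover=1") yields silencesupp=1, dtxmax=32, dtxmin=12, and
   hangover=1.  These values are hypothetical.  Parameters other than the
   four DTX parameters are ignored.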

   The EVRC and EVRC-B codecs, in variable-rate operation mode, send
   1/8-rate frames during periods of silence, while in single-rate
   operation mode (see Section 4), silence is encoded and sent in frames
   of the same rate as that of speech frames.  The DTX parameters
   defined in this document apply to 1/8-rate frames in the variable-
   rate mode and to silence frames in the single-rate operation mode.

   For simplicity, in the rest of this document the term "silence
   frame" refers either to a 1/8-rate frame in variable-rate operation
   or to a frame that contains only silence in single-rate operation.
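   The definition of a silence frame can be expressed as a small
   predicate.  In this Python sketch, the mode strings and rate labels
   are illustrative.  "only_silence" stands for a hypothetical flag from
   the encoder's voice activity decision, which this document does not
   define.

```python
def is_silence_frame(mode: str, rate: str, only_silence: bool) -> bool:
    """Return True if a frame counts as a "silence frame" for the DTX parameters.

    mode:         "variable" or "single" (rate operation mode).
    rate:         frame rate label, e.g. "rate1/8".
    only_silence: hypothetical encoder flag saying the frame holds only silence.
    """
    if mode == "variable":
        # Variable-rate operation sends silence as 1/8-rate frames.
        return rate == "rate1/8"
    if mode == "single":
        # Single-rate operation uses the speech rate for silence too,
        # so the rate alone cannot identify a silence frame.
        return only_silence
    raise ValueError(f"unknown operation mode: {mode!r}")
```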


