Network Working Group                                         J. Spittka

Internet-Draft

Intended status: Standards Track                                  K. Vos

Expires: January 31, 2015                                        vocTone

                                                               JM. Valin

                                                                 Mozilla

                                                           July 30, 2014

RTP Payload Format for Opus Speech and Audio Codec

draft-ietf-payload-rtp-opus-03


Abstract

   This document defines the Real-time Transport Protocol (RTP) payload

   format for packetization of Opus encoded speech and audio data

   necessary to integrate the codec in the most compatible way.

   Further, it describes media type registrations for the RTP payload

   format.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the

   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering

   Task Force (IETF).  Note that other groups may also distribute

   working documents as Internet-Drafts.  The list of current Internet-

   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months

   and may be updated, replaced, or obsoleted by other documents at any

   time.  It is inappropriate to use Internet-Drafts as reference

   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 31, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the

   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal

   Provisions Relating to IETF Documents

   (http://trustee.ietf.org/license-info) in effect on the date of

   publication of this document.  Please review these documents

   carefully, as they describe your rights and restrictions with respect

   to this document.  Code Components extracted from this document must

Spittka, et al.         Expires January 31, 2015                [Page 1]

Internet-Draft      RTP Payload Format for Opus Codec          July 2014

   include Simplified BSD License text as described in Section 4.e of

   the Trust Legal Provisions and are provided without warranty as

   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions, Definitions and Acronyms used in this document
     2.1.  Audio Bandwidth
   3.  Opus Codec
     3.1.  Network Bandwidth
       3.1.1.  Recommended Bitrate
       3.1.2.  Variable versus Constant Bitrate
       3.1.3.  Discontinuous Transmission (DTX)
     3.2.  Complexity
     3.3.  Forward Error Correction (FEC)
     3.4.  Stereo Operation
   4.  Opus RTP Payload Format
     4.1.  RTP Header Usage
     4.2.  Payload Structure
   5.  Congestion Control
   6.  IANA Considerations
     6.1.  Opus Media Type Registration
     6.2.  Mapping to SDP Parameters
       6.2.1.  Offer-Answer Model Considerations for Opus
       6.2.2.  Declarative SDP Considerations for Opus
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Authors' Addresses

1. Introduction


   The Opus codec is a speech and audio codec developed within the IETF

   Internet Wideband Audio Codec working group.  The codec has a very

   low algorithmic delay and it is highly scalable in terms of audio

   bandwidth, bitrate, and complexity.  Further, it provides different

   modes to efficiently encode speech signals as well as music signals,

   thus making it the codec of choice for various applications using the

   Internet or similar networks.

   This document defines the Real-time Transport Protocol (RTP)

   [RFC3550] payload format for packetization of Opus encoded speech and

   audio data necessary to integrate the Opus codec in the most

   compatible way.  Further, it describes media type registrations for

   the RTP payload format.  More information on the Opus codec can be

   obtained from [RFC6716].

2. Conventions, Definitions and Acronyms used in this document


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this

   document are to be interpreted as described in [RFC2119].

   CBR:  Constant bitrate

   CPU:  Central Processing Unit

   DTX:  Discontinuous transmission

   FEC:  Forward error correction

   IP:  Internet Protocol

   samples:  Speech or audio samples (per channel)

   SDP:  Session Description Protocol

   VBR:  Variable bitrate

2.1. Audio Bandwidth


   Throughout this document, we refer to the following definitions:

   +--------------+----------------+-----------------+-----------------+
   | Abbreviation |      Name      | Audio Bandwidth |  Sampling Rate  |
   |              |                |       (Hz)      |       (Hz)      |
   +--------------+----------------+-----------------+-----------------+
   |      NB      |   Narrowband   |     0 - 4000    |       8000      |
   |              |                |                 |                 |
   |      MB      |   Mediumband   |     0 - 6000    |      12000      |
   |              |                |                 |                 |
   |      WB      |    Wideband    |     0 - 8000    |      16000      |
   |              |                |                 |                 |
   |     SWB      | Super-wideband |    0 - 12000    |      24000      |
   |              |                |                 |                 |
   |      FB      |    Fullband    |    0 - 20000    |      48000      |
   +--------------+----------------+-----------------+-----------------+

                          Audio bandwidth naming

                                  Table 1
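
   As an illustration only, Table 1 can be expressed as a small lookup.
   The Python sketch below is not part of the payload format; the names
   OPUS_BANDWIDTHS and widest_bandwidth_for_rate are hypothetical.  It
   returns the widest audio bandwidth whose sampling rate does not
   exceed a given rate in Hz.

```python
# Table 1 as data: (abbreviation, name, upper band edge in Hz,
# sampling rate in Hz). These names are illustrative assumptions.
OPUS_BANDWIDTHS = [
    ("NB", "Narrowband", 4000, 8000),
    ("MB", "Mediumband", 6000, 12000),
    ("WB", "Wideband", 8000, 16000),
    ("SWB", "Super-wideband", 12000, 24000),
    ("FB", "Fullband", 20000, 48000),
]


def widest_bandwidth_for_rate(rate_hz):
    """Return the abbreviation of the widest Table 1 bandwidth whose
    sampling rate is at most rate_hz."""
    if rate_hz < 8000:
        raise ValueError("rate is below the narrowband sampling rate")
    best = None
    for abbrev, _name, _edge_hz, fs_hz in OPUS_BANDWIDTHS:
        if fs_hz <= rate_hz:
            best = abbrev  # rows are ordered narrowest to widest
    return best
```

   A rate that falls between two rows, such as 44100 Hz, maps to the
   next lower row (SWB) in this sketch.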

3. Opus Codec


   The Opus [RFC6716] codec encodes speech signals as well as general

   audio signals.  Two different modes can be chosen, a voice mode or an

   audio mode, to allow the most efficient coding depending on the type

   of the input signal, the sampling frequency of the input signal, and

   the intended application.

   The voice mode allows efficient encoding of voice signals at lower

   bitrates, while the audio mode is optimized for general audio signals

   at medium and higher bitrates.

   The Opus speech and audio codec is highly scalable in terms of audio

   bandwidth, bitrate, and complexity.  Further, Opus allows

   transmitting stereo signals.

3.1. Network Bandwidth


   Opus supports all bitrates from 6 kb/s to 510 kb/s.  The bitrate can

   be changed dynamically within that range.  All other parameters being

   equal, higher bitrates result in higher quality.

3.1.1. Recommended Bitrate


   For a frame size of 20 ms, these are the bitrate "sweet spots" for

   Opus in various configurations:

   o  8-12 kb/s for NB speech,

   o  16-20 kb/s for WB speech,

   o  28-40 kb/s for FB speech,

   o  48-64 kb/s for FB mono music, and

   o  64-128 kb/s for FB stereo music.

3.1.2. Variable versus Constant Bitrate


   For the same average bitrate, variable bitrate (VBR) can achieve

   higher quality than constant bitrate (CBR).  For the majority of

   voice transmission applications, VBR is the best choice.  One reason

   for choosing CBR is the potential information leak that _might_ occur

   when encrypting the compressed stream.  See [RFC6562] for guidelines

   on when VBR is appropriate for encrypted audio communications.  In

   the case where an existing VBR stream needs to be converted to CBR

   for security reasons, then the Opus padding mechanism described in

   [RFC6716] is the RECOMMENDED way to achieve padding because the RTP

   padding bit is unencrypted.

   The bitrate can be adjusted at any point in time.  To avoid

   congestion, the average bitrate SHOULD NOT exceed the available

   network capacity.  If no target bitrate is specified, the bitrates

   specified in Section 3.1.1 are RECOMMENDED.

3.1.3. Discontinuous Transmission (DTX)


   The Opus codec can, as described in Section 3.1.2, be operated with a

   variable bitrate.  In that case, the encoder will automatically

   reduce the bitrate for certain input signals, like periods of

   silence.  When using continuous transmission, it will reduce the

   bitrate when the characteristics of the input signal permit, but will

   never interrupt the transmission to the receiver.  Therefore, the

   received signal will maintain the same high level of quality over the

   full duration of a transmission while minimizing the average bitrate

   over time.

   In cases where the bitrate of Opus needs to be reduced even further

   or in cases where only constant bitrate is available, the Opus

   encoder can use discontinuous transmission (DTX), where parts of the

   encoded signal that correspond to periods of silence in the input

   speech or audio signal are not transmitted to the receiver.  A

   receiver can distinguish between DTX and packet loss by looking for

   gaps in the sequence number, as described by Section 4.1

   of [RFC3551].
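
   The distinction can be sketched as follows.  This Python example is
   illustrative only and rests on assumptions not stated in this
   document: packets are examined in arrival order after reordering
   and duplicate removal, and the function name classify_gap is
   hypothetical.  A gap in the sequence number indicates loss, while
   consecutive sequence numbers with a timestamp jump larger than the
   previous packet's duration indicate DTX.

```python
def classify_gap(prev_seq, prev_ts, prev_duration_ts, seq, ts):
    """Classify what happened between two consecutively received packets.

    prev_duration_ts is the duration of the previous packet in RTP
    timestamp units (48000 Hz clock).
    Returns "contiguous", "dtx", or "loss".
    """
    seq_delta = (seq - prev_seq) & 0xFFFF    # 16-bit sequence numbers wrap
    ts_delta = (ts - prev_ts) & 0xFFFFFFFF   # 32-bit timestamps wrap
    if seq_delta != 1:
        return "loss"    # at least one packet was sent but not received
    if ts_delta > prev_duration_ts:
        return "dtx"     # nothing missing, but the sender skipped media time
    return "contiguous"
```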

   On the receiving side, the non-transmitted parts will be handled by a

   frame loss concealment unit in the Opus decoder which generates a

   comfort noise signal to replace the non-transmitted parts of the

   speech or audio signal.  Use of [RFC3389] Comfort Noise (CN) with

   Opus is discouraged.  The transmitter MUST drop whole frames only,

   based on the size of the last transmitted frame, to ensure successive

   RTP timestamps differ by a multiple of 120 and to allow the receiver

   to use whole frames for concealment.

   DTX can be used with both variable and constant bitrate.  It will

   have a slightly lower speech or audio quality than continuous

   transmission.  Therefore, using continuous transmission is

   RECOMMENDED unless constraints on network capacity are severe.

3.2. Complexity


   Complexity can be scaled to optimize for CPU resources in real time,

   mostly as a trade-off between audio quality and bitrate.  Also,

   different modes of Opus have different complexity.

3.3. Forward Error Correction (FEC)


   The voice mode of Opus allows for embedding "in-band" forward error

   correction (FEC) data into the Opus bit stream.  This FEC scheme adds

   redundant information about the previous packet (N-1) to the current

   output packet N.  For each frame, the encoder decides whether to use

   FEC based on (1) an externally-provided estimate of the channel's

   packet loss rate; (2) an externally-provided estimate of the

   channel's capacity; (3) the sensitivity of the audio or speech signal

   to packet loss; and (4) whether the receiving decoder has indicated it

   can take advantage of "in-band" FEC information.  The decision to

   send "in-band" FEC information is entirely controlled by the encoder

   and therefore no special precautions for the payload have to be

   taken.

   On the receiving side, the decoder can take advantage of this

   additional information when it loses a packet and the next packet is

   available.  In order to use the FEC data, the jitter buffer needs to

   provide access to payloads with the FEC data.  The receiver can then

   configure its decoder to decode the FEC data from the packet rather

   than the regular audio data.  If no FEC data is available for the

   current frame, the decoder will consider the frame lost and invoke

   frame loss concealment.
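
   The receiver-side choice described above can be sketched as
   follows.  This Python example is illustrative only: the function
   name recovery_action and the dictionary used as a jitter buffer are
   assumptions, and no real decoder API is invoked.  Whether the next
   packet actually carries FEC data is left to the decoder, which
   falls back to concealment when it does not.

```python
def recovery_action(lost_seq, jitter_buffer):
    """Decide how to fill in the frame of a missing packet.

    jitter_buffer maps RTP sequence numbers to the payload bytes of
    packets that have arrived. Returns ("fec", payload) when the next
    packet is available, so the decoder can be asked to decode the FEC
    data it may contain; otherwise returns ("plc", None) to request
    frame loss concealment.
    """
    next_seq = (lost_seq + 1) & 0xFFFF  # sequence numbers wrap at 16 bits
    payload = jitter_buffer.get(next_seq)
    if payload is not None:
        return ("fec", payload)
    return ("plc", None)
```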

   If the FEC scheme is not implemented on the receiving side, FEC

   SHOULD NOT be used, as it leads to an inefficient usage of network

   resources.  Decoder support for FEC SHOULD be indicated at the time a

   session is set up.

3.4. Stereo Operation


   Opus allows for transmission of stereo audio signals.  This operation

   is signaled in-band in the Opus payload and no special arrangement is

   needed in the payload format.  Any implementation of the Opus decoder

   MUST be capable of receiving stereo signals, although it MAY decode

   those signals as mono.

   If a decoder cannot take advantage of the benefits of a stereo

   signal, this SHOULD be indicated at the time a session is set up.  In

   that case the sending side SHOULD NOT send stereo signals as it leads

   to an inefficient usage of network resources.

4. Opus RTP Payload Format


   The payload format for Opus consists of the RTP header and Opus

   payload data.

4.1. RTP Header Usage


   The format of the RTP header is specified in [RFC3550].  The use of

   the fields of the RTP header by the Opus payload format is consistent

   with that specification.

   The payload length of Opus is an integer number of octets and

   therefore no padding is necessary.  The payload MAY be padded by an

   integer number of octets according to [RFC3550].

   The timestamp, sequence number, and marker bit (M) of the RTP header

   are used in accordance with Section 4.1 of [RFC3551].

   The RTP payload type for Opus has not been assigned statically and is

   expected to be assigned dynamically.

   The receiving side MUST be prepared to receive duplicate RTP packets.

   The receiver MUST provide at most one of those payloads to the Opus

   decoder for decoding, and MUST discard the others.
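
   One way to meet this requirement is to remember recently seen
   sequence numbers.  The Python sketch below is illustrative only;
   the class name DuplicateFilter and the window size are assumptions,
   and a duplicate arriving after its entry has left the window would
   not be detected.

```python
from collections import deque


class DuplicateFilter:
    """Let each recently seen RTP sequence number through at most once."""

    def __init__(self, window=512):
        self.window = window   # number of sequence numbers remembered (assumed)
        self.seen = set()
        self.order = deque()   # same numbers in arrival order, oldest first

    def accept(self, seq):
        """Return True if seq should go to the decoder, False to discard."""
        if seq in self.seen:
            return False
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.window:
            # Forget the oldest entry so memory use stays bounded.
            self.seen.discard(self.order.popleft())
        return True
```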

   Opus supports 5 different audio bandwidths, which can be adjusted

   during a call.  The RTP timestamp is incremented with a 48000 Hz

   clock rate for all modes of Opus and all sampling rates.  The unit

   for the timestamp is samples per single (mono) channel.  The RTP

   timestamp corresponds to the sample time of the first encoded sample

   in the encoded frame.  For data encoded with sampling rates other

   than 48000 Hz, the sampling rate has to be adjusted to 48000 Hz using

   the corresponding multiplier in Table 2.

                    +--------------------+------------+
                    | Sampling Rate (Hz) | Multiplier |
                    +--------------------+------------+
                    |        8000        |     6      |
                    |                    |            |
                    |       12000        |     4      |
                    |                    |            |
                    |       16000        |     3      |
                    |                    |            |
                    |       24000        |     2      |
                    |                    |            |
                    |       48000        |     1      |
                    +--------------------+------------+

                       Table 2: Timestamp multiplier
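
   The adjustment in Table 2 amounts to a single multiplication.  The
   Python sketch below is illustrative only; the names
   TIMESTAMP_MULTIPLIER and samples_to_rtp_ticks are hypothetical.

```python
# Table 2: multiplier from each sampling rate to the 48000 Hz RTP clock.
TIMESTAMP_MULTIPLIER = {8000: 6, 12000: 4, 16000: 3, 24000: 2, 48000: 1}


def samples_to_rtp_ticks(num_samples, sampling_rate_hz):
    """Convert a per-channel sample count into RTP timestamp units."""
    if sampling_rate_hz not in TIMESTAMP_MULTIPLIER:
        raise ValueError("unsupported sampling rate: %r" % sampling_rate_hz)
    return num_samples * TIMESTAMP_MULTIPLIER[sampling_rate_hz]
```

   For example, a 20 ms frame holds 160 samples at 8000 Hz, and
   160 * 6 = 960 timestamp units, the same as 960 samples at 48000 Hz.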

4.2. Payload Structure


   The Opus encoder can output encoded frames representing 2.5, 5, 10,

   20, 40, or 60 ms of speech or audio data.  Further, an arbitrary

   number of frames can be combined into a packet, up to a maximum

   packet duration representing 120 ms of speech or audio data.  The

   grouping of one or more Opus frames into a single Opus packet is

   defined in Section 3 of [RFC6716].  An RTP payload MUST contain

   exactly one Opus packet as defined by that document.

   Figure 1 shows the structure combined with the RTP header.

                        +----------+--------------+
                        |RTP Header| Opus Payload |
                        +----------+--------------+

                Figure 1: Payload Structure with RTP header

   Table 3 shows supported frame sizes in milliseconds of encoded speech

   or audio data for the speech and audio modes (Mode) and sampling

   rates (fs) of Opus and shows how the timestamp is incremented for

   packetization (ts incr).  If the Opus encoder outputs multiple

   encoded frames into a single packet, the timestamp increment is the

   sum of the increments for the individual frames.

    +---------+-----------------+-----+-----+-----+-----+------+------+
    |   Mode  |        fs       | 2.5 |  5  |  10 |  20 |  40  |  60  |
    +---------+-----------------+-----+-----+-----+-----+------+------+
    | ts incr |       all       | 120 | 240 | 480 | 960 | 1920 | 2880 |
    |         |                 |     |     |     |     |      |      |
    |  voice  | NB/MB/WB/SWB/FB |     |     |  x  |  x  |  x   |  x   |
    |         |                 |     |     |     |     |      |      |
    |  audio  |   NB/WB/SWB/FB  |  x  |  x  |  x  |  x  |      |      |
    +---------+-----------------+-----+-----+-----+-----+------+------+

       Table 3: Supported Opus frame sizes and timestamp increments
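
   The timestamp increment for a packet can be computed from the "ts
   incr" row of Table 3.  The Python sketch below is illustrative
   only; the names TS_INCREMENT, MAX_PACKET_TICKS, and
   packet_timestamp_increment are hypothetical.  It also enforces the
   120 ms maximum packet duration stated earlier in this section.

```python
# "ts incr" row of Table 3: frame duration in ms -> RTP timestamp units.
TS_INCREMENT = {2.5: 120, 5: 240, 10: 480, 20: 960, 40: 1920, 60: 2880}

# 120 ms at the 48000 Hz RTP clock.
MAX_PACKET_TICKS = 120 * 48


def packet_timestamp_increment(frame_durations_ms):
    """Sum the timestamp increments of the frames carried in one packet."""
    if not frame_durations_ms:
        raise ValueError("a packet carries at least one frame")
    total = 0
    for duration in frame_durations_ms:
        if duration not in TS_INCREMENT:
            raise ValueError("unsupported frame duration: %r" % duration)
        total += TS_INCREMENT[duration]
    if total > MAX_PACKET_TICKS:
        raise ValueError("packet would exceed 120 ms")
    return total
```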

5. Congestion Control


   The target bitrate of Opus can be adjusted at any point in time, thus

   allowing efficient congestion control.  Furthermore, the amount of

   encoded speech or audio data encoded in a single packet can be used

   for congestion control, since the transmission rate is inversely

   proportional to the packet duration.  A lower packet transmission

   rate reduces the amount of header overhead, but at the same time

   increases latency and loss sensitivity, so it ought to be used with

   care.

   It is RECOMMENDED that senders of Opus encoded data apply congestion

   control.
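
   The trade-off between packet duration and header overhead can be
   made concrete with a short calculation.  The Python sketch below is
   illustrative only and assumes IPv4 without options (20 octets), UDP
   (8 octets), and the fixed RTP header (12 octets), with no CSRC
   entries, header extensions, or security overhead; the names are
   hypothetical.

```python
# Assumed per-packet header size: IPv4 + UDP + fixed RTP header, in octets.
HEADER_BYTES = 20 + 8 + 12


def header_overhead_bps(packet_duration_ms):
    """Return the header overhead in bits per second for a packet duration."""
    packets_per_second = 1000 / packet_duration_ms
    return HEADER_BYTES * 8 * packets_per_second
```

   Under these assumptions, 20 ms packets cost 16000 b/s of headers
   and 40 ms packets cost 8000 b/s, at the price of higher latency
   and more audio lost per dropped packet.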

6. IANA Considerations


   One media subtype (audio/opus) has been defined and registered as

   described in the following section.

6.1. Opus Media Type Registration


   Media type registration is done according to [RFC4288] and [RFC4855].

   Type name: audio

   Subtype name: opus

   Required parameters:

   rate:  the RTP timestamp is incremented with a 48000 Hz clock rate

      for all modes of Opus and all sampling rates.  For data encoded

      with sampling rates other than 48000 Hz, the sampling rate has to

      be adjusted to 48000 Hz using the corresponding multiplier in

      Table 2.

   Optional parameters:

   maxplaybackrate:  a hint about the maximum output sampling rate that

      the receiver is capable of rendering in Hz.  The decoder MUST be

      capable of decoding any audio bandwidth but due to hardware

      limitations only signals up to the specified sampling rate can be

      played back.  Sending signals with higher audio bandwidth results

      in higher than necessary network usage and encoding complexity, so

      an encoder SHOULD NOT encode frequencies above the audio bandwidth

      specified by maxplaybackrate.  This parameter can take any value

      between 8000 and 48000, although commonly the value will match one

      of the Opus bandwidths (Table 1).  By default, the receiver is

      assumed to have no limitations, i.e. 48000.

   sprop-maxcapturerate:  a hint about the maximum input sampling rate

      that the sender is likely to produce.  This is not a guarantee

      that the sender will never send any higher bandwidth (e.g. it

      could send a pre-recorded prompt that uses a higher bandwidth),

      but it indicates to the receiver that frequencies above this

      maximum can safely be discarded.  This parameter is useful to

      avoid wasting receiver resources by operating the audio processing

      pipeline (e.g. echo cancellation) at a higher rate than necessary.

      This parameter can take any value between 8000 and 48000, although

      commonly the value will match one of the Opus bandwidths

      (Table 1).  By default, the sender is assumed to have no

      limitations, i.e. 48000.

   maxptime:  the maximum duration of media represented by a packet

      (according to Section 6 of [RFC4566]) that a decoder wants to

      receive, in milliseconds rounded up to the next full integer

      value.  Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary

      multiple of an Opus frame size rounded up to the next full integer

      value, up to a maximum value of 120, as defined in Section 4.  If

      no value is specified, the default is 120.  This value is a

      recommendation by the decoding side to ensure the best performance

      for the decoder.  The decoder MUST be capable of accepting any

      allowed packet sizes to ensure maximum compatibility.

   ptime:  the preferred duration of media represented by a packet

      (according to Section 6 of [RFC4566]) that a decoder wants to

      receive, in milliseconds rounded up to the next full integer

      value.  Possible values are 3, 5, 10, 20, 40, 60, or an arbitrary

      multiple of an Opus frame size rounded up to the next full integer

      value, up to a maximum value of 120, as defined in Section 4.  If

      no value is specified, the default is 20.  If ptime is greater

      than maxptime, ptime MUST be ignored.  This parameter MAY be

      changed during a session.  This value is a recommendation by the

      decoding side to ensure the best performance for the decoder.  The

      decoder MUST be capable of accepting any allowed packet sizes to

      ensure maximum compatibility.

   minptime:  the minimum duration of media represented by a packet

      (according to Section 6 of [RFC4566]) that SHOULD be encapsulated

      in a received packet, in milliseconds rounded up to the next full

      integer value.  Possible values are 3, 5, 10, 20, 40, and 60 or an

      arbitrary multiple of Opus frame sizes rounded up to the next full

      integer value up to a maximum value of 120 as defined in

      Section 4.  If no value is specified, the default is 3.  This

      value is a recommendation by the decoding side to ensure the best

      performance for the decoder.  The decoder MUST be capable of

      accepting any allowed packet sizes to ensure maximum compatibility.

   maxaveragebitrate:  specifies the maximum average receive bitrate of

      a session in bits per second (b/s).  The actual value of the

      bitrate can vary, as it is dependent on the characteristics of the

      media in a packet.  Note that the maximum average bitrate MAY be

      modified dynamically during a session.  Any positive integer is

      allowed, but values outside the range 6000 to 510000 SHOULD be

      ignored.  If no value is specified, the maximum value specified in

      Section 3.1.1 for the corresponding mode of Opus and corresponding

      maxplaybackrate is the default.

   stereo:  specifies whether the decoder prefers receiving stereo or

      mono signals.  Possible values are 1 and 0 where 1 specifies that

      stereo signals are preferred, and 0 specifies that only mono

      signals are preferred.  Independent of the stereo parameter, every

      receiver MUST be able to receive and decode stereo signals but

      sending stereo signals to a receiver that signaled a preference

      for mono signals may result in higher than necessary network

      utilization and encoding complexity.  If no value is specified,

      the default is 0 (mono).

   sprop-stereo:  specifies whether the sender is likely to produce

      stereo audio.  Possible values are 1 and 0, where 1 specifies that

      stereo signals are likely to be sent, and 0 specifies that the

      sender will likely only send mono.  This is not a guarantee that

      the sender will never send stereo audio (e.g. it could send a pre-

      recorded prompt that uses stereo), but it indicates to the

      receiver that the received signal can be safely downmixed to mono.

      This parameter is useful to avoid wasting receiver resources by

      operating the audio processing pipeline (e.g. echo cancellation)

      in stereo when not necessary.  If no value is specified, the

      default is 0 (mono).

   cbr:  specifies if the decoder prefers the use of a constant bitrate

      versus variable bitrate.  Possible values are 1 and 0, where 1

      specifies constant bitrate and 0 specifies variable bitrate.  If

      no value is specified, the default is 0 (VBR).  When cbr is 1, the

      maximum average bitrate can still change, e.g. to adapt to

      changing network conditions.

   useinbandfec:  specifies that the decoder has the capability to take

      advantage of the Opus in-band FEC.  Possible values are 1 and 0.

      Providing 0 when FEC cannot be used on the receiving side is

      RECOMMENDED.  If no value is specified, useinbandfec is assumed to

      be 0.  This parameter is only a preference and the receiver MUST

      be able to process packets that include FEC information, even if

      it means the FEC part is discarded.

   usedtx:  specifies if the decoder prefers the use of DTX.  Possible

      values are 1 and 0.  If no value is specified, the default is 0.

   Encoding considerations:

      The Opus media type is framed and consists of binary data

      according to Section 4.8 in [RFC4288].

   Security considerations:

      See Section 7 of this document.

   Interoperability considerations: none

   Published specification: none

   Applications that use this media type:

      Any application that requires the transport of speech or audio

      data can use this media type.  Examples include, but are not

      limited to, audio and video conferencing, Voice over IP, and media

      streaming.

   Person & email address to contact for further information:

      SILK Support silksupport@skype.net

      Jean-Marc Valin jmvalin@jmvalin.ca

   Intended usage: COMMON

   Restrictions on usage:

      For transfer over RTP, the RTP payload format (Section 4 of this

      document) SHALL be used.

   Author:

      Julian Spittka jspittka@gmail.com

      Koen Vos koenvos74@gmail.com

      Jean-Marc Valin jmvalin@jmvalin.ca

   Change controller: TBD
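
   Some of the rules for the optional parameters above can be sketched
   in code.  The Python example below is illustrative only; the
   function names are hypothetical, and returning None to represent an
   ignored value is an assumption of this sketch.  It shows the
   round-up rule for the ptime family (2.5 ms becomes 3), the rule
   that ptime is ignored when it exceeds maxptime, and the range check
   on maxaveragebitrate.

```python
import math


def ptime_value(packet_duration_ms):
    """Round a packet duration up to the next full integer millisecond."""
    if not 2.5 <= packet_duration_ms <= 120:
        raise ValueError("packet duration must be between 2.5 and 120 ms")
    return math.ceil(packet_duration_ms)


def effective_ptime(ptime=20, maxptime=120):
    """Return the ptime to honor, or None when it must be ignored."""
    return None if ptime > maxptime else ptime


def usable_maxaveragebitrate(value):
    """Return value if it lies in 6000..510000 b/s, else None (ignored)."""
    return value if 6000 <= value <= 510000 else None
```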

6.2. Mapping to SDP Parameters


   The information described in the media type specification has a

   specific mapping to fields in the Session Description Protocol (SDP)

   [RFC4566], which is commonly used to describe RTP sessions.  When SDP

   is used to specify sessions employing the Opus codec, the mapping is

   as follows:

   o  The media type ("audio") goes in SDP "m=" as the media name.

   o  The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding

      name.  The RTP clock rate in "a=rtpmap" MUST be 48000 and the

      number of channels MUST be 2.

   o  The OPTIONAL media type parameters "ptime" and "maxptime" are

      mapped to "a=ptime" and "a=maxptime" attributes, respectively, in

      the SDP.

   o  The OPTIONAL media type parameters "maxaveragebitrate",

      "maxplaybackrate", "minptime", "stereo", "cbr", "useinbandfec",

      and "usedtx", when present, MUST be included in the "a=fmtp"

      attribute in the SDP, expressed as a media type string in the form

      of a semicolon-separated list of parameter=value pairs (e.g.,

      maxaveragebitrate=20000).  They MUST NOT be specified in an SSRC-

      specific "fmtp" source-level attribute (as defined in Section 6.3

      of [RFC5576]).

   o  The OPTIONAL media type parameters "sprop-maxcapturerate" and

      "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by

      copying them directly from the media type parameter string as part

      of the semicolon-separated list of parameter=value pairs (e.g.,

      sprop-stereo=1).  These same OPTIONAL media type parameters MAY

      also be specified using an SSRC-specific "fmtp" source-level

      attribute as described in Section 6.3 of [RFC5576].  They MAY be

      specified in both places, in which case the parameter in the

      source-level attribute overrides the one found on the "a=fmtp"

      line.  The value of any parameter which is not specified in a

      source-level attribute MUST be taken from the "a=fmtp"

      line, if it is present there.

   Below are some examples of SDP session descriptions for Opus:

   Example 1: Standard mono session with 48000 Hz clock rate

       m=audio 54312 RTP/AVP 101

       a=rtpmap:101 opus/48000/2

   Example 2: 16000 Hz clock rate, maximum packet size of 40 ms,

   recommended packet size of 40 ms, maximum average bitrate of 20000

   b/s, prefers to receive stereo but only plans to send mono, FEC is

   desired, DTX is not desired

       m=audio 54312 RTP/AVP 101

       a=rtpmap:101 opus/48000/2

       a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;

       maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0

       a=ptime:40

       a=maxptime:40

   Example 3: Two-way full-band stereo preferred

       m=audio 54312 RTP/AVP 101

       a=rtpmap:101 opus/48000/2

       a=fmtp:101 stereo=1; sprop-stereo=1
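
   A receiver of such a description might read the "a=fmtp" line and
   fill in the defaults from Section 6.1.  The Python sketch below is
   illustrative only; the names OPUS_FMTP_DEFAULTS and parse_opus_fmtp
   are hypothetical, the input is assumed to be a single unfolded
   line, and maxaveragebitrate has no entry in the defaults table
   because its default depends on the mode in use.

```python
# Defaults taken from the parameter definitions in Section 6.1.
OPUS_FMTP_DEFAULTS = {
    "maxplaybackrate": 48000,
    "sprop-maxcapturerate": 48000,
    "minptime": 3,
    "stereo": 0,
    "sprop-stereo": 0,
    "cbr": 0,
    "useinbandfec": 0,
    "usedtx": 0,
}


def parse_opus_fmtp(line):
    """Parse one "a=fmtp:<pt> name=value; ..." line.

    Returns (payload_type, params) with defaults applied for any
    parameter that is absent from the line.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_string = line[len(prefix):].partition(" ")
    params = dict(OPUS_FMTP_DEFAULTS)
    for item in param_string.split(";"):
        name, sep, value = item.strip().partition("=")
        if not sep:
            continue  # this sketch skips empty or malformed items
        value = value.strip()
        # Numeric values become integers; anything else stays a string.
        params[name.strip()] = int(value) if value.isdigit() else value
    return int(payload_type), params
```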

6.2.1. Offer-Answer Model Considerations for Opus


   When using the offer-answer procedure described in [RFC3264] to

   negotiate the use of Opus, the following considerations apply:

   o  Opus supports several clock rates.  For signaling purposes only

      the highest, i.e. 48000, is used.  The actual clock rate of the

      corresponding media is signaled inside the payload and is not

      restricted by this payload format description.  The decoder MUST

      be capable of decoding every received clock rate.  An example is

      shown below:

       m=audio 54312 RTP/AVP 100

       a=rtpmap:100 opus/48000/2

   o  The "ptime" and "maxptime" parameters are unidirectional receive-

      only parameters and typically will not compromise

      interoperability; however, some values might cause application

      performance to suffer.  [RFC3264] defines the SDP offer-answer

      handling of the "ptime" parameter.  The "maxptime" parameter MUST

      be handled in the same way.

   o  The "minptime" parameter is a unidirectional receive-only

      parameter and typically will not compromise interoperability;

      however, some values might cause application performance to suffer

      and ought to be used with care.

   o  The "maxplaybackrate" parameter is a unidirectional receive-only

      parameter that reflects limitations of the local receiver.  When

      sending to a single destination, a sender MUST NOT use an audio

      bandwidth higher than necessary to make full use of audio sampled

      at a sampling rate of "maxplaybackrate".  Gateways or senders that

      are sending the same encoded audio to multiple destinations SHOULD

      NOT use an audio bandwidth higher than necessary to represent

      audio sampled at "maxplaybackrate", as this would lead to

      inefficient use of network resources.  The "maxplaybackrate"

      parameter does not affect interoperability.  Also, this parameter

      SHOULD NOT be used to adjust the audio bandwidth as a function of

      the bitrate, as this is the responsibility of the Opus encoder

      implementation.

   o  The "maxaveragebitrate" parameter is a unidirectional receive-only

      parameter that reflects limitations of the local receiver.  The

      remote sender MUST NOT send at an average bitrate higher than

      "maxaveragebitrate", as doing so might overload the network

      and/or receiver.  The "maxaveragebitrate" parameter typically will

      not compromise interoperability; however, some values might cause

      application performance to suffer, and ought to be set with care.

   o  The "sprop-maxcapturerate" and "sprop-stereo" parameters are

      unidirectional sender-only parameters that reflect limitations of

      the sender side.  They allow the receiver to set up a reduced-

      complexity audio processing pipeline if the sender is not planning

      to use the full range of Opus's capabilities.  Neither

      "sprop-maxcapturerate" nor "sprop-stereo" affects

      interoperability, and the receiver MUST be capable of receiving

      any signal.

   o  The "stereo" parameter is a unidirectional receive-only parameter.

      When sending to a single destination, a sender MUST NOT use stereo

      when "stereo" is 0.  Gateways or senders that are sending the same

      encoded audio to multiple destinations SHOULD NOT use stereo when

      "stereo" is 0, as this would lead to inefficient use of network

      resources.  The "stereo" parameter does not affect

      interoperability.

   o  The "cbr" parameter is a unidirectional receive-only parameter.

   o  The "useinbandfec" parameter is a unidirectional receive-only

      parameter.

   o  The "usedtx" parameter is a unidirectional receive-only parameter.

   o  Any unknown parameter in an offer MUST be ignored by the receiver

      and MUST be removed from the answer.
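
   The receive-only parameters above constrain what the remote sender
   may do.  The following Python sketch shows one way a sender could
   derive encoder limits from the peer's parameters.  The function
   name sender_limits and the result keys are hypothetical.  The
   bandwidth table uses the effective sample rates usually associated
   with the Opus audio bandwidths in [RFC6716].  The defaults used
   when a parameter is absent (48000 for "maxplaybackrate", 0 for
   "stereo") are assumptions of this sketch.

```python
# Effective sample rate (Hz) for each Opus audio bandwidth, narrowest first.
BANDWIDTH_RATES = [
    ("NB", 8000),
    ("MB", 12000),
    ("WB", 16000),
    ("SWB", 24000),
    ("FB", 48000),
]


def sender_limits(remote_fmtp):
    """Derive encoder limits from the peer's receive-only parameters.

    Only known keys are read, so unknown parameters are ignored.
    """
    max_rate = int(remote_fmtp.get("maxplaybackrate", 48000))  # assumed default
    # Narrowest bandwidth that still makes full use of audio at max_rate.
    bandwidth = next(
        (name for name, rate in BANDWIDTH_RATES if rate >= max_rate), "FB"
    )
    bitrate = remote_fmtp.get("maxaveragebitrate")
    return {
        "max_bandwidth": bandwidth,
        # stereo=0 (assumed default): the sender must stay mono.
        "max_channels": 2 if remote_fmtp.get("stereo", "0") == "1" else 1,
        "max_average_bitrate": int(bitrate) if bitrate is not None else None,
    }


limits = sender_limits({
    "maxplaybackrate": "16000",
    "maxaveragebitrate": "20000",
    "stereo": "1",
    "x-unknown": "7",  # hypothetical unknown parameter, never consulted
})
defaults = sender_limits({})
```

   A "maxplaybackrate" that falls between two table entries (for
   example 20000) selects the next wider bandwidth, since a narrower
   one would not make full use of audio sampled at that rate.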

6.2.2. Declarative SDP Considerations for Opus


   For declarative use of SDP for Opus, such as in the Session

   Announcement Protocol (SAP) [RFC2974] and RTSP [RFC2326], the

   following needs to be considered:

   o  The values for "maxptime", "ptime", "minptime", "maxplaybackrate",

      and "maxaveragebitrate" ought to be selected carefully to ensure

      that a reasonable performance can be achieved for the participants

      of a session.

   o  The values for "maxptime", "ptime", and "minptime" of the payload

      format configuration are recommendations by the decoding side to

      ensure the best performance for the decoder.  The decoder MUST be

      capable of accepting any allowed packet sizes to ensure maximum

      compatibility.

   o  All other parameters of the payload format configuration are

      declarative and a participant MUST use the configurations that are

      provided for the session.  More than one configuration can be

      provided if necessary by declaring multiple RTP payload types;

      however, the number of types ought to be kept small.
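
   Offering more than one configuration by declaring several payload
   types might look like the following Python sketch.  The function
   name declarative_opus_media, the payload type numbers, and the
   parameter choices are illustrative assumptions, not recommendations
   of this specification.

```python
def declarative_opus_media(port, configs):
    """Build SDP lines for one audio section, one payload type per config."""
    payload_types = " ".join(str(pt) for pt in configs)
    lines = [f"m=audio {port} RTP/AVP {payload_types}"]
    for pt, params in configs.items():
        # Every Opus payload type uses the same rtpmap entry.
        lines.append(f"a=rtpmap:{pt} opus/48000/2")
        if params:
            fmtp = "; ".join(f"{key}={value}" for key, value in params.items())
            lines.append(f"a=fmtp:{pt} {fmtp}")
    return lines


# Two configurations: full-band stereo and a 16000 Hz variant.
sdp_lines = declarative_opus_media(54312, {
    101: {"stereo": 1, "sprop-stereo": 1},
    102: {"maxplaybackrate": 16000, "sprop-maxcapturerate": 16000},
})
```

   Each additional payload type adds another configuration that every
   participant has to be prepared for, which is one reason to keep
   the number of types small.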

7. Security Considerations


   All RTP packets using the payload format defined in this

   specification are subject to the general security considerations

   discussed in the RTP specification [RFC3550] and in any applicable

   profile, e.g., [RFC3711] or [RFC3551].

   This payload format transports Opus encoded speech or audio data.

   Hence, security issues include confidentiality, integrity protection,

   and authentication of the speech or audio itself.  The Opus payload

   format does not have any built-in security mechanisms.  Any suitable

   external mechanisms, such as SRTP [RFC3711], MAY be used.

   This payload format and the Opus encoding do not exhibit any

   significant non-uniformity in the receiver-end computational load and

   thus are unlikely to pose a denial-of-service threat due to the

   receipt of pathological datagrams.

8. Acknowledgements

TBD

9. Normative References


   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate

              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time

              Streaming Protocol (RTSP)", RFC 2326, April 1998.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session

              Announcement Protocol", RFC 2974, October 2000.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model

              with Session Description Protocol (SDP)", RFC 3264, June

              2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for

              Comfort Noise (CN)", RFC 3389, September 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.

              Jacobson, "RTP: A Transport Protocol for Real-Time

              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and

              Video Conferences with Minimal Control", STD 65, RFC 3551,

              July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.

              Norrman, "The Secure Real-time Transport Protocol (SRTP)",

              RFC 3711, March 2004.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and

              Registration Procedures", RFC 4288, December 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session

              Description Protocol", RFC 4566, July 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload

              Formats", RFC 4855, February 2007.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific

              Media Attributes in the Session Description Protocol

              (SDP)", RFC 5576, June 2009.

   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of

              Variable Bit Rate Audio with Secure RTP", RFC 6562, March

              2012.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the

              Opus Audio Codec", RFC 6716, September 2012.

Authors' Addresses

   Julian Spittka

   Email: jspittka@gmail.com

   Koen Vos

   vocTone

   Email: koenvos74@gmail.com

   Jean-Marc Valin

   Mozilla

   331 E. Evelyn Avenue

   Mountain View, CA  94041

   USA

   Email: jmvalin@jmvalin.ca
