PCM Parameters

PCM audio is coded using a combination of various parameters.

Resolution/Sample Size

This parameter specifies the amount of data used to represent each discrete amplitude sample. The most common values are 8 bits (1 byte), which gives a range of 256 amplitude steps, or 16 bits (2 bytes), which gives a range of 65536 amplitude steps. Other sizes, such as 12, 20, and 24 bits, are occasionally seen. Some king-sized formats even opt for 32 and 64 bits per sample.

Byte Order

When more than one byte is used to represent a PCM sample, the byte order (big endian vs. little endian) must be known. Due to the widespread use of little-endian Intel CPUs, little-endian PCM tends to be the most common byte orientation.
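As a small illustration, the sketch below uses Python's standard struct module to decode the same two bytes under both byte orders. The byte values are made up for the example and do not come from any particular file.

```python
import struct

# Two bytes of one hypothetical 16-bit sample (values chosen for illustration).
raw = bytes([0x34, 0x12])

# "<h" reads a little-endian signed 16-bit integer, ">h" a big-endian one.
little = struct.unpack("<h", raw)[0]
big = struct.unpack(">h", raw)[0]

print(little)  # 4660  (0x1234)
print(big)     # 13330 (0x3412)
```

Reading PCM data with the wrong byte order usually produces loud noise rather than silence, since the low-order byte ends up dominating the amplitude.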

Sign

It is not enough to know that a PCM sample is, for example, 8 bits wide. Whether the sample is signed or unsigned is needed to understand the range. If the sample is unsigned, the sample range is 0..255 with a centerpoint of 128. If the sample is signed, the sample range is -128..127 with a centerpoint of 0. If a PCM type is signed, the sign encoding is almost always 2's complement. In very rare cases, signed PCM audio is represented as a series of sign/magnitude coded numbers.
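The two 8-bit interpretations can be sketched as follows. The function names are hypothetical and exist only for this example.

```python
def u8_to_s8(sample):
    """Re-centre an unsigned 8-bit sample (0..255, centre 128) to signed (-128..127, centre 0)."""
    return sample - 128


def byte_as_twos_complement(byte):
    """Interpret a raw byte (0..255) as a two's-complement signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


print(u8_to_s8(128))                  # 0, the unsigned centerpoint
print(byte_as_twos_complement(0xFF))  # -1
```

The same raw byte therefore means different amplitudes under the two conventions: 0x80 is silence when unsigned but the most negative value when signed.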

Channels And Interleaving

If the PCM type is monaural, each sample will belong to that one channel. If there is more than one channel, the channels will almost always be interleaved: Left sample, right sample, left, right, etc., in the case of stereo interleaved data. In some rare cases, usually when optimized for special playback hardware, chunks of audio destined for different channels will not be interleaved.
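A minimal sketch of both layouts, assuming the samples have already been decoded into a flat list of integers (the helper names are hypothetical):

```python
def deinterleave(samples, channels):
    """Split interleaved samples (L, R, L, R, ...) into one list per channel."""
    return [samples[c::channels] for c in range(channels)]


def interleave(planes):
    """Merge per-channel lists back into a single interleaved list."""
    return [s for frame in zip(*planes) for s in frame]


stereo = [1, -1, 2, -2, 3, -3]      # left samples positive, right negative
left, right = deinterleave(stereo, 2)
print(left)                         # [1, 2, 3]
print(right)                        # [-1, -2, -3]
print(interleave([left, right]))    # [1, -1, 2, -2, 3, -3]
```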

Frequency And Sample Rate

This parameter measures how many samples/channel are played each second. Frequency is measured in samples/second (Hz). Common frequency values include 8000, 11025, 16000, 22050, 32000, 44100, and 48000 Hz.

Integer Or Floating Point

Most PCM formats encode samples using integers. However, some applications which demand higher precision will store and process PCM samples using floating point numbers.

Floating-point PCM samples (32 or 64 bits in size) are zero-centred and vary within the interval [-1.0, 1.0]; they are therefore signed values.
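A rough sketch of converting between floating-point and signed 16-bit samples is shown below. The scale factors (32767 on the way in, 32768 on the way out) are one common convention and are an assumption here; other software uses different scaling and rounding.

```python
def float_to_int16(x):
    """Convert a float sample in [-1.0, 1.0] to a signed 16-bit integer (assumed scale: 32767)."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input instead of wrapping around
    return int(round(x * 32767))


def int16_to_float(s):
    """Convert a signed 16-bit integer sample to a float in [-1.0, 1.0) (assumed scale: 32768)."""
    return s / 32768.0


print(float_to_int16(1.0))     # 32767
print(float_to_int16(-1.0))    # -32767
print(int16_to_float(-32768))  # -1.0
```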

PCM Types

Linear PCM

The most common PCM type.

Logarithmic PCM

Rather than representing sample amplitudes on a linear scale as linear PCM coding does, logarithmic PCM coding plots the amplitudes on a logarithmic scale. Log PCM is more often used in telephony and communications applications than in entertainment multimedia applications.

There are two major variants of log PCM: mu-law (u-law) and A-law. Mu-law coding uses the format number 0x07 in Microsoft multimedia files (WAV/AVI/ASF) and the fourcc 'ulaw' in Apple QuickTime files. A-law coding uses the format number 0x06 in Microsoft multimedia files and the fourcc 'alaw' in Apple QuickTime files.

Every byte of a log PCM data chunk maps to a signed 16-bit linear PCM sample. [TODO: Add either the conversion tables or conversion formulas]
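For mu-law, the byte-to-sample mapping can be sketched as follows. This is not taken from the text above; it follows the G.711 mu-law expansion as it is commonly implemented, and should be checked against the ITU-T G.711 tables before being relied on. A-law uses a different mapping and is not covered here.

```python
def ulaw_to_linear(byte):
    """Decode one mu-law byte (0..255) to a signed 16-bit linear sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored with all bits inverted
    sign = u & 0x80               # top bit: set means a negative sample
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the bias
    return -magnitude if sign else magnitude


# Since there are only 256 possible inputs, a lookup table is the usual approach.
ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]


def decode_ulaw(data):
    """Decode a bytes object of mu-law data into a list of 16-bit linear samples."""
    return [ULAW_TABLE[b] for b in data]


print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```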

Differential PCM

Values are encoded as differences between the current and the previous value. This reduces the number of bits required per audio sample by about 25% compared to PCM.
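The idea can be sketched in a few lines. Note the assumptions: this version stores exact, unquantized differences and starts from a predictor value of 0, so it is lossless and does not by itself save any bits. Real DPCM coders quantize the differences into fewer bits, which is where the savings come from.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous one (predictor starts at 0)."""
    prev = 0
    deltas = []
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    prev = 0
    samples = []
    for d in deltas:
        prev += d
        samples.append(prev)
    return samples


deltas = dpcm_encode([10, 12, 15, 14])
print(deltas)               # [10, 2, 3, -1]
print(dpcm_decode(deltas))  # [10, 12, 15, 14]
```

For slowly changing signals the differences are much smaller than the samples themselves, so they can be stored in fewer bits.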

Adaptive DPCM

The size of the quantization step is varied to allow further reduction of the required bandwidth for a given signal-to-noise ratio.

Platform-Specific PCM Identifiers And Characteristics

This section describes how different computing platforms store PCM audio data and any format identifiers they use.

DOS/Windows

The first widely available PC audio card that could play back PCM audio was the Creative Labs Sound Blaster. This drove the audio format for a lot of early audio-capable DOS applications and games. The original Sound Blaster could only play mono, unsigned 8-bit PCM data. Later Sound Blaster cards were capable of playing back 16-bit audio data. However, while these cards still played unsigned 8-bit PCM data, 16-bit data needed to be signed.

Likely owing to the DOS/Intel little endian architecture, 16-bit PCM for the Sound Blaster also needs to be little endian.

Further, the original Sound Blaster was somewhat limited in the frequencies that it could support. The digital to analog conversion hardware (DAC) had to be programmed with a byte value (frequency divisor) that was processed through the following formula to yield the final playback frequency:

  frequency = 1000000 / (256 - frequency_divisor)

A common divisor is 211, which yields a frequency of 22222 Hz (rounded down to an integer), a common rate in the days of the Sound Blaster. Note that while very low frequencies (all the way down to 3921 Hz) were supported, frequencies above 45454 Hz were not.
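The formula is easy to check with a short sketch. The function names are hypothetical, and the inverse helper simply picks the nearest divisor, which is an assumption about how a program might choose one rather than documented driver behaviour.

```python
def sb_frequency(divisor):
    """Playback frequency in Hz (rounded down) for a Sound Blaster frequency divisor byte."""
    if not 0 <= divisor <= 255:
        raise ValueError("divisor must fit in one byte")
    return 1000000 // (256 - divisor)


def sb_divisor(frequency):
    """Nearest divisor byte for a requested frequency (hypothetical helper)."""
    return 256 - round(1000000 / frequency)


print(sb_frequency(211))  # 22222
print(sb_frequency(1))    # 3921
print(sb_frequency(234))  # 45454
print(sb_divisor(22050))  # 211
```

Because the divisor is a single byte, most standard rates cannot be hit exactly. A request for 22050 Hz, for example, lands on divisor 211 and actually plays at about 22222 Hz.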

Microsoft WAV/AVI/ASF Identifiers

Microsoft multimedia file formats such as WAV, AVI, and ASF all share the WAVEFORMATEX data structure. The structure defines, among other properties, a 16-bit little endian audio identifier. The following audio identifiers correspond to various PCM formats:

  • 0x0001 denotes linear PCM
  • 0x0006 denotes A-law logarithmic PCM
  • 0x0007 denotes mu-law logarithmic PCM
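A minimal sketch of reading the identifier is shown below. It assumes the identifier is the first field of the structure (offset 0) and that the caller has already located the WAVEFORMATEX bytes inside the file. The table and function names are hypothetical.

```python
import struct

# Identifiers listed above; anything else is reported as unknown.
PCM_FORMAT_TAGS = {
    0x0001: "linear PCM",
    0x0006: "A-law logarithmic PCM",
    0x0007: "mu-law logarithmic PCM",
}


def describe_format_tag(waveformatex):
    """Read the 16-bit little-endian audio identifier from the start of the structure."""
    (tag,) = struct.unpack_from("<H", waveformatex, 0)  # assumption: identifier at offset 0
    return PCM_FORMAT_TAGS.get(tag, "unknown (0x%04X)" % tag)


print(describe_format_tag(bytes([0x07, 0x00])))  # mu-law logarithmic PCM
print(describe_format_tag(bytes([0x55, 0x00])))  # unknown (0x0055)
```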

Apple Macintosh

Native sample rates of early Apple Macintosh audio hardware included 11127 Hz and 22254 Hz. These sample rates are commonly seen in early QuickTime files.

Apple QuickTime Identifiers

Audio information in QuickTime files is stored along with an stsd atom that contains a FOURCC to indicate the format type. Apple QuickTime accommodates a number of different PCM formats:

  • 'raw ' (the trailing space character, ASCII 0x20, is needed to round out the FOURCC) denotes unsigned, linear PCM. 16-bit data is stored in little endian format.
  • 'twos' denotes signed (i.e. twos-complement) linear PCM. 16-bit data is stored in big endian format.
  • 'sowt' ('twos' spelled backwards) also denotes signed linear PCM. However, 16-bit data is stored in little endian format.
  • 'in24' denotes 24-bit, big endian, linear PCM.
  • 'in32' denotes 32-bit, big endian, linear PCM.
  • 'fl32' denotes 32-bit floating point PCM. (Presumably IEEE 32-bit; byte order?)
  • 'fl64' denotes 64-bit floating point PCM. (Presumably IEEE 64-bit; byte order?)
  • 'alaw' denotes A-law logarithmic PCM.
  • 'ulaw' denotes mu-law logarithmic PCM.

Red Book CD Audio

The "Red Book" defines the format of a standard audio compact disc (CD). The audio data on a standard CD consists of 16-bit linear PCM samples stored in little endian format, replayed at 44100 Hz (hence the standard term "CD-quality audio"), with left-right stereo interleaving.

Sega CD

Games made for the Sega CD, an add-on for the Sega Genesis game console, all seem to use sign-magnitude coding to store PCM information. It is a good guess that the Sega CD unit has custom hardware to play this format natively.
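For reference, generic sign-magnitude decoding can be sketched as below. The sketch assumes the conventional layout in which the top bit is the sign (set meaning negative) and the remaining bits are the magnitude. The exact bit layout and sign polarity used by Sega CD data are not specified here, so treat this as an illustration of the coding scheme rather than a Sega CD decoder.

```python
def sign_magnitude_to_int(value, bits=8):
    """Decode a sign-magnitude coded sample (assumed: top bit set means negative)."""
    sign_bit = 1 << (bits - 1)
    magnitude = value & (sign_bit - 1)   # all bits below the sign bit
    return -magnitude if value & sign_bit else magnitude


print(sign_magnitude_to_int(0x7F))  # 127
print(sign_magnitude_to_int(0x81))  # -1
print(sign_magnitude_to_int(0xFF))  # -127
```

Unlike two's complement, this coding has two representations of zero (0x00 and 0x80 for 8-bit data) and a symmetric range of -127..127.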

Sega Saturn

Games made for the Sega Saturn video game console generally seem to store PCM data as signed, 8-bit data or signed, big endian, 16-bit data. The curious property of the PCM, however, is the stereo handling. Generally, multimedia files on Sega Saturn games (most often stored using the Sega FILM format) would store a block of left channel information followed by a block of right channel information rather than interleaving left and right samples. This is likely due to custom multi-channel audio hardware in which individual channels are assigned pan positions. For playing stereo data, one channel is assigned extreme left and another is assigned extreme right. The correct samples are sent to their respective channels. Interleaved data would require deinterleaving before playback.

DVD PCM

Standard Video-DVDs can contain 16-bit, 20-bit and 24-bit signed, linear PCM (often called LPCM) streams. A stream can consist of up to 8 channels as long as the maximum bandwidth of 6.144 Mbit/s for any LPCM audio stream is not exceeded. Two sample rates are supported: 48 kHz and 96 kHz.
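The bandwidth limit rules out some combinations of these parameters, as the sketch below shows. It assumes that 6.144 Mbit/s means 6,144,000 bits per second and that only the raw sample payload counts toward the limit. The function names are hypothetical.

```python
MAX_LPCM_BITRATE = 6_144_000  # assumed: 6.144 Mbit/s taken as 6,144,000 bits per second


def lpcm_bitrate(channels, bits, rate):
    """Raw payload bitrate in bits per second."""
    return channels * bits * rate


def fits_dvd_lpcm(channels, bits, rate):
    """Check a parameter combination against the limits described above."""
    return (
        1 <= channels <= 8
        and bits in (16, 20, 24)
        and rate in (48000, 96000)
        and lpcm_bitrate(channels, bits, rate) <= MAX_LPCM_BITRATE
    )


print(fits_dvd_lpcm(8, 16, 48000))  # True  (exactly 6,144,000 bit/s)
print(fits_dvd_lpcm(8, 24, 48000))  # False (9,216,000 bit/s)
print(fits_dvd_lpcm(2, 24, 96000))  # True  (4,608,000 bit/s)
```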
