Multimedia Container Formats: MKV

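Matroska is an open-source multimedia container standard, and MKV is one of the file types it defines.
The common Matroska file types are .mkv (video), .mka (audio), .mks (subtitles), and .mk3d (stereoscopic/3D video).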

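1. EBML (Extensible Binary Meta Language)

MKV is built on top of EBML, so understanding the MKV format starts with understanding EBML.

EBML is an extensible binary meta language similar in spirit to XML. It stores integers in a variable-length encoding to save space.

Basic structure of an EBML element: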

typedef struct {
    vint ID;          // EBML ID
    vint size;        // size of the data part, in bytes
    char data[size];  // payload (pseudocode: the length is given by size)
} EBML_ELEMENT;

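ID identifies the type of the element.

size is the length of the data part that follows.

data holds the actual value of the element identified by ID.

Both ID and size are of type vint (Unsigned Integer Values of Variable Length), a variable-length unsigned integer that takes less space than a fixed 32-bit or 64-bit integer.

The length of a vint is determined from its first byte:

length in bytes = 1 + number of leading zero bits in the first byte.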
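The length rule can be sketched in a few lines of C. This is only an illustration: the function name vint_length is made up for this example and does not come from any Matroska library.

```c
#include <stdint.h>

/* Returns the total length in bytes (1..8) of a vint whose first byte is
 * `first`, or 0 if the first byte is 0x00 (no length marker in this byte).
 * vint_length is a hypothetical helper name used for illustration. */
int vint_length(uint8_t first)
{
    int len = 1;
    uint8_t mask = 0x80;

    if (first == 0)
        return 0;
    while ((first & mask) == 0) {   /* count the leading zero bits */
        mask >>= 1;
        len++;
    }
    return len;
}
```

For example, a first byte of 0x42 (01000010) has one leading zero bit and gives a length of 2, while 0x88 (10001000) gives a length of 1.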

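Here is a short example taken from an MKV file, shown in hexadecimal. The bytes analyzed below are 42 82 88 6D 61 74 72 6F 73 6B 61.

Every EBML element consists of the three parts ID, size, and data, so we analyze the bytes in that order.

0x42 in binary is 01000010. It has one leading zero bit, so by the rule above the ID is 2 bytes long, and the ID value is 0x4282.

0x88 in binary is 10001000. It starts with a 1, so the size field is 1 byte long. Clearing the leading marker bit leaves 00001000, so size = 8.

The next 8 bytes are the data: 6D 61 74 72 6F 73 6B 61. Looking up the ID in the specification table shows that this EBML element is DocType, whose data is a string. Decoded as ASCII, the data is "matroska", which matches the text column of the hex view.

The EBML element is therefore fully parsed:

ID = 0x4282

size = 8

data = "matroska"

In other words, DocType = matroska.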

2. Overall structure

Let's look at the overall layout of an MKV file.

Each indentation step below corresponds to one column of the original table: Level 0, Grouping, Level 1, Level 2, Level 3.

EBML Header
        EBMLVersion
        DocType
Segment
    Meta Seek Information
        SeekHead
            Seek
                SeekID
                SeekPosition
            Seek
                SeekID
                SeekPosition
    Segment Information
        Info
            Title
            SegmentUID
    Track
        Tracks
            TrackEntry
                Name
                TrackNumber
                TrackType
            TrackEntry
                Name
                TrackNumber
                TrackType
    Chapters
        Edition Entry
    Clusters
        Cluster
            Timecode
            BlockGroup
                Block
            BlockGroup
                Block
                ReferenceBlock
            BlockGroup
                Block
        Cluster
            Timecode
            BlockGroup
                Block
            BlockGroup
                Block
            BlockGroup
                Block
            BlockGroup
                Block
                BlockDuration
    Cueing Data
        Cues
            CuePoint
                CueTime
                CuePosition
            CuePoint
                CueTime
                CuePosition
    Attachment
        Attachments
            AttachedFile
                FileName
                FileData
            AttachedFile
                FileName
                FileData
    Tagging

1. EBML Header

An MKV file begins with the EBML header, which may contain the following elements.

Element Name	L	EBML ID	Ma	Mu	Rng	Default	T	1	2	3	4	W	Description
EBML Header
EBML	0	[1A][45][DF][A3]	*	*	-	-	m	*	*	*	*	*	Set the EBML characteristics of the data to follow. Each EBML document has to start with this.
EBMLVersion	1	[42][86]	*	-	-	1	u	*	*	*	*	*	The version of EBML parser used to create the file.
EBMLReadVersion	1	[42][F7]	*	-	-	1	u	*	*	*	*	*	The minimum EBML version a parser has to support to read this file.
EBMLMaxIDLength	1	[42][F2]	*	-	-	4	u	*	*	*	*	*	The maximum length of the IDs you'll find in this file (4 or less in Matroska).
EBMLMaxSizeLength	1	[42][F3]	*	-	-	8	u	*	*	*	*	*	The maximum length of the sizes you'll find in this file (8 or less in Matroska). This does not override the element size indicated at the beginning of an element. Elements that have an indicated size which is larger than what is allowed by EBMLMaxSizeLength shall be considered invalid.
DocType	1	[42][82]	*	-	-	matroska	s	*	*	*	*	*	A string that describes the type of document that follows this EBML header. 'matroska' in our case or 'webm' for webm files.
DocTypeVersion	1	[42][87]	*	-	-	1	u	*	*	*	*	*	The version of DocType interpreter used to create the file.
DocTypeReadVersion	1	[42][85]	*	-	-	1	u	*	*	*	*	*	The minimum DocType version an interpreter has to support to read this file.

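The table above is from the official documentation. Let's walk through a concrete example, taken from the beginning of an MKV file (shown as a hex dump screenshot in the original post).

Parse it using the EBML rules described earlier:

1A = 00011010: three leading zero bits, so the length is 3 + 1 = 4 and ID = 1A 45 DF A3

93 = 10010011: the length is 1 and size = 0x13 = 19

The data is the 19 bytes that follow.

Checking against the table above, the whole EBML header is the region circled in red in the screenshot.

Parsing its contents further with the same EBML rules gives:

DocType = matroska

DocTypeVersion = 1

DocTypeReadVersion = 1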
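Parsing the contents of a master element such as the EBML header means stepping through its children one after another. Here is a minimal sketch of that loop. The helper names read_vint and ebml_find_child are hypothetical, and the test bytes mentioned afterwards are reconstructed from the decoded values above (their order in the real file is an assumption).

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one vint at buf. keep_marker = 1 for IDs, 0 for sizes.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t read_vint(const uint8_t *buf, size_t len, int keep_marker,
                        uint64_t *value)
{
    if (len == 0 || buf[0] == 0)
        return 0;
    size_t n = 1;
    for (uint8_t mask = 0x80; (buf[0] & mask) == 0; mask >>= 1)
        n++;
    if (n > len)
        return 0;
    uint64_t v = keep_marker ? buf[0] : (uint64_t)(buf[0] & (0xFFu >> n));
    for (size_t i = 1; i < n; i++)
        v = (v << 8) | buf[i];
    *value = v;
    return n;
}

/* Scans the children stored back to back in a master element's data.
 * Returns 0 with *payload and *payload_len set when want_id is found,
 * -1 if it is missing or the data is malformed. */
int ebml_find_child(const uint8_t *data, size_t len, uint64_t want_id,
                    const uint8_t **payload, uint64_t *payload_len)
{
    size_t pos = 0;
    while (pos < len) {
        uint64_t id, size;
        size_t n = read_vint(data + pos, len - pos, 1, &id);
        if (n == 0)
            return -1;
        pos += n;
        n = read_vint(data + pos, len - pos, 0, &size);
        if (n == 0)
            return -1;
        pos += n;
        if (size > len - pos)
            return -1;
        if (id == want_id) {
            *payload = data + pos;
            *payload_len = size;
            return 0;
        }
        pos += (size_t)size;    /* skip this child, go to the next sibling */
    }
    return -1;
}
```

The three decoded values account for exactly 19 bytes: DocType takes 2 + 1 + 8 bytes, and DocTypeVersion and DocTypeReadVersion take 2 + 1 + 1 bytes each. Looking up ID 0x4287 in such a buffer returns a one-byte payload with the value 1.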

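This matches the output of the official tool exactly.

A note on container signatures: most container formats begin with a unique identifying signature. An FLV file, for example, starts with the three bytes 'F' 'L' 'V'. This lets a player detect the container format automatically.

The bytes 1A 45 DF A3 at the start of an MKV file play the same role. If a file begins with these 4 bytes, it is very likely an MKV file, and the parser goes on to examine it further. See matroska_probe in ffmpeg: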

/* top-level master-IDs */
#define EBML_ID_HEADER 0x1A45DFA3

/*
 * Autodetecting...
 */
static int matroska_probe(AVProbeData *p)
{
    uint64_t total = 0;
    int len_mask = 0x80, size = 1, n = 1, i;

    /* EBML header? */
    if (AV_RB32(p->buf) != EBML_ID_HEADER)
        return 0;
    /* ... (the excerpt ends here) ... */
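The signature check that the excerpt above starts with can be sketched on its own, without any ffmpeg types. The function name looks_like_ebml is hypothetical, and this is not how ffmpeg structures its probe beyond the first comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with the EBML header ID 1A 45 DF A3, else 0.
 * looks_like_ebml is a hypothetical name used for illustration. */
int looks_like_ebml(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x1A, 0x45, 0xDF, 0xA3 };

    if (len < sizeof magic)
        return 0;
    for (size_t i = 0; i < sizeof magic; i++)
        if (buf[i] != magic[i])
            return 0;
    return 1;
}
```

A match only says that the file is an EBML document. Telling Matroska apart from WebM still requires reading the DocType element, which is 'matroska' or 'webm' according to the table above.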

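2. Segment

Apart from the EBML header above, everything else in an MKV file belongs to the Segment. It holds the audio/video metadata, the audio/video data, and so on.

We continue parsing the same file with the same EBML method.

ID = 18 53 80 67

size = 0x01ECD8D4 (32299220). The size field occupies 8 bytes in this file, as implied by the 12 bytes counted for the Segment ID and size further below.

Looking up the table, ID = 18 53 80 67 is the Segment element.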

Element Name	L	EBML ID	Ma	Mu	Rng	Default	T	1	2	3	4	W	Description
Segment
Segment	0	[18][53][80][67]	*	*	-	-	m	*	*	*	*	*	This element contains all other top-level (level 1) elements. Typically a Matroska file is composed of 1 segment.

The Segment may contain the following parts.

Meta Seek Information

Segment Information

Track

Chapters

Clusters

Cueing Data

Attachment

Tagging

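1. Meta Seek Information

Meta Seek Information is a quick index. It may contain the positions of the Track information, Chapters, Tags, Cues, and Attachments parts. These positions are relative to the Segment. The Meta Seek part is not mandatory, but when it is present it lets you jump straight to the key parts you care about instead of parsing the file sequentially.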

Meta Seek Information
SeekHead	1	[11][4D][9B][74]	-	*	-	-	m	*	*	*	*	*	Contains the position of other level 1 elements.
Seek	2	[4D][BB]	*	*	-	-	m	*	*	*	*	*	Contains a single seek entry to an EBML element.
SeekID	3	[53][AB]	*	-	-	-	b	*	*	*	*	*	The binary ID corresponding to the element name.
SeekPosition	3	[53][AC]	*	-	-	-	u	*	*	*	*	*	The position of the element in the segment in octets (0 = first level 1 element).

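As mentioned above, every EBML element has a level, and each element is made up of several elements one level below it. The Meta Seek structure looks like this:

Meta Seek Information
    SeekHead
        Seek
            SeekID
            SeekPosition
        Seek
            SeekID
            SeekPosition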

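We continue with the same MKV file. The first part inside the Segment is very likely the Meta Seek.

The region circled in red in the screenshot is the seek content. Parse the EBML elements using the tables from the specification:

11 4D 9B 74 identifies SeekHead

C0 decodes to size = 64

What follows are the child elements of SeekHead.

Seek

ID = 4D BB

size = 12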

data = 53 AB 84 15 49 A9 66 53 AC 82 10 03

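The data contains the SeekID and SeekPosition elements.

53 AB identifies the SeekID element

size = 4

SeekID = 15 49 A9 66

53 AC identifies the SeekPosition element

size = 2

SeekPosition = 10 03 (4099)

This means that the EBML element with ID 15 49 A9 66 is located at position 4099 within the Segment. The table shows that this ID is

Info, in Segment Information.

The value 4099 is relative to the start of the Segment data. Adding the 24 bytes of the EBML header and the 12 bytes of the Segment ID and size gives exactly 4135, so let's look at offset 4135 (0x1027) in the file.

That is exactly where Info starts. With the Meta Seek information we can therefore quickly locate key EBML elements.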
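The two steps above, decoding one Seek entry and turning its SeekPosition into a file offset, can be sketched as follows. The function names are hypothetical, and the example bytes use the SeekID 15 49 A9 66 that the text decodes.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_SEEK_ID       0x53ABu
#define ID_SEEK_POSITION 0x53ACu

/* Reads one vint; keep_marker = 1 for IDs, 0 for sizes. Returns bytes used, 0 on error. */
static size_t read_vint(const uint8_t *buf, size_t len, int keep_marker,
                        uint64_t *value)
{
    if (len == 0 || buf[0] == 0)
        return 0;
    size_t n = 1;
    for (uint8_t mask = 0x80; (buf[0] & mask) == 0; mask >>= 1)
        n++;
    if (n > len)
        return 0;
    uint64_t v = keep_marker ? buf[0] : (uint64_t)(buf[0] & (0xFFu >> n));
    for (size_t i = 1; i < n; i++)
        v = (v << 8) | buf[i];
    *value = v;
    return n;
}

/* Big-endian unsigned integer from n bytes (n <= 8). */
static uint64_t read_uint_be(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Parses the data of one Seek element. Returns 0 if both SeekID and
 * SeekPosition were found, -1 otherwise. */
int parse_seek(const uint8_t *data, size_t len,
               uint64_t *seek_id, uint64_t *seek_pos)
{
    int have_id = 0, have_pos = 0;
    size_t pos = 0;

    while (pos < len) {
        uint64_t id, size;
        size_t n = read_vint(data + pos, len - pos, 1, &id);
        if (n == 0)
            return -1;
        pos += n;
        n = read_vint(data + pos, len - pos, 0, &size);
        if (n == 0)
            return -1;
        pos += n;
        if (size > len - pos)
            return -1;
        if (id == ID_SEEK_ID || id == ID_SEEK_POSITION) {
            if (size == 0 || size > 8)
                return -1;
            uint64_t v = read_uint_be(data + pos, (size_t)size);
            if (id == ID_SEEK_ID) { *seek_id = v; have_id = 1; }
            else                  { *seek_pos = v; have_pos = 1; }
        }
        pos += (size_t)size;
    }
    return (have_id && have_pos) ? 0 : -1;
}

/* SeekPosition is relative to the first byte of the Segment data. */
uint64_t segment_offset_to_file_offset(uint64_t segment_data_start,
                                       uint64_t relative_pos)
{
    return segment_data_start + relative_pos;
}
```

With a Segment data start of 24 + 12 = 36 bytes, as in the example file, the SeekPosition 4099 maps to file offset 4135.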

Parsing the remaining Meta Seek entries in the same way yields the positions of the other parts.

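2. Segment Information

Segment Information contains information that identifies the file, such as Title and SegmentUID. The file duration (Duration), which is often of particular interest, is also stored here.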

Element Name	L	EBML ID	Ma	Mu	Rng	Default	T	1	2	3	W	Description

Segment Information
Info	1	[15][49][A9][66]	*	*	-	-	m	*	*	*	*	Contains miscellaneous general information and statistics on the file.
SegmentUID	2	[73][A4]	-	-	not 0	-	b	*	*	*		A randomly generated unique ID to identify the current segment between many others (128 bits).
SegmentFilename	2	[73][84]	-	-	-	-	8	*	*	*		A filename corresponding to this segment.
PrevUID	2	[3C][B9][23]	-	-	-	-	b	*	*	*		A unique ID to identify the previous chained segment (128 bits).
PrevFilename	2	[3C][83][AB]	-	-	-	-	8	*	*	*		An escaped filename corresponding to the previous segment.
NextUID	2	[3E][B9][23]	-	-	-	-	b	*	*	*		A unique ID to identify the next chained segment (128 bits).
NextFilename	2	[3E][83][BB]	-	-	-	-	8	*	*	*		An escaped filename corresponding to the next segment.
SegmentFamily	2	[44][44]	-	*	-	-	b	*	*	*		A randomly generated unique ID that all segments related to each other must use (128 bits).
ChapterTranslate	2	[69][24]	-	*	-	-	m	*	*	*		A tuple of corresponding ID used by chapter codecs to represent this segment.
ChapterTranslateEditionUID	3	[69][FC]	-	*	-	-	u	*	*	*		Specify an edition UID on which this correspondence applies. When not specified, it means for all editions found in the segment.
ChapterTranslateCodec	3	[69][BF]	*	-	-	-	u	*	*	*		The chapter codec using this ID (0: Matroska Script, 1: DVD-menu).
ChapterTranslateID	3	[69][A5]	*	-	-	-	b	*	*	*		The binary value used to represent this segment in the chapter codec data. The format depends on the ChapProcessCodecID used.
TimecodeScale	2	[2A][D7][B1]	*	-	-	1000000	u	*	*	*	*	Timecode scale in nanoseconds (1.000.000 means all timecodes in the segment are expressed in milliseconds). When combined with TimecodeScaleDenominator the Timecode scale is given by the fraction TimecodeScale/TimecodeScaleDenominator in seconds.
TimecodeScaleDenominator	2	[2A][D7][B2]	*	-	-	1000000000	u					Timecode scale numerator, see TimecodeScale.
Duration	2	[44][89]	-	-	> 0	-	f	*	*	*	*	Duration of the segment (based on TimecodeScale).
DateUTC	2	[44][61]	-	-	-	-	d	*	*	*	*	Date of the origin of timecode (value 0), i.e. production date.
Title	2	[7B][A9]	-	-	-	-	8	*	*	*		General name of the segment.
MuxingApp	2	[4D][80]	*	-	-	-	8	*	*	*	*	Muxing application or library ("libmatroska-0.4.3").
WritingApp	2	[57][41]	*	-	-	-	8	*	*	*	*	Writing application ("mkvmerge-0.3.3").

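3. Track

Track holds the basic information about the audio and video streams, such as the codec type, the video resolution, and the audio sampling rate. Parsing the Track part gives us this information, which is what we need to select the matching decoders and initialize them. Each TrackEntry describes one track.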

Tracks
    TrackEntry
        Name
        TrackNumber
        TrackType
    TrackEntry
        Name
        TrackNumber
        TrackType

Element Name	L	EBML ID	Ma	Mu	Rng	Default	T	1	2	3	W	Description


Track
Tracks	1	[16][54][AE][6B]	-	*	-	-	m	*	*	*	*	A top-level block of information with many tracks described.
TrackEntry	2	[AE]	*	*	-	-	m	*	*	*	*	Describes a track with all elements.
TrackNumber	3	[D7]	*	-	not 0	-	u	*	*	*	*	The track number as used in the Block Header (using more than 127 tracks is not encouraged, though the design allows an unlimited number).
TrackUID	3	[73][C5]	*	-	not 0	-	u	*	*	*	*	A unique ID to identify the Track. This should be kept the same when making a direct stream copy of the Track to another file.
TrackType	3	[83]	*	-	1-254	-	u	*	*	*	*	A set of track types coded on 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x12: buttons, 0x20: control).
FlagEnabled	3	[B9]	*	-	0-1	1	u		*	*	*	Set if the track is usable. (1 bit)
FlagDefault	3	[88]	*	-	0-1	1	u	*	*	*	*	Set if that track (audio, video or subs) SHOULD be active if no language found matches the user preference. (1 bit)
FlagForced	3	[55][AA]	*	-	0-1	0	u	*	*	*	*	Set if that track MUST be active during playback. There can be many forced track for a kind (audio, video or subs), the player should select the one which language matches the user preference or the default + forced track. Overlay MAY happen between a forced and non-forced track of the same kind. (1 bit)
FlagLacing	3	[9C]	*	-	0-1	1	u	*	*	*	*	Set if the track may contain blocks using lacing. (1 bit)
MinCache	3	[6D][E7]	*	-	-	0	u	*	*	*		The minimum number of frames a player should be able to cache during playback. If set to 0, the reference pseudo-cache system is not used.
MaxCache	3	[6D][F8]	-	-	-	-	u	*	*	*		The maximum cache size required to store referenced frames in and the current frame. 0 means no cache is needed.
DefaultDuration	3	[23][E3][83]	-	-	not 0	-	u	*	*	*	*	Number of nanoseconds (not scaled via TimecodeScale) per frame ('frame' in the Matroska sense -- one element put into a (Simple)Block).
TrackTimecodeScale	3	[23][31][4F]	*	-	> 0	1.0	f	*	*	*		DEPRECATED, DO NOT USE. The scale to apply on this track to work at normal speed in relation with other tracks (mostly used to adjust video speed when the audio length differs).
TrackOffset	3	[53][7F]	-	-	-	0	i					A value to add to the Block's Timecode. This can be used to adjust the playback offset of a track.
MaxBlockAdditionID	3	[55][EE]	*	-	-	0	u	*	*	*		The maximum value of BlockAddID. A value 0 means there is no BlockAdditions for this track.
Name	3	[53][6E]	-	-	-	-	8	*	*	*	*	A human-readable track name.
Language	3	[22][B5][9C]	-	-	-	eng	s	*	*	*	*	Specifies the language of the track in the Matroska languages form.
CodecID	3	[86]	*	-	-	-	s	*	*	*	*	An ID corresponding to the codec, see the codec page for more info.
CodecPrivate	3	[63][A2]	-	-	-	-	b	*	*	*	*	Private data only known to the codec.
CodecName	3	[25][86][88]	-	-	-	-	8	*	*	*	*	A human-readable string specifying the codec.
AttachmentLink	3	[74][46]	-	-	not 0	-	u	*	*	*		The UID of an attachment that is used by this codec.
CodecSettings	3	[3A][96][97]	-	-	-	-	8					A string describing the encoding setting used.
CodecInfoURL	3	[3B][40][40]	-	*	-	-	s					A URL to find information about the codec used.
CodecDownloadURL	3	[26][B2][40]	-	*	-	-	s					A URL to download about the codec used.
CodecDecodeAll	3	[AA]	*	-	0-1	1	u		*	*		The codec can decode potentially damaged data (1 bit).
TrackOverlay	3	[6F][AB]	-	*	-	-	u	*	*	*		Specify that this track is an overlay track for the Track specified (in the u-integer). That means when this track has a gap (see SilentTracks) the overlay track should be used instead. The order of multiple TrackOverlay matters, the first one is the one that should be used. If not found it should be the second, etc.
TrackTranslate	3	[66][24]	-	*	-	-	m	*	*	*		The track identification for the given Chapter Codec.
TrackTranslateEditionUID	4	[66][FC]	-	*	-	-	u	*	*	*		Specify an edition UID on which this translation applies. When not specified, it means for all editions found in the segment.
TrackTranslateCodec	4	[66][BF]	*	-	-	-	u	*	*	*		The chapter codec using this ID (0: Matroska Script, 1: DVD-menu).
TrackTranslateTrackID	4	[66][A5]	*	-	-	-	b	*	*	*		The binary value used to represent this track in the chapter codec data. The format depends on the ChapProcessCodecID used.
Video	3	[E0]	-	-	-	-	m	*	*	*	*	Video settings.
FlagInterlaced	4	[9A]	*	-	0-1	0	u		*	*	*	Set if the video is interlaced. (1 bit)
StereoMode	4	[53][B8]	-	-	-	0	u			*	*	Stereo-3D video mode (0: mono, 1: side by side (left eye is first), 2: top-bottom (right eye is first), 3: top-bottom (left eye is first), 4: checkboard (right is first), 5: checkboard (left is first), 6: row interleaved (right is first), 7: row interleaved (left is first), 8: column interleaved (right is first), 9: column interleaved (left is first), 10: anaglyph (cyan/red), 11: side by side (right eye is first), 12: anaglyph (green/magenta), 13 both eyes laced in one Block (left eye is first), 14 both eyes laced in one Block (right eye is first)) . There are some more details on 3D support in the Specification Notes.
OldStereoMode	4	[53][B9]	-	-	-	-	u					DEPRECATED, DO NOT USE. Bogus StereoMode value used in old versions of libmatroska. (0: mono, 1: right eye, 2: left eye, 3: both eyes).
PixelWidth	4	[B0]	*	-	not 0	-	u	*	*	*	*	Width of the encoded video frames in pixels.
PixelHeight	4	[BA]	*	-	not 0	-	u	*	*	*	*	Height of the encoded video frames in pixels.
PixelCropBottom	4	[54][AA]	-	-	-	0	u	*	*	*	*	The number of video pixels to remove at the bottom of the image (for HDTV content).
PixelCropTop	4	[54][BB]	-	-	-	0	u	*	*	*	*	The number of video pixels to remove at the top of the image.
PixelCropLeft	4	[54][CC]	-	-	-	0	u	*	*	*	*	The number of video pixels to remove on the left of the image.
PixelCropRight	4	[54][DD]	-	-	-	0	u	*	*	*	*	The number of video pixels to remove on the right of the image.
DisplayWidth	4	[54][B0]	-	-	not 0	PixelWidth	u	*	*	*	*	Width of the video frames to display. The default value is only valid when DisplayUnit is 0.
DisplayHeight	4	[54][BA]	-	-	not 0	PixelHeight	u	*	*	*	*	Height of the video frames to display. The default value is only valid when DisplayUnit is 0.
DisplayUnit	4	[54][B2]	-	-	-	0	u	*	*	*	*	How DisplayWidth & DisplayHeight should be interpreted (0: pixels, 1: centimeters, 2: inches, 3: Display Aspect Ratio).
AspectRatioType	4	[54][B3]	-	-	-	0	u	*	*	*	*	Specify the possible modifications to the aspect ratio (0: free resizing, 1: keep aspect ratio, 2: fixed).
ColourSpace	4	[2E][B5][24]	-	-	-	-	b	*	*	*		Same value as in AVI (32 bits).
GammaValue	4	[2F][B5][23]	-	-	> 0	-	f					Gamma Value.
FrameRate	4	[23][83][E3]	-	-	> 0	-	f					Number of frames per second. Informational only.
Audio	3	[E1]	-	-	-	-	m	*	*	*	*	Audio settings.
SamplingFrequency	4	[B5]	*	-	> 0	8000.0	f	*	*	*	*	Sampling frequency in Hz.
OutputSamplingFrequency	4	[78][B5]	-	-	> 0	Sampling Frequency	f	*	*	*	*	Real output sampling frequency in Hz (used for SBR techniques).
Channels	4	[9F]	*	-	not 0	1	u	*	*	*	*	Numbers of channels in the track.
ChannelPositions	4	[7D][7B]	-	-	-	-	b					Table of horizontal angles for each successive channel, see appendix.
BitDepth	4	[62][64]	-	-	not 0	-	u	*	*	*	*	Bits per sample, mostly used for PCM.
TrackOperation	3	[E2]	-	-	-	-	m			*		Operation that needs to be applied on tracks to create this virtual track. For more details look at the Specification Notes on the subject.
TrackCombinePlanes	4	[E3]	-	-	-	-	m			*		Contains the list of all video plane tracks that need to be combined to create this 3D track
TrackPlane	5	[E4]	*	*	-	-	m			*		Contains a video plane track that need to be combined to create this 3D track
TrackPlaneUID	6	[E5]	*	-	not 0	-	u			*		The trackUID number of the track representing the plane.
TrackPlaneType	6	[E6]	*	-	-	-	u			*		The kind of plane this track corresponds to (0: left eye, 1: right eye, 2: background).
TrackJoinBlocks	4	[E9]	-	-	-	-	m			*		Contains the list of all tracks whose Blocks need to be combined to create this virtual track
TrackJoinUID	5	[ED]	*	*	not 0	-	u			*		The trackUID number of a track whose blocks are used to create this virtual track.
TrickTrackUID	3	[C0]	-	-	-	-	u					DivX trick track extensions
TrickTrackSegmentUID	3	[C1]	-	-	-	-	b					DivX trick track extensions
TrickTrackFlag	3	[C6]	-	-	-	0	u					DivX trick track extensions
TrickMasterTrackUID	3	[C7]	-	-	-	-	u					DivX trick track extensions
TrickMasterTrackSegmentUID	3	[C4]	-	-	-	-	b					DivX trick track extensions
ContentEncodings	3	[6D][80]	-	-	-	-	m	*	*	*		Settings for several content encoding mechanisms like compression or encryption.
ContentEncoding	4	[62][40]	*	*	-	-	m	*	*	*		Settings for one content encoding like compression or encryption.
ContentEncodingOrder	5	[50][31]	*	-	-	0	u	*	*	*		Tells when this modification was used during encoding/muxing starting with 0 and counting upwards. The decoder/demuxer has to start with the highest order number it finds and work its way down. This value has to be unique over all ContentEncodingOrder elements in the segment.
ContentEncodingScope	5	[50][32]	*	-	not 0	1	u	*	*	*		A bit field that describes which elements have been modified in this way. Values (big endian) can be OR'ed. Possible values: 1 - all frame contents, 2 - the track's private data, 4 - the next ContentEncoding (next ContentEncodingOrder. Either the data inside ContentCompression and/or ContentEncryption)
ContentEncodingType	5	[50][33]	*	-	-	0	u	*	*	*		A value describing what kind of transformation has been done. Possible values: 0 - compression, 1 - encryption
ContentCompression	5	[50][34]	-	-	-	-	m	*	*	*		Settings describing the compression used. Must be present if the value of ContentEncodingType is 0 and absent otherwise. Each block must be decompressable even if no previous block is available in order not to prevent seeking.
ContentCompAlgo	6	[42][54]	*	-	-	0	u	*	*	*		The compression algorithm used. Algorithms that have been specified so far are: 0 - zlib, 1 - bzlib, 2 - lzo1x 3 - Header Stripping
ContentCompSettings	6	[42][55]	-	-	-	-	b	*	*	*		Settings that might be needed by the decompressor. For Header Stripping (ContentCompAlgo=3), the bytes that were removed from the beginning of each frame of the track.
ContentEncryption	5	[50][35]	-	-	-	-	m	*	*	*		Settings describing the encryption used. Must be present if the value of ContentEncodingType is 1 and absent otherwise.
ContentEncAlgo	6	[47][E1]	-	-	-	0	u	*	*	*		The encryption algorithm used. The value '0' means that the contents have not been encrypted but only signed. Predefined values: 1 - DES, 2 - 3DES, 3 - Twofish, 4 - Blowfish, 5 - AES
ContentEncKeyID	6	[47][E2]	-	-	-	-	b	*	*	*		For public key algorithms this is the ID of the public key the data was encrypted with.
ContentSignature	6	[47][E3]	-	-	-	-	b	*	*	*		A cryptographic signature of the contents.
ContentSigKeyID	6	[47][E4]	-	-	-	-	b	*	*	*		This is the ID of the private key the data was signed with.
ContentSigAlgo	6	[47][E5]	-	-	-	0	u	*	*	*		The algorithm used for the signature. A value of '0' means that the contents have not been signed but only encrypted. Predefined values: 1 - RSA
ContentSigHashAlgo	6	[47][E6]	-	-	-	0	u	*	*	*		The hash algorithm used for the signature. A value of '0' means that the contents have not been signed but only encrypted. Predefined values: 1 - SHA1-160 2 - MD5

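The table above is the official documentation for the track part. Let's use the same example file to parse the track information.

The track information follows immediately after Segment Information.

In the screenshot, the whole red box is the track information. It contains two TrackEntry elements, marked by the blue and green boxes.

Look at the first track, in the blue box, and consider only the important fields:

AE marks this whole EBML element as a TrackEntry, size = 0xB5

TrackNumber = 1: ID = D7, size = 1, data = 1

TrackType = 1: ID = 83, size = 1, data = 1. According to the table, 1 means video, so this track describes the video stream.

CodecID = V_MPEG4/ISO/AVC: ID = 86, size = 15, data = "V_MPEG4/ISO/AVC"

PixelWidth = 1280: ID = B0, size = 2, data = 0x500 (1280)

PixelHeight = 528: ID = BA, size = 2, data = 0x210 (528)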
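Fields of type u, such as TrackNumber, TrackType, PixelWidth, and PixelHeight, are stored as big-endian unsigned integers whose byte count is given by the element size. A minimal sketch of decoding them, with hypothetical function names and the TrackType values taken from the table above:

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the data of an unsigned-integer element: n bytes, big-endian.
 * n is the element size and is expected to be at most 8. */
uint64_t ebml_read_uint(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Maps a TrackType value to the name listed in the specification table. */
const char *track_type_name(uint64_t type)
{
    switch (type) {
    case 0x01: return "video";
    case 0x02: return "audio";
    case 0x03: return "complex";
    case 0x10: return "logo";
    case 0x11: return "subtitle";
    case 0x12: return "buttons";
    case 0x20: return "control";
    default:   return "unknown";
    }
}
```

The two data bytes 05 00 decode to 1280 and 02 10 decode to 528, matching the PixelWidth and PixelHeight above.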

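The official tool reports the same values.

Parsing the second track in the same way gives its parameters as well.

From the track part we now know that this MKV file contains two streams: one H.264 video track and one AC3 audio track. We also have their parameters.

We now know the basic information of the MKV file, such as its duration, the audio and video codecs, the resolution, and the sampling rate. Next comes the audio and video data itself.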

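4. Clusters

All audio and video frame data is stored in this part.

A Cluster can contain many BlockGroups. Each BlockGroup wraps one Block together with information specific to that Block, and the Blocks carry the audio and video frame data.

A Cluster is not limited to audio only or video only. It is made up of interleaved audio and video BlockGroups, because the audio and video data in a multimedia file are interleaved in the first place.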

Clusters
    Cluster
        Timecode
        BlockGroup
            Block
        BlockGroup
            Block
            ReferenceBlock
        BlockGroup
            Block
    Cluster
        Timecode
        BlockGroup
            Block
        BlockGroup
            Block
        BlockGroup
            Block
        BlockGroup
            Block
            BlockDuration

Element Name	L	EBML ID	Ma	Mu	Rng	Default	T	1	2	3	W	Description


Cluster
Cluster	1	[1F][43][B6][75]	-	*	-	-	m	*	*	*	*	The lower level element containing the (monolithic) Block structure.
Timecode	2	[E7]	*	-	-	-	u	*	*	*	*	Absolute timecode of the cluster (based on TimecodeScale).
SilentTracks	2	[58][54]	-	-	-	-	m	*	*	*		The list of tracks that are not used in that part of the stream. It is useful when using overlay tracks on seeking. Then you should decide what track to use.
SilentTrackNumber	3	[58][D7]	-	*	-	-	u	*	*	*		One of the track number that are not used from now on in the stream. It could change later if not specified as silent in a further Cluster.
Position	2	[A7]	-	-	-	-	u	*	*	*		The Position of the Cluster in the segment (0 in live broadcast streams). It might help to resynchronise offset on damaged streams.
PrevSize	2	[AB]	-	-	-	-	u	*	*	*	*	Size of the previous Cluster, in octets. Can be useful for backward playing.
SimpleBlock	2	[A3]	-	*	-	-	b		*	*	*	Similar to Block but without all the extra information, mostly used to reduced overhead when no extra feature is needed. (see SimpleBlock Structure)
BlockGroup	2	[A0]	-	*	-	-	m	*	*	*	*	Basic container of information containing a single Block or BlockVirtual, and information specific to that Block/VirtualBlock.
Block	3	[A1]	*	-	-	-	b	*	*	*	*	Block containing the actual data to be rendered and a timecode relative to the Cluster Timecode. (see Block Structure)
BlockVirtual	3	[A2]	-	-	-	-	b					A Block with no data. It must be stored in the stream at the place the real Block should be in display order. (see Block Virtual)
BlockAdditions	3	[75][A1]	-	-	-	-	m	*	*	*		Contain additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure could still see and use/skip these data.
BlockMore	4	[A6]	*	*	-	-	m	*	*	*		Contain the BlockAdditional and some parameters.
BlockAddID	5	[EE]	*	-	not 0	1	u	*	*	*		An ID to identify the BlockAdditional level.
BlockAdditional	5	[A5]	*	-	-	-	b	*	*	*		Interpreted by the codec as it wishes (using the BlockAddID).
BlockDuration	3	[9B]	-	-	-	TrackDuration	u	*	*	*	*	The duration of the Block (based on TimecodeScale). This element is mandatory when DefaultDuration is set for the track (but can be omitted as other default values). When not written and with no DefaultDuration, the value is assumed to be the difference between the timecode of this Block and the timecode of the next Block in "display" order (not coding order). This element can be useful at the end of a Track (as there is not other Block available), or when there is a break in a track like for subtitle tracks. When set to 0 that means the frame is not a keyframe.
ReferencePriority	3	[FA]	*	-	-	0	u	*	*	*		This frame is referenced and has the specified cache priority. In cache only a frame of the same or higher priority can replace this frame. A value of 0 means the frame is not referenced.
ReferenceBlock	3	[FB]	-	*	-	-	i	*	*	*	*	Timecode of another frame used as a reference (ie: B or P frame). The timecode is relative to the block it's attached to.
ReferenceVirtual	3	[FD]	-	-	-	-	i					Relative position of the data that should be in position of the virtual block.
CodecState	3	[A4]	-	-	-	-	b		*	*		The new codec state to use. Data interpretation is private to the codec. This information should always be referenced by a seek entry.
Slices	3	[8E]	-	-	-	-	m	*	*	*	*	Contains slices description.
TimeSlice	4	[E8]	-	*	-	-	m	*	*	*	*	Contains extra time information about the data contained in the Block. While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
LaceNumber	5	[CC]	-	-	-	0	u	*	*	*	*	The reverse number of the frame in the lace (0 is the last frame, 1 is the next to last, etc). While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
FrameNumber	5	[CD]	-	-	-	0	u					The number of the frame to generate from this lace with this delay (allow you to generate many frames from the same Block/Frame).
BlockAdditionID	5	[CB]	-	-	-	0	u					The ID of the BlockAdditional element (0 is the main Block).
Delay	5	[CE]	-	-	-	0	u					The (scaled) delay to apply to the element.
SliceDuration	5	[CF]	-	-	-	0	u					The (scaled) duration to apply to the element.
ReferenceFrame	3	[C8]	-	-	-	-	m					DivX trick track extensions
ReferenceOffset	4	[C9]	*	-	-	-	u					DivX trick track extensions
ReferenceTimeCode	4	[CA]	*	-	-	-	u					DivX trick track extensions
EncryptedBlock	2	[AF]	-	*	-	-	b					Similar to SimpleBlock but the data inside the Block are Transformed (encrypt and/or signed). (see EncryptedBlock Structure)

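We continue with the earlier example.

Cluster

ID = [1F][43][B6][75]

size = 0x12468f (1197711)

The following 1197711 bytes are the data of this Cluster.

The first EBML element is Timecode: ID = E7, size = 1, value = 0 (in the red box in the screenshot)

The second element has ID = A0, which the table identifies as BlockGroup, size = 96042

Right after it comes ID = A1, the level 3 EBML element Block, size = 96038

The structure of a Block is as follows.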

Block Header
Offset	Player	Description
0x00+	must	Track Number (Track Entry). It is coded in EBML like form (1 octet if the value is < 0x80, 2 if < 0x4000, etc) (most significant bits set to increase the range).
0x01+	must	Timecode (relative to Cluster timecode, signed int16)
0x03+		Flags

Flags
Bit	Player	Description
0-3	-	Reserved, set to 0
4	-	Invisible, the codec should decode this frame but not display it
5-6	must	Lacing 00 : no lacing 01 : Xiph lacing 11 : EBML lacing 10 : fixed-size lacing
7	-	not used

Lace (when lacing bit is set)
Offset	Player	Description
0x00	must	Number of frames in the lace-1 (uint8)
0x01 / 0xXX	must*	Lace-coded size of each frame of the lace, except for the last one (multiple uint8). *This is not used with Fixed-size lacing as it is calculated automatically from (total size of lace) / (number of frames in lace).

(possibly) Laced Data
Offset	Player	Description
0x00	must	Consecutive laced frames

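Byte 1 is the Track Number

Bytes 2-3 are the Timecode

Byte 4 is the flags

In the example above:

The first byte of the Block data is 0x81. Decoded as an EBML vint, Track Number = 1, so this Block carries data of track 1. As found earlier, track 1 is the video track and its codec is H.264, so this Block contains H.264 frame data.

Timecode = 0x0000

flags = 0

Whether a Lace section is present is determined by the flags. Here bits 5-6 of the flags are both 0, so there is no lacing. The remaining 96038 - 4 = 96034 bytes are all video frame data.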
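The Block header walk above can be sketched as a small function. The names block_header and parse_block_header are hypothetical. The sketch reads the flag bits the way the table numbers them, with bit 0 as the most significant bit, so the two lacing bits correspond to the mask 0x06.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint64_t track_number;
    int16_t  timecode;    /* relative to the Cluster timecode */
    uint8_t  flags;
    int      lacing;      /* 0 none, 1 Xiph, 2 fixed-size, 3 EBML */
    size_t   header_len;  /* bytes before the lace section or frame data */
} block_header;

/* Parses the header at the start of a Block's data.
 * Returns 0 on success, -1 if the buffer is too short or malformed. */
int parse_block_header(const uint8_t *buf, size_t len, block_header *out)
{
    if (len == 0 || buf[0] == 0)
        return -1;

    /* Track number: a vint with the length marker removed. */
    size_t n = 1;
    for (uint8_t mask = 0x80; (buf[0] & mask) == 0; mask >>= 1)
        n++;
    if (len < n + 3)
        return -1;
    uint64_t track = buf[0] & (0xFFu >> n);
    for (size_t i = 1; i < n; i++)
        track = (track << 8) | buf[i];

    /* Timecode: signed 16-bit, big-endian. */
    uint16_t raw = (uint16_t)((buf[n] << 8) | buf[n + 1]);

    out->track_number = track;
    out->timecode = (int16_t)(raw >= 0x8000 ? (int)raw - 0x10000 : (int)raw);
    out->flags = buf[n + 2];
    out->lacing = (out->flags >> 1) & 0x03;
    out->header_len = n + 3;
    return 0;
}
```

For the first Block of the example (81 00 00 00) this gives track 1, timecode 0, no lacing, and a 4-byte header, which leaves 96038 - 4 = 96034 bytes of frame data.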

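Convert these 96034 bytes of Block data to NALU format, prepend the SPS and PPS parsed from the CodecPrivate data of the track, and save the result to a local file. It should then be one frame of H.264 data.

Opening it with Elecard confirms that it is indeed a single I-frame.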
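The conversion step can be sketched as follows, under one stated assumption: the frame data consists of NAL units each preceded by a 4-byte big-endian length. The real prefix width is declared in the CodecPrivate data and should be read from there. The function name avcc_to_annexb is hypothetical, and prepending SPS and PPS is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rewrites length-prefixed NAL units into start-code form (00 00 00 01).
 * Assumes a 4-byte big-endian length prefix, so the output has the same
 * size as the input. Returns the number of NAL units converted, or -1 on
 * malformed input or if out is too small. */
int avcc_to_annexb(const uint8_t *in, size_t in_len,
                   uint8_t *out, size_t out_cap)
{
    static const uint8_t start_code[4] = { 0x00, 0x00, 0x00, 0x01 };
    size_t pos = 0;
    int count = 0;

    if (out_cap < in_len)
        return -1;
    while (pos < in_len) {
        if (in_len - pos < 4)
            return -1;                      /* truncated length prefix */
        uint32_t nal_len = ((uint32_t)in[pos] << 24) |
                           ((uint32_t)in[pos + 1] << 16) |
                           ((uint32_t)in[pos + 2] << 8) |
                            (uint32_t)in[pos + 3];
        if (nal_len > in_len - pos - 4)
            return -1;                      /* NAL unit runs past the end */
        memcpy(out + pos, start_code, 4);   /* replace length by start code */
        memcpy(out + pos + 4, in + pos + 4, nal_len);
        pos += 4 + (size_t)nal_len;
        count++;
    }
    return count;
}
```

Because the start code and the assumed length prefix have the same width, the conversion could also be done in place. A separate output buffer is used here only to keep the sketch easy to follow.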

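Following the same routine, look at the next one.

BlockGroup: ID = A0, size = 0x2808

Its Block: ID = A1, size = 0x2805 (10245)

Byte 1 gives Track Number = 2, so this is data of track 2, which is the AC3 audio track.

Bytes 2-3 give Timecode = 0x0005

Byte 4 gives flags = 0x04

This time the Lace section has to be parsed. Bits 5-6 of the flags, counted from the most significant bit as in the table above, are 10, so this Block uses fixed-size lacing.

Fixed-size lacing has the following structure.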

Fixed-size lacing

In this case only the number of frames in the lace is saved, the size of each frame is deduced from the total size of the Block. For example, for 3 frames of 800 octets each :

Block head (with lacing bits set to 10)
Lacing head: Number of frames in the lace -1, i.e. 2
Data in frame 1
Data in frame 2
Data in frame 3

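The lacing head is 07, so the Block contains 07 + 1 = 8 frames. Since this is AC3 audio, the AC3 sync word 0x0b77 (green box in the screenshot) follows immediately after the 07.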
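With fixed-size lacing, the frame size is not stored and has to be derived from the Block size. A minimal sketch, with the hypothetical function name fixed_lace_layout, where payload is the Block data that follows the Block header:

```c
#include <stddef.h>
#include <stdint.h>

/* Works out the layout of a fixed-size-laced Block.
 * payload points just past the Block header, so payload[0] is the lacing
 * head (number of frames minus 1). Returns the number of frames and sets
 * *frame_size and *first_frame, or returns -1 if the sizes do not add up. */
int fixed_lace_layout(const uint8_t *payload, size_t payload_len,
                      size_t *frame_size, const uint8_t **first_frame)
{
    if (payload_len < 1)
        return -1;
    size_t frames = (size_t)payload[0] + 1;
    size_t data_len = payload_len - 1;      /* everything after the head */
    if (data_len % frames != 0)
        return -1;                          /* not evenly divisible */
    *frame_size = data_len / frames;
    *first_frame = payload + 1;
    return (int)frames;
}
```

For the Block in the example, the size is 10245 bytes, of which 4 are the Block header and 1 is the lacing head. The remaining 10240 bytes divided by 8 frames works out to 1280 bytes per AC3 frame.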

The tool shows the corresponding parse result.

Using this logic and method, we can demux the audio and video streams out of an MKV file.

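5. Cueing Data

Cueing Data is an index of keyframes. Without a keyframe index, seeking, fast-forward, and rewind are very hard to implement, because you have to search packet by packet. As mentioned before, the official FLV specification does not define an I-frame index, although the community has added one unofficially. Matroska does specify an index officially, and that index is the Cueing Data.

The structure is shown below.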

Cueing Data
    Cues
        CuePoint
            CueTime
            CuePosition
        CuePoint
            CueTime
            CuePosition

Cueing Data
Cues	1	[1C][53][BB][6B]	-	-	-	-	m	*	*	*	*	A top-level element to speed seeking access. All entries are local to the segment. Should be mandatory for non "live" streams.
CuePoint	2	[BB]	*	*	-	-	m	*	*	*	*	Contains all information relative to a seek point in the segment.
CueTime	3	[B3]	*	-	-	-	u	*	*	*	*	Absolute timecode according to the segment time base.
CueTrackPositions	3	[B7]	*	*	-	-	m	*	*	*	*	Contain positions for different tracks corresponding to the timecode.
CueTrack	4	[F7]	*	-	not 0	-	u	*	*	*	*	The track for which a position is given.
CueClusterPosition	4	[F1]	*	-	-	-	u	*	*	*	*	The position of the Cluster containing the required Block.
CueRelativePosition	4	[F0]	-	-	-	-	u					The relative position of the referenced block inside the cluster with 0 being the first possible position for an element inside that cluster.
CueDuration	4	[B2]	-	-	-	-	u					The duration of the block according to the segment time base. If missing the track's DefaultDuration does not apply and no duration information is available in terms of the cues.
CueBlockNumber	4	[53][78]	-	-	not 0	1	u	*	*	*	*	Number of the Block in the specified Cluster.
CueCodecState	4	[EA]	-	-	-	0	u		*	*		The position of the Codec State corresponding to this Cue element. 0 means that the data is taken from the initial Track Entry.
CueReference	4	[DB]	-	*	-	-	m		*	*		The Clusters containing the required referenced Blocks.
CueRefTime	5	[96]	*	-	-	-	u		*	*		Timecode of the referenced Block.

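Continuing with the example, we locate the Cues element.

ID = [1C][53][BB][6B] identifies Cues, size = 0x7f

Each following element with ID = 0xBB is a CuePoint. The green box in the screenshot marks one of them, size = 0xC

CueTime: ID = 0xB3, size = 1, data = 0

CueTrackPositions: ID = 0xB7, size = 7, data = 0xf78101f18215ef

CueTrack: ID = F7, size = 1, data = 1. The track number at this position is 1, which in this file is the video track.

CueClusterPosition: ID = F1, size = 2, data = 15ef. The position is 0x15ef (5615), relative to the Segment.

Going to this position, we find the first Cluster. As analyzed in the previous section, the video content of this Cluster is indeed a keyframe.

Parsing the rest in the same way shows that this file has 8 CuePoints in total.

Demuxing the H.264 video from the file and inspecting it with a tool shows that it also has exactly 8 keyframes.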
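Once the CuePoints have been collected, seeking comes down to finding the last cue at or before the target time and jumping to its Cluster. The following is a minimal sketch with hypothetical names (cue_point, find_cue). It assumes the cues are sorted by time. In the usage note, only the first cue (time 0, position 5615) comes from the example file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one CuePoint for a single track. */
typedef struct {
    uint64_t time;              /* CueTime, in TimecodeScale units */
    uint64_t cluster_position;  /* CueClusterPosition, relative to the Segment data */
} cue_point;

/* Returns the index of the last cue whose time is <= target, or -1 if
 * there is none. cues must be sorted by ascending time. */
long find_cue(const cue_point *cues, size_t count, uint64_t target)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {                       /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (cues[mid].time <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}
```

For a target time of 0, the search returns the first cue. Its Cluster position of 5615 plus the 36 bytes that precede the Segment data in the example file gives the file offset 5651 to start reading from.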

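6. Summary

We have now gone through the main parts of MKV in detail. To summarize:

1. The basic building blocks of MKV are EBML elements. Every element has a level, and elements nest level by level to form the different parts of an MKV file (Level 0, Grouping, Level 1, Level 2, Level 3).

2. An MKV file consists of two major parts, the EBML header and the Segment. The Segment is further divided into Meta Seek Information, Segment Information, Track, Chapters, Clusters, Cueing Data, Attachment, and Tagging.

3. The EBML header carries the signature that identifies a file as MKV.

4. Meta Seek Information contains the positions of the other parts.

5. Segment Information contains information that identifies the file, such as Title and SegmentUID. The file duration (Duration) is also stored here.

6. Track holds the basic information about the audio and video streams, such as the codec type, the video resolution, and the audio sampling rate.

7. The actual audio and video data is stored interleaved in the Clusters.

8. Cueing Data is the keyframe index and is essential for seeking.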