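Broadly speaking, an H.264 bitstream can be packaged in two ways. The first is the Annex B byte stream format, the default output of most encoders, in which each NAL unit is preceded by a 3- or 4-byte H.264 start code: 0x000001 or 0x00000001. The second is the AVCC format, in which each NAL unit is preceded by its length instead of a start code; it is less common than Annex B.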



    if( next_bits( 24 ) != 0x000001 )
        zero_byte                         f(8)
    start_code_prefix_one_3bytes          f(24)
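According to section B.1, the so-called 4-byte start code is zero_byte followed by the 3-byte start code. The semantics of zero_byte say when it must be present, which in turn tells us when the 4-byte start code appears:
1. SPS and PPS NAL units use the 4-byte start code;
2. The first NAL unit of an access unit uses the 4-byte start code (see 7.).
      SPS            (always a 4-byte start code)
      PPS            (always a 4-byte start code)
      SEI            (4-byte start code)
      I0(slice0)     (4-byte start code)
      I0(slice1)     (3-byte start code)
      P1(slice0)     (4-byte start code)
      P1(slice1)     (3-byte start code)
      P2(slice0)     (4-byte start code)
      P2(slice1)     (3-byte start code)
I0(slice0) is the first slice of the first frame of the sequence (an I frame). It is the first NAL unit of its access unit, so it gets a 4-byte start code. I0(slice1) is the second slice of the same frame, so it gets a 3-byte start code. The same reasoning applies to P1(slice0) and P1(slice1).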

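1 In an Annex B byte stream, any number of 0x00 bytes may appear before or after a byte_stream_nal_unit, purely as padding. This is uncommon.
2 The 4-byte start code is required only for SPS, PPS, and the first NAL unit of an access unit as specified in 7.; all other NAL units can use the 3-byte start code.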
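
To make the 3-byte versus 4-byte distinction concrete, here is a minimal sketch of a start-code scanner in C. The function name find_start_code and its signature are my own choice for illustration, not part of any library. It treats a single zero byte directly before 0x000001 as the zero_byte of a 4-byte start code and does not try to tell it apart from trailing padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the next Annex B start code at or after `pos`.
 * Returns the offset of the first byte of the start code, or -1 if there
 * is none. *sc_len is set to 4 when a zero byte immediately precedes
 * 0x000001 (zero_byte + 3-byte prefix), otherwise to 3. */
long find_start_code(const uint8_t *buf, size_t len, size_t pos, int *sc_len)
{
    for (size_t i = pos; i + 3 <= len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01) {
            if (i > pos && buf[i - 1] == 0x00) {
                *sc_len = 4;
                return (long)(i - 1);
            }
            *sc_len = 3;
            return (long)i;
        }
    }
    return -1;
}
```

Calling it repeatedly, each time resuming from the returned offset plus *sc_len, walks through the NAL units of a stream; the payload of one NAL unit ends where the next start code begins.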


The SPS NAL unit starts with 0x67 and the PPS NAL unit starts with 0x68. The
length of the SPS is variable and depends on the toolsets enabled. It can be
found by counting the bytes between the 0x67 and the start code that precedes the 0x68.

So if you find 0x00 0x00 0x00 0x01 0x67, what follows is an SPS; if you find 0x00 0x00 0x00 0x01 0x68, what follows is a PPS. The PPS extends from its start code to the next start code (usually 0x00000001, occasionally 0x000001).
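
The bytes 0x67 and 0x68 are just the one-byte NAL header with nal_ref_idc equal to 3 and nal_unit_type equal to 7 or 8. A small sketch of pulling the fields out of that byte follows; the function names are made up for this example. Testing the low five bits instead of comparing against 0x67 also catches an SPS whose nal_ref_idc is not 3 (for example a header byte of 0x27).

```c
#include <stdint.h>

/* NAL header byte layout: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits),
 * nal_unit_type (5 bits). Function names are illustrative only. */
int nal_ref_idc(uint8_t header)
{
    return (header >> 5) & 0x03;
}

int nal_unit_type(uint8_t header)
{
    return header & 0x1F;
}

/* nal_unit_type 7 is a sequence parameter set, 8 a picture parameter set. */
int is_sps(uint8_t header) { return nal_unit_type(header) == 7; }
int is_pps(uint8_t header) { return nal_unit_type(header) == 8; }
```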


Below is the explanation of the SPS from a write-up of a minimal H.264 encoder example:    http://www.cardinalpeak.com/blog/the-h-264-sequence-parameter-set/

In my trivial encoder, the h.264 SPS and PPS were hardcoded in hex as:

/* h.264 bitstreams */
const uint8_t sps[] =
{0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2};
const uint8_t pps[] =
{0x00, 0x00, 0x00, 0x01, 0x68, 0xce, 0x38, 0x80};

Let’s decode this into something readable from the spec. The first thing I did was to look at section 7 of the h.264 specification. I saw that at a minimum I had to choose how to fill in the SPS parameters in the table below. In the table, as in the standard, the type u(n) indicates an unsigned integer of n bits, and ue(v) indicates an unsigned exponential-golomb coded value of a variable number of bits. The spec doesn’t seem to define the maximum number of bits anywhere, but the reference encoder software uses 32. (People wishing to explore the security of decoder software may find it interesting to violate this assumption!)

Parameter Name                        Type   Value  Comments
forbidden_zero_bit                    u(1)   0      Despite being forbidden, it must be set to 0!
nal_ref_idc                           u(2)   3      3 means it is “important” (this is an SPS)
nal_unit_type                         u(5)   7      Indicates this is a sequence parameter set
profile_idc                           u(8)   66     Baseline profile
constraint_set0_flag                  u(1)   0      We’re not going to honor constraints
constraint_set1_flag                  u(1)   0      We’re not going to honor constraints
constraint_set2_flag                  u(1)   0      We’re not going to honor constraints
constraint_set3_flag                  u(1)   0      We’re not going to honor constraints
reserved_zero_4bits                   u(4)   0      Better set them to zero
level_idc                             u(8)   10     Level 1, sec A.3.1
seq_parameter_set_id                  ue(v)  0      We’ll just use id 0.
log2_max_frame_num_minus4             ue(v)  0      Let’s have as few frame numbers as possible
pic_order_cnt_type                    ue(v)  0      Keep things simple
log2_max_pic_order_cnt_lsb_minus4     ue(v)  0      Fewer is better.
num_ref_frames                        ue(v)  0      We will only send I slices
gaps_in_frame_num_value_allowed_flag  u(1)   0      We will have no gaps
pic_width_in_mbs_minus1               ue(v)  7      SQCIF is 8 macroblocks wide
pic_height_in_map_units_minus1        ue(v)  5      SQCIF is 6 macroblocks high
frame_mbs_only_flag                   u(1)   1      We will not do field/frame encoding
direct_8x8_inference_flag             u(1)   0      Used for B slices. We will not send B slices
frame_cropping_flag                   u(1)   0      We will not do frame cropping
vui_parameters_present_flag           u(1)   0      We will not send VUI data
rbsp_stop_one_bit                     u(1)   1      Stop bit. I missed this at first and it caused me much trouble.

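(This is confirmed by the description of nal_unit_type in section 7.4.1, NAL unit semantics, and by section B.1.2, Byte stream NAL unit semantics, of the H.264 standard. The latter contains the phrase: the nal_unit_type within the nal_unit() is equal to 7 (sequence parameter set) or 8 (picture parameter set). In other words, if the first byte after the start code is 0x67 the NAL unit is an SPS, and if it is 0x68 it is a PPS. Compare the first byte, 0x67, in the table above.)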

Some key things here are the profile (profile_idc) and level (level_idc) that I chose, and the picture width and height. If you encode the above table in hex, you will get the values in the SPS array declared above.
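
To check the table against the hex bytes, here is a rough sketch of a bit reader with u(n) and ue(v) decoding, plus a parser for exactly this kind of SPS. All names (BitReader, read_ue, parse_baseline_sps, SpsSummary) are invented for the example. It assumes a Baseline-style SPS with pic_order_cnt_type other than 1, no High-profile fields, and no emulation prevention bytes in the payload, so it is not a general SPS parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader; reads past the end yield zero bits. */
typedef struct {
    const uint8_t *data;
    size_t size;  /* length in bytes */
    size_t pos;   /* current position in bits */
} BitReader;

unsigned read_bit(BitReader *br)
{
    unsigned bit = 0;
    if (br->pos < br->size * 8)
        bit = (br->data[br->pos / 8] >> (7 - br->pos % 8)) & 1u;
    br->pos++;
    return bit;
}

unsigned read_bits(BitReader *br, int n)  /* u(n) */
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | read_bit(br);
    return v;
}

unsigned read_ue(BitReader *br)  /* ue(v): count leading zeros, then read that many bits */
{
    int zeros = 0;
    while (read_bit(br) == 0 && zeros < 31)
        zeros++;
    return (1u << zeros) - 1 + read_bits(br, zeros);
}

typedef struct {
    unsigned profile_idc, level_idc;
    unsigned pic_width_in_mbs_minus1, pic_height_in_map_units_minus1;
    unsigned frame_mbs_only_flag;
} SpsSummary;

/* `nal` points at the NAL header byte (0x67), just past the start code. */
SpsSummary parse_baseline_sps(const uint8_t *nal, size_t size)
{
    BitReader br = { nal, size, 0 };
    SpsSummary s;
    read_bits(&br, 8);                 /* forbidden_zero_bit, nal_ref_idc, nal_unit_type */
    s.profile_idc = read_bits(&br, 8);
    read_bits(&br, 8);                 /* constraint_set flags + reserved_zero_4bits */
    s.level_idc = read_bits(&br, 8);
    read_ue(&br);                      /* seq_parameter_set_id */
    read_ue(&br);                      /* log2_max_frame_num_minus4 */
    if (read_ue(&br) == 0)             /* pic_order_cnt_type (type 1 not handled) */
        read_ue(&br);                  /* log2_max_pic_order_cnt_lsb_minus4 */
    read_ue(&br);                      /* num_ref_frames */
    read_bits(&br, 1);                 /* gaps_in_frame_num_value_allowed_flag */
    s.pic_width_in_mbs_minus1 = read_ue(&br);
    s.pic_height_in_map_units_minus1 = read_ue(&br);
    s.frame_mbs_only_flag = read_bits(&br, 1);
    return s;
}
```

Feeding it the seven bytes that follow the start code in the sps[] array (0x67 through 0xa2) should give back the profile, level, and macroblock counts listed in the table.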

A question I got a couple of times in email was about the width and height parameters—specifically, what to do if the picture width or height is not an integer multiple of macroblock size. Recall that, for the 4:2:0 sampling scheme in my encoder, a macroblock consists of 16×16 luma samples. In this case, you would set the frame_cropping_flag to 1, and reduce the number of pixels in the horizontal and vertical direction with the frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, and frame_crop_bottom_offset parameters, which are conditionally present in the bitstream only if the frame_cropping_flag is set to one.
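
As a sketch of the arithmetic, the function below turns the SPS size and cropping fields into luma dimensions. The name sps_luma_size and the Dim struct are mine. It assumes 4:2:0 sampling, where the crop offsets are counted in units of 2 luma samples horizontally and 2 (or 4 for field-coded streams) vertically.

```c
/* Luma picture size from SPS fields, assuming 4:2:0 (chroma_format_idc == 1).
 * Names are illustrative only. */
typedef struct {
    unsigned width;
    unsigned height;
} Dim;

Dim sps_luma_size(unsigned pic_width_in_mbs_minus1,
                  unsigned pic_height_in_map_units_minus1,
                  unsigned frame_mbs_only_flag,
                  unsigned crop_left, unsigned crop_right,
                  unsigned crop_top, unsigned crop_bottom)
{
    /* A map unit is one macroblock row for frame-only streams, two otherwise. */
    unsigned field_factor = 2 - frame_mbs_only_flag;
    Dim d;
    d.width = (pic_width_in_mbs_minus1 + 1) * 16
              - 2 * (crop_left + crop_right);
    d.height = field_factor * (pic_height_in_map_units_minus1 + 1) * 16
               - 2 * field_factor * (crop_top + crop_bottom);
    return d;
}
```

For example, 1080 is not a multiple of 16, so a 1920x1080 progressive stream is typically coded as 120 by 68 macroblocks (1920x1088) with frame_crop_bottom_offset set to 4, which removes the extra 8 rows.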

One interesting problem that we see fairly often with h.264 is when the container format (MP4, MOV, etc.) contains different values for some of these parameters than the SPS and PPS. In this case, we find different video players handle the streams differently.

A handy tool for decoding h.264 bitstreams, including the SPS, is the h264bitstream tool. It comes with a command line program that decodes a bitstream to the parameter names defined in the h.264 specification. Let’s look at its output for a sample mp4 file I downloaded from youtube. First, I extract the h.264 NAL units from the file using ffmpeg:

ffmpeg.exe -i "Old Faithful.mp4" -vcodec copy -bsf:v h264_mp4toannexb -an of.h264

The NAL units now reside in the file of.h264. I then run the h264_analyze command from the h264bitstream package to produce the following output:

h264_analyze of.h264
!! Found NAL at offset 4 (0x0004), size 25 (0x0019)
==================== NAL ====================
forbidden_zero_bit : 0
nal_ref_idc : 3
nal_unit_type : 7 ( Sequence parameter set )
======= SPS =======
profile_idc : 100
constraint_set0_flag : 0
constraint_set1_flag : 0
constraint_set2_flag : 0
constraint_set3_flag : 0
reserved_zero_4bits : 0
level_idc : 31
seq_parameter_set_id : 0
chroma_format_idc : 1
residual_colour_transform_flag : 0
bit_depth_luma_minus8 : 0
bit_depth_chroma_minus8 : 0
qpprime_y_zero_transform_bypass_flag : 0
seq_scaling_matrix_present_flag : 0
log2_max_frame_num_minus4 : 3
pic_order_cnt_type : 0
log2_max_pic_order_cnt_lsb_minus4 : 3
delta_pic_order_always_zero_flag : 0
offset_for_non_ref_pic : 0
offset_for_top_to_bottom_field : 0
num_ref_frames_in_pic_order_cnt_cycle : 0
num_ref_frames : 1
gaps_in_frame_num_value_allowed_flag : 0
pic_width_in_mbs_minus1 : 79
pic_height_in_map_units_minus1 : 44
frame_mbs_only_flag : 1
mb_adaptive_frame_field_flag : 0
direct_8x8_inference_flag : 1
frame_cropping_flag : 0
frame_crop_left_offset : 0
frame_crop_right_offset : 0
frame_crop_top_offset : 0
frame_crop_bottom_offset : 0
vui_parameters_present_flag : 1
=== VUI ===
aspect_ratio_info_present_flag : 1
aspect_ratio_idc : 1
sar_width : 0
sar_height : 0
overscan_info_present_flag : 0
overscan_appropriate_flag : 0
video_signal_type_present_flag : 0
video_signal_type_present_flag : 0
video_format : 0
video_full_range_flag : 0
colour_description_present_flag : 0
colour_primaries : 0
transfer_characteristics : 0
matrix_coefficients : 0
chroma_loc_info_present_flag : 0
chroma_sample_loc_type_top_field : 0
chroma_sample_loc_type_bottom_field : 0
timing_info_present_flag : 1
num_units_in_tick : 100
time_scale : 5994
fixed_frame_rate_flag : 1
nal_hrd_parameters_present_flag : 0
vcl_hrd_parameters_present_flag : 0
low_delay_hrd_flag : 0
pic_struct_present_flag : 0
bitstream_restriction_flag : 1
motion_vectors_over_pic_boundaries_flag : 1
max_bytes_per_pic_denom : 0
max_bits_per_mb_denom : 0
log2_max_mv_length_horizontal : 11
log2_max_mv_length_vertical : 11
num_reorder_frames : 0
max_dec_frame_buffering : 1
=== HRD ===
cpb_cnt_minus1 : 0
bit_rate_scale : 0
cpb_size_scale : 0
initial_cpb_removal_delay_length_minus1 : 0
cpb_removal_delay_length_minus1 : 0
dpb_output_delay_length_minus1 : 0
time_offset_length : 0

The only additional thing I’d like to point out here is that this particular SPS also contains information about the frame rate of the video (see timing_info_present_flag). These parameters must be closely checked when you generate bitstreams to ensure they agree with the container format that the h.264 will eventually be muxed into. Even a small error, such as 29.97 fps in one place and 30 fps in another, can result in severe audio/video synchronization problems.
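
For reference, here is a small sketch of how the frame rate falls out of those two VUI fields. The function name is hypothetical. It uses the common interpretation for frame-coded streams with fixed_frame_rate_flag set, where time_scale counts field-rate ticks, hence the factor of two.

```c
/* Frame rate implied by the VUI timing info, assuming a fixed frame rate
 * and frame (not field) pictures. Returns 0.0 if the timing info is unusable. */
double sps_frame_rate(unsigned num_units_in_tick, unsigned time_scale)
{
    if (num_units_in_tick == 0)
        return 0.0;
    return (double)time_scale / (2.0 * (double)num_units_in_tick);
}
```

With the values from the output above, num_units_in_tick = 100 and time_scale = 5994, this gives 5994 / 200 = 29.97 fps.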





Three ways come to mind (if you are looking for something free; otherwise google "h264 analysis"):
    a)  Download the h.264 parser from:    http://www.w6rz.net/h264_parse.zip (from this thread @ doom9 http://forum.doom9.org/archive/index.php/t-133070.html)
    b)  Download the H.264 reference SW from:   http://iphome.hhi.de/suehring/tml/
    c)  h264bitstream:       http://h264bitstream.sourceforge.net/     or      http://sourceforge.net/projects/h264bitstream/
This should get you started. BTW, the byte stream format is described in Annex B of the spec. Download it from ITU: http://www.itu.int/rec/T-REC-H.264-201003-I/en


