[Repost] The QCOW2 Image Format
https://people.gnome.org/~markmc/qcow-image-format.html
The QCOW image format is one of the disk image formats supported by the QEMU processor emulator. It is a representation of a fixed size block device in a file. Benefits it offers over a raw dump representation include:
- Smaller file size, even on filesystems which don't support holes (i.e. sparse files)
- Copy-on-write support, where the image only represents changes made to an underlying disk image
- Snapshot support, where the image can contain multiple snapshots of the image's history
- Optional zlib based compression
- Optional AES encryption
The qemu-img command is the most common way of manipulating these images e.g.
$> qemu-img create -f qcow2 test.qcow2 4G
Formating 'test.qcow2', fmt=qcow2, size=4194304 kB
$> qemu-img convert test.qcow2 -O raw test.img
The Header
Each QCOW2 file begins with a header, in big endian format, as follows:
typedef struct QCowHeader {
    uint32_t magic;
    uint32_t version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size;
    uint32_t cluster_bits;
    uint64_t size; /* in bytes */
    uint32_t crypt_method;
    uint32_t l1_size;
    uint64_t l1_table_offset;
    uint64_t refcount_table_offset;
    uint32_t refcount_table_clusters;
    uint32_t nb_snapshots;
    uint64_t snapshots_offset;
} QCowHeader;
- The first 4 bytes contain the characters 'Q', 'F', 'I' followed by 0xfb.
- The next 4 bytes contain the format version used by the file. Currently, there have been two versions of the format, version 1 and version 2. We are discussing the latter here, and the former is discussed at the end.
- The backing_file_offset field gives the offset from the beginning of the file to a string containing the path to a file; backing_file_size gives the length of this string, which isn't nul-terminated. If this image is a copy-on-write image, then this will be the path to the original file. More on that below.
- The cluster_bits field describes how to map an image offset address to a location within the file; it determines how many of the lower bits of the offset address are used as an index within a cluster. Since L2 tables occupy a single cluster and contain 8 byte entries, the next most significant cluster_bits, less three bits, are used as an index into the L2 table. More on the format's 2-level lookup system below.
- The next 8 bytes contain the size, in bytes, of the block device represented by the image.
- The crypt_method field is 0 if no encryption has been used, and 1 if AES encryption has been used.
- The l1_size field gives the number of 8 byte entries available in the L1 table and l1_table_offset gives the offset within the file of the start of the table.
- Similarly, refcount_table_offset gives the offset to the start of the refcount table, but refcount_table_clusters describes the size of the refcount table in units of clusters.
- nb_snapshots gives the number of snapshots contained in the image and snapshots_offset gives the offset of the QCowSnapshotHeader headers, one for each snapshot.
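As a rough illustration of the field list above, the following sketch decodes the header from a byte buffer. It assumes the fields are stored back to back in the order of the struct, with no padding, which gives 72 bytes in total; the names ParsedHeader, parse_qcow2_header, be32 and be64 are made up for this example and are not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read big-endian integers from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Host byte order copy of the header fields (hypothetical type). */
typedef struct ParsedHeader {
    uint32_t magic, version;
    uint64_t backing_file_offset;
    uint32_t backing_file_size, cluster_bits;
    uint64_t size;
    uint32_t crypt_method, l1_size;
    uint64_t l1_table_offset, refcount_table_offset;
    uint32_t refcount_table_clusters, nb_snapshots;
    uint64_t snapshots_offset;
} ParsedHeader;

#define QCOW_HEADER_BYTES 72           /* assumption: packed layout */
#define QCOW_MAGIC 0x514649fbu         /* 'Q', 'F', 'I', 0xfb */

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int parse_qcow2_header(const uint8_t *buf, size_t len, ParsedHeader *h)
{
    assert(buf != NULL && h != NULL);
    if (len < QCOW_HEADER_BYTES || be32(buf) != QCOW_MAGIC)
        return -1;
    h->magic = be32(buf);
    h->version = be32(buf + 4);
    h->backing_file_offset = be64(buf + 8);
    h->backing_file_size = be32(buf + 16);
    h->cluster_bits = be32(buf + 20);
    h->size = be64(buf + 24);
    h->crypt_method = be32(buf + 32);
    h->l1_size = be32(buf + 36);
    h->l1_table_offset = be64(buf + 40);
    h->refcount_table_offset = be64(buf + 48);
    h->refcount_table_clusters = be32(buf + 56);
    h->nb_snapshots = be32(buf + 60);
    h->snapshots_offset = be64(buf + 64);
    return 0;
}
```

Decoding byte by byte avoids depending on the host's endianness or on how the compiler lays out the struct.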
Typically the image file will be laid out as follows:
- The header, as described above.
- Starting at the next cluster boundary, the L1 table.
- The refcount table, again boundary aligned.
- One or more refcount blocks.
- Snapshot headers, the first boundary aligned and the following headers aligned on 8 byte boundaries.
- L2 tables, each one occupying a single cluster.
- Data clusters.
2-Level Lookups
With QCOW, the contents of the device are stored in clusters. Each cluster contains a number of 512 byte sectors.
In order to find the cluster for a given address within the device, you must traverse two levels of tables. The L1 table is an array of file offsets to L2 tables, and each L2 table is an array of file offsets to clusters.
So, an address is split into three separate offsets according to the cluster_bits field. For example, if cluster_bits is 12, then the address is split up as follows:
- the lower 12 bits are an offset within a 4 KB cluster
- the next 9 bits are an offset within a 512 entry array of 8 byte file offsets, the L2 table. The number of bits needed here is given by l2_bits = cluster_bits - 3 since the L2 table is a single cluster containing 8 byte entries
- the remaining 43 bits are an offset within another array of 8 byte file offsets, the L1 table
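The three-way split can be sketched in a few lines of C. The names AddrSplit and split_address are invented for this example, and the accepted range of cluster_bits is only an assumption made to keep the shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

typedef struct AddrSplit {
    uint64_t l1_index;        /* index into the L1 table */
    uint64_t l2_index;        /* index into the selected L2 table */
    uint64_t cluster_offset;  /* byte offset inside the cluster */
} AddrSplit;

AddrSplit split_address(uint64_t addr, unsigned cluster_bits)
{
    /* Assumed range; it is not taken from the format description. */
    assert(cluster_bits >= 4 && cluster_bits <= 30);

    /* An L2 table is one cluster of 8 byte entries. */
    unsigned l2_bits = cluster_bits - 3;

    AddrSplit s;
    s.cluster_offset = addr & ((UINT64_C(1) << cluster_bits) - 1);
    s.l2_index = (addr >> cluster_bits) & ((UINT64_C(1) << l2_bits) - 1);
    s.l1_index = addr >> (cluster_bits + l2_bits);
    return s;
}
```

With cluster_bits set to 12, an address built as (5 << 21) | (7 << 12) | 0x123 splits into L1 index 5, L2 index 7 and cluster offset 0x123.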
Note, the minimum size of the L1 table is a function of the size of the represented disk image:
l1_size = round_up(disk_size / (cluster_size * l2_size), cluster_size)
In other words, in order to map a given disk address to an offset within the image:
- Obtain the L1 table address using the l1_table_offset header field
- Use the top (64 - l2_bits - cluster_bits) bits of the address to index the L1 table as an array of 64 bit entries
- Obtain the L2 table address using the offset in the L1 table
- Use the next l2_bits of the address to index the L2 table as an array of 64 bit entries
- Obtain the cluster address using the offset in the L2 table.
- Use the remaining cluster_bits of the address as an offset within the cluster itself
If the offset found in either the L1 or L2 table is zero, that area of the disk is not allocated within the image.
Note also, that the top two bits of each of the offsets found in the L1 and L2 tables are reserved for "copied" and "compressed" flags. More on that below.
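A small sketch of how the two flag bits might be separated from the offset in an L1 or L2 entry. It assumes the entry has already been converted to host byte order, and that the most significant bit is the "copied" flag and the second most significant bit is the "compressed" flag; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_COPIED      (UINT64_C(1) << 63)
#define ENTRY_COMPRESSED  (UINT64_C(1) << 62)
#define ENTRY_OFFSET_MASK (~(ENTRY_COPIED | ENTRY_COMPRESSED))

bool entry_is_copied(uint64_t entry)
{
    return (entry & ENTRY_COPIED) != 0;
}

bool entry_is_compressed(uint64_t entry)
{
    return (entry & ENTRY_COMPRESSED) != 0;
}

/* A zero offset means that area of the disk is not allocated in the image. */
bool entry_is_allocated(uint64_t entry)
{
    return (entry & ENTRY_OFFSET_MASK) != 0;
}

/* File offset of an uncompressed cluster or of an L2 table. */
uint64_t entry_plain_offset(uint64_t entry)
{
    /* Compressed entries pack a size next to the address, so they need
     * their own decoding. */
    assert(!entry_is_compressed(entry));
    return entry & ENTRY_OFFSET_MASK;
}
```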
Reference Counting
Each cluster is reference counted, allowing clusters to be freed if, and only if, they are no longer used by any snapshots.
The 2 byte reference count for each cluster is kept in cluster sized blocks. A table, given by refcount_table_offset and occupying refcount_table_clusters clusters, gives the offset in the image of each of these refcount blocks.
In order to obtain the reference count of a given cluster, you split the cluster offset into a refcount table offset and refcount block offset. Since a refcount block is a single cluster of 2 byte entries, the lower cluster_bits - 1 bits are used as the block offset and the rest of the bits are used as the table offset.
One optimization is that if any cluster pointed to by an L1 or L2 table entry has a refcount exactly equal to one, the most significant bit of the L1/L2 entry is set as a "copied" flag. This indicates that no snapshots are using this cluster and it can be immediately written to without having to make a copy for any snapshots referencing it.
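The refcount lookup can be sketched as follows. This assumes that the value being split is the cluster's index, i.e. its file offset shifted right by cluster_bits, and that a refcount block holds cluster_size / 2 two byte entries, so the block index takes cluster_bits - 1 bits. RefcountIndex and refcount_index are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct RefcountIndex {
    uint64_t table_index;  /* entry in the refcount table (which block) */
    uint64_t block_index;  /* 2 byte entry inside that refcount block */
} RefcountIndex;

RefcountIndex refcount_index(uint64_t cluster_file_offset, unsigned cluster_bits)
{
    /* Assumed range, chosen to keep the shifts well defined. */
    assert(cluster_bits >= 2 && cluster_bits <= 30);

    unsigned block_bits = cluster_bits - 1;   /* cluster_size / 2 entries */
    uint64_t cluster_index = cluster_file_offset >> cluster_bits;

    RefcountIndex r;
    r.block_index = cluster_index & ((UINT64_C(1) << block_bits) - 1);
    r.table_index = cluster_index >> block_bits;
    return r;
}
```

For 4 KB clusters each refcount block covers 2048 clusters, so cluster number 2053 lands in the second block (table index 1) at entry 5.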
Copy-on-Write Images
A QCOW image can be used to store the changes to another disk image, without actually affecting the contents of the original image. The image, known as a copy-on-write image, looks like a standalone image to the user but most of its data is obtained from the original image. Only the clusters which differ from the original image are stored in the copy-on-write image file itself.
The representation is very simple. The copy-on-write image contains the path to the original disk image, and the image header gives the location of the path string within the file.
When you want to read a cluster from the copy-on-write image, you first check to see if that area is allocated within the copy-on-write image. If not, you read the area from the original disk image.
Snapshots
Snapshots are a similar notion to the copy-on-write feature, except it is the original image that is writable, not the snapshots.
To explain further - a copy-on-write image could confusingly be called a "snapshot", since it does indeed represent a snapshot of the original image's state. You can make multiple of these "snapshots" of the original image by creating multiple copy-on-write images, each referring to the same original image. What's noteworthy here, though, is that the original image must be considered read-only and it is the copy-on-write snapshots which are writable.
Snapshots - "real snapshots" - are represented in the original image itself. Each snapshot is a read-only record of the image at a past instant. The original image remains writable and as modifications are made to it, a copy of the original data is made for any snapshots referring to it.
Each snapshot is described by a header:
typedef struct QCowSnapshotHeader {
    /* header is 8 byte aligned */
    uint64_t l1_table_offset;
    uint32_t l1_size;
    uint16_t id_str_size;
    uint16_t name_size;
    uint32_t date_sec;
    uint32_t date_nsec;
    uint64_t vm_clock_nsec;
    uint32_t vm_state_size;
    uint32_t extra_data_size; /* for extension */
    /* extra data follows */
    /* id_str follows */
    /* name follows */
} QCowSnapshotHeader;
Details are as follows:
- A snapshot has both a name and ID, represented by strings (not zero-terminated) which follow the header.
- A snapshot also has a copy, at least, of the original L1 table given by l1_table_offset and l1_size.
- date_sec and date_nsec give the host machine gettimeofday() when the snapshot was created.
- vm_clock_nsec gives the current state of the VM clock.
- vm_state_size gives the size of the virtual machine state which was saved as part of this snapshot. The state is saved to the location of the original L1 table, directly after the image header.
- extra_data_size specifies the number of bytes of data which follow the header, before the id and name strings. This is provided for future expansion.
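Because each snapshot header is followed by variable length data and the next header starts on an 8 byte boundary, walking the snapshot list means computing the size of each record. The sketch below assumes the fixed part of the header is the 40 bytes obtained by adding up the fields of the struct with no padding; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SNAPSHOT_FIXED_BYTES 40  /* assumption: sum of the struct's fields */

/* Bytes occupied by one snapshot record, rounded up to a multiple of 8. */
uint64_t snapshot_record_size(uint16_t id_str_size, uint16_t name_size,
                              uint32_t extra_data_size)
{
    uint64_t n = SNAPSHOT_FIXED_BYTES + (uint64_t)extra_data_size +
                 id_str_size + name_size;
    return (n + 7) & ~UINT64_C(7);
}

/* File offset of the header that follows the one at 'offset'. */
uint64_t next_snapshot_offset(uint64_t offset, uint16_t id_str_size,
                              uint16_t name_size, uint32_t extra_data_size)
{
    assert(offset % 8 == 0);  /* headers are 8 byte aligned */
    return offset + snapshot_record_size(id_str_size, name_size,
                                         extra_data_size);
}
```

For example, a snapshot with a 1 byte id, a 5 byte name and no extra data occupies 46 bytes, which rounds up to 48.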
A snapshot is created by adding one of these headers, making a copy of the L1 table and incrementing the reference counts of all L2 tables and data clusters referenced by the L1 table. Later, if any L2 table or data cluster of the underlying image is to be modified while it is still shared with a snapshot - i.e. if the reference count of the cluster is greater than 1, in which case the "copied" flag is not set for that cluster - it will first be copied and then written to. That way, all snapshots remain unmodified.
Compression
The QCOW format supports compression by allowing each cluster to be independently compressed with zlib.
This is represented in the cluster offset obtained from the L2 table as follows:
- If the second most significant bit of the cluster offset is 1, this is a compressed cluster
- The next cluster_bits - 8 bits of the cluster offset give the size of the compressed cluster, in 512 byte sectors
- The remaining bits of the cluster offset are the actual address of the compressed cluster within the image
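The bit layout above can be unpacked as in the following sketch, which assumes a 64 bit L2 entry in host byte order whose top bit is the "copied" flag. It returns the size field exactly as stored; any further interpretation an implementation applies to that value is outside what the text above describes. CompressedEntry and decode_compressed are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct CompressedEntry {
    uint64_t size_field;   /* raw size value, in 512 byte sectors */
    uint64_t file_offset;  /* address of the compressed data in the image */
} CompressedEntry;

CompressedEntry decode_compressed(uint64_t l2_entry, unsigned cluster_bits)
{
    /* Assumed range, so that the size field has at least one bit. */
    assert(cluster_bits >= 9 && cluster_bits <= 30);
    /* The second most significant bit marks a compressed cluster. */
    assert((l2_entry & (UINT64_C(1) << 62)) != 0);

    unsigned size_bits = cluster_bits - 8;
    unsigned offset_bits = 62 - size_bits;   /* 64 bits minus 2 flag bits */

    CompressedEntry c;
    c.size_field = (l2_entry >> offset_bits) & ((UINT64_C(1) << size_bits) - 1);
    c.file_offset = l2_entry & ((UINT64_C(1) << offset_bits) - 1);
    return c;
}
```

With 4 KB clusters the size field is 4 bits wide and the address occupies the low 58 bits.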
Encryption
The QCOW format also supports the encryption of clusters.
If the crypt_method header field is 1, then a 16 character password is used as the 128 bit AES key.
Each sector within each cluster is independently encrypted using AES Cipher Block Chaining mode, using the sector's offset (relative to the start of the device) in little-endian format as the first 64 bits of the 128 bit initialisation vector.
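The initialisation vector construction can be sketched as below. The text only defines the first 64 bits; filling the remaining 64 bits with zeros is an assumption of this example, and sector_iv is a made-up name. The AES-CBC step itself is left to whatever crypto library is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 128 bit IV for one sector from its offset on the device. */
void sector_iv(uint64_t sector_offset, uint8_t iv[16])
{
    assert(iv != NULL);

    /* First 64 bits: the sector's offset, least significant byte first. */
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t)(sector_offset >> (8 * i));

    /* Remaining 64 bits: zero (assumption, not stated in the text). */
    for (int i = 8; i < 16; i++)
        iv[i] = 0;
}
```

Writing the bytes out explicitly keeps the result identical on little-endian and big-endian hosts.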
The QCOW Format
Version 2 of the QCOW format differs from the original version in the following ways:
- It supports the concept of snapshots; version 1 only had the concept of copy-on-write images
- Clusters are reference counted in version 2; reference counting was added to support snapshots
- L2 tables always occupy a single cluster in version 2; previously their size was given by an l2_bits header field
- The size of compressed clusters is now given in sectors instead of bytes
A previous version of this document which described version 1 only is available here.
Mark McLoughlin. Sep 11, 2008.