UBIFS File System Analysis 1 - On-Flash Structure [Repost]

Reposted from: http://blog.csdn.net/kickxxx/article/details/7109662

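UBIFS on-flash structure
UBIFS divides a UBI volume into six areas:
1. superblock area: uses LEB0
2. master area: uses LEB1 and LEB2
3. log area: starts at LEB3; its size in LEBs is recorded in the superblock (log_lebs)
4. LPT area: follows the log area; its size is fixed when the file system is created
5. orphan area: sits between the LPT area and the main area and uses a fixed number of LEBs; one LEB is usually enough
6. main area: the last area; stores the file system data and the index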
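
The ordering of the areas can be sketched as simple arithmetic on the superblock size fields. This is a minimal sketch, not kernel code: `struct area_layout` and `compute_layout()` are hypothetical names, and it assumes the areas are packed back to back starting at LEB3.

```c
#include <stdint.h>

/* LEB0 holds the superblock, LEB1 and LEB2 the master node, the log starts at LEB3. */
enum { SB_LNUM = 0, MST_LNUM = 1, LOG_LNUM = 3 };

/* Hypothetical helper type: first LEB of each variable-size area. */
struct area_layout {
    uint32_t log_first;
    uint32_t lpt_first;
    uint32_t orph_first;
    uint32_t main_first;
    uint32_t main_lebs;   /* number of LEBs left for the main area */
};

/* Derive the area boundaries from the superblock size fields. */
static struct area_layout compute_layout(uint32_t leb_cnt, uint32_t log_lebs,
                                         uint32_t lpt_lebs, uint32_t orph_lebs)
{
    struct area_layout l;

    l.log_first  = LOG_LNUM;
    l.lpt_first  = l.log_first + log_lebs;
    l.orph_first = l.lpt_first + lpt_lebs;
    l.main_first = l.orph_first + orph_lebs;
    l.main_lebs  = leb_cnt - l.main_first;
    return l;
}
```

For example, with leb_cnt = 100, log_lebs = 4, lpt_lebs = 2 and orph_lebs = 1, the main area would start at LEB10 and span 90 LEBs.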

Superblock area
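The superblock area occupies one LEB, which stores the superblock node. The superblock node holds file system parameters that rarely change, and it uses only the first 4096 bytes of LEB0.
The on-flash layout of the superblock node is: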
/**
* struct ubifs_sb_node - superblock node.
* @ch: common header
* @padding: reserved for future, zeroes
* @key_hash: type of hash function used in keys
* @key_fmt: format of the key
* @flags: file-system flags (%UBIFS_FLG_BIGLPT, etc)
* @min_io_size: minimal input/output unit size
* @leb_size: logical eraseblock size in bytes
* @leb_cnt: count of LEBs used by file-system
* @max_leb_cnt: maximum count of LEBs used by file-system
* @max_bud_bytes: maximum amount of data stored in buds
* @log_lebs: log size in logical eraseblocks
* @lpt_lebs: number of LEBs used for lprops table
* @orph_lebs: number of LEBs used for recording orphans
* @jhead_cnt: count of journal heads
* @fanout: tree fanout (max. number of links per indexing node)
* @lsave_cnt: number of LEB numbers in LPT's save table
* @fmt_version: UBIFS on-flash format version
* @default_compr: default compression algorithm (%UBIFS_COMPR_LZO, etc)
* @padding1: reserved for future, zeroes
* @rp_uid: reserve pool UID
* @rp_gid: reserve pool GID
* @rp_size: size of the reserved pool in bytes
* @padding2: reserved for future, zeroes
* @time_gran: time granularity in nanoseconds
* @uuid: UUID generated when the file system image was created
* @ro_compat_version: UBIFS R/O compatibility version
*/
struct ubifs_sb_node {
    struct ubifs_ch ch;
    __u8 padding[2];
    __u8 key_hash;
    __u8 key_fmt;
    __le32 flags;
    __le32 min_io_size;
    __le32 leb_size;
    __le32 leb_cnt;
    __le32 max_leb_cnt;
    __le64 max_bud_bytes;
    __le32 log_lebs;
    __le32 lpt_lebs;
    __le32 orph_lebs;
    __le32 jhead_cnt;
    __le32 fanout;
    __le32 lsave_cnt;
    __le32 fmt_version;
    __le16 default_compr;
    __u8 padding1[2];
    __le32 rp_uid;
    __le32 rp_gid;
    __le64 rp_size;
    __le32 time_gran;
    __u8 uuid[16];
    __le32 ro_compat_version;
    __u8 padding2[3968];
} __attribute__ ((packed));

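@key_hash: type of hash function the file system uses to generate keys
@min_io_size: minimal read/write unit of the file system
@leb_size: logical eraseblock (LEB) size
@leb_cnt: number of LEBs the file system actually uses
@max_leb_cnt: maximum number of LEBs the file system is allowed to use
@log_lebs: number of LEBs occupied by the log area
@lpt_lebs: number of LEBs occupied by the lprops table
@orph_lebs: number of LEBs occupied by the orphan area
@jhead_cnt: number of journal heads. A journal head records the flash position where the next node will be written. UBIFS uses a multi-headed journal; the two main heads are the base head and the data head.
@fanout: maximum fanout of the UBIFS index tree.
@lsave_cnt: number of LEB numbers in the LPT save table. The save table speeds up finding empty eraseblocks in the LPT at mount time.
@fmt_version: UBIFS on-flash format version.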
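
Because the struct is packed and every multi-byte field is little-endian, a few fields can be decoded straight from a raw copy of LEB0. The sketch below is illustrative only: the helper names are hypothetical, and the byte offsets assume that `struct ubifs_ch` is 24 bytes (which is what the 4096-byte total and the 3968-byte padding imply).

```c
#include <stdint.h>

/* Byte offsets inside the superblock node, assuming a 24-byte common header. */
enum {
    SB_OFF_MIN_IO_SIZE = 32,
    SB_OFF_LEB_SIZE    = 36,
    SB_OFF_LEB_CNT     = 40,
    SB_OFF_MAX_LEB_CNT = 44,
    SB_OFF_LOG_LEBS    = 56,
    SB_OFF_LPT_LEBS    = 60,
    SB_OFF_ORPH_LEBS   = 64
};

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical accessors working on the first bytes of LEB0. */
static uint32_t sb_leb_size(const uint8_t *sb)
{
    return get_le32(sb + SB_OFF_LEB_SIZE);
}

static uint32_t sb_log_lebs(const uint8_t *sb)
{
    return get_le32(sb + SB_OFF_LOG_LEBS);
}
```

Decoding by offset avoids depending on the host's struct packing and endianness; a real tool would also verify the magic number and CRC in the common header, which this sketch leaves out.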

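The superblock almost never changes. Only one situation causes the superblock node to be rewritten: automatic resize. Automatic resize is needed because the size of the UBI volume that will be mounted is not known when the UBIFS image is created. When the image is installed on UBI, the volume may turn out to be smaller than the maximum space the image allows for, so UBIFS is resized to fit the UBI volume.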
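
The resize decision can be sketched as follows. This is a hedged illustration, not the kernel routine: `resized_leb_cnt()` is a hypothetical helper, and it assumes that resizing only ever grows leb_cnt, up to the smaller of max_leb_cnt and the number of LEBs in the volume.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the LEB count to use at mount time.
 * The superblock node needs rewriting only when the result differs
 * from the leb_cnt stored in the image.
 */
static uint32_t resized_leb_cnt(uint32_t sb_leb_cnt, uint32_t sb_max_leb_cnt,
                                uint32_t vol_lebs)
{
    /* Assumption: grow only, never beyond the volume or max_leb_cnt. */
    if (sb_leb_cnt < vol_lebs && sb_leb_cnt < sb_max_leb_cnt)
        return vol_lebs < sb_max_leb_cnt ? vol_lebs : sb_max_leb_cnt;
    return sb_leb_cnt;
}
```

Under these assumptions, an image built with leb_cnt = 100 and max_leb_cnt = 2048 that lands on a 500-LEB volume would end up with leb_cnt = 500.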

Master area
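The master area occupies LEB1 and LEB2. Normally the two LEBs hold the same data. A master node is 512 bytes, and each write of the master node uses the next free page of the LEB in sequence. When no free page is left, the LEB is unmapped (so that it is backed by another erased eraseblock) and the master node is written again from offset zero.
Note that the two master LEBs are never unmapped at the same time, because that would leave the file system without a valid master node; if power were lost at that moment, the system could not find a valid master node.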
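
The sequential use of a master LEB can be sketched as an offset calculation. The helper names below are hypothetical, and the sketch assumes each master node write is aligned to min_io_size.

```c
#include <stdbool.h>
#include <stdint.h>

#define MST_NODE_SZ 512u  /* master node size stated in the text */

/* Round x up to a multiple of a (a must be a power of two). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: compute where the next master node goes.
 * Sets *need_unmap when the LEB is full and has to be unmapped
 * before writing again at offset zero.
 */
static uint32_t next_mst_offs(uint32_t cur_offs, uint32_t min_io_size,
                              uint32_t leb_size, bool *need_unmap)
{
    uint32_t offs = cur_offs + align_up(MST_NODE_SZ, min_io_size);

    *need_unmap = false;
    if (offs + MST_NODE_SZ > leb_size) {
        offs = 0;
        *need_unmap = true;
    }
    return offs;
}
```

To keep one valid copy at all times, a writer would apply this to LEB1, finish that write, and only then repeat the same step for LEB2.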

The on-flash layout of the master node is:
/**
* struct ubifs_mst_node - master node.
* @ch: common header
* @highest_inum: highest inode number in the committed index
* @cmt_no: commit number
* @flags: various flags (%UBIFS_MST_DIRTY, etc)
* @log_lnum: start of the log
* @root_lnum: LEB number of the root indexing node
* @root_offs: offset within @root_lnum
* @root_len: root indexing node length
* @gc_lnum: LEB reserved for garbage collection (%-1 value means the LEB was not reserved and should be reserved on mount)
* @ihead_lnum: LEB number of index head
* @ihead_offs: offset of index head
* @index_size: size of index on flash
* @total_free: total free space in bytes
* @total_dirty: total dirty space in bytes
* @total_used: total used space in bytes (includes only data LEBs)
* @total_dead: total dead space in bytes (includes only data LEBs)
* @total_dark: total dark space in bytes (includes only data LEBs)
* @lpt_lnum: LEB number of LPT root nnode
* @lpt_offs: offset of LPT root nnode
* @nhead_lnum: LEB number of LPT head
* @nhead_offs: offset of LPT head
* @ltab_lnum: LEB number of LPT's own lprops table
* @ltab_offs: offset of LPT's own lprops table
* @lsave_lnum: LEB number of LPT's save table (big model only)
* @lsave_offs: offset of LPT's save table (big model only)
* @lscan_lnum: LEB number of last LPT scan
* @empty_lebs: number of empty logical eraseblocks
* @idx_lebs: number of indexing logical eraseblocks
* @leb_cnt: count of LEBs used by file-system
* @padding: reserved for future, zeroes
*/
struct ubifs_mst_node {
    struct ubifs_ch ch;
    __le64 highest_inum;
    __le64 cmt_no;
    __le32 flags;
    __le32 log_lnum;
    __le32 root_lnum;
    __le32 root_offs;
    __le32 root_len;
    __le32 gc_lnum;
    __le32 ihead_lnum;
    __le32 ihead_offs;
    __le64 index_size;
    __le64 total_free;
    __le64 total_dirty;
    __le64 total_used;
    __le64 total_dead;
    __le64 total_dark;
    __le32 lpt_lnum;
    __le32 lpt_offs;
    __le32 nhead_lnum;
    __le32 nhead_offs;
    __le32 ltab_lnum;
    __le32 ltab_offs;
    __le32 lsave_lnum;
    __le32 lsave_offs;
    __le32 lscan_lnum;
    __le32 empty_lebs;
    __le32 idx_lebs;
    __le32 leb_cnt;
    __u8 padding[344];
} __attribute__ ((packed));
The master node is 512 bytes, unlike the superblock node, which is padded to 4096 bytes; this makes more efficient use of the master LEBs.

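@highest_inum: the highest inode number currently in use; new inode numbers are allocated starting from it. UBIFS never reuses inode numbers: once a number has been used, it cannot be used again. This rests on the assumption that inode numbers will not exceed 0xffff0000 within the lifetime of a UBIFS file system.
@cmt_no: commit number
@flags: various flags (%UBIFS_MST_DIRTY, etc)
@log_lnum: start of the log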
@root_lnum: LEB number of the root indexing node
@root_offs: offset within @root_lnum
@root_len: root indexing node length
@gc_lnum: LEB reserved for garbage collection (%-1 value means the LEB was not reserved and should be reserved on mount)
@ihead_lnum: LEB number of index head
@ihead_offs: offset of index head
@index_size: size of index on flash
@total_free: total free space in bytes
@total_dirty: total dirty space in bytes
@total_used: total used space in bytes (includes only data LEBs)
@total_dead: total dead space in bytes (includes only data LEBs)
@total_dark: total dark space in bytes (includes only data LEBs)
@lpt_lnum: LEB number of LPT root nnode
@lpt_offs: offset of LPT root nnode
@nhead_lnum: LEB number of LPT head
@nhead_offs: offset of LPT head
@ltab_lnum: LEB number of LPT's own lprops table; UBIFS assumes that ltab fits in a single LEB.
@ltab_offs: offset of LPT's own lprops table
@lsave_lnum: LEB number of LPT's save table (big model only)
@lsave_offs: offset of LPT's save table (big model only)
@lscan_lnum: LEB number of last LPT scan
@empty_lebs: number of empty logical eraseblocks
@idx_lebs: number of indexing logical eraseblocks
@leb_cnt: count of LEBs used by file-system
@padding: reserved for future, zeroes
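
The no-reuse rule described for @highest_inum amounts to a monotonically increasing counter. A minimal sketch, assuming the limit quoted above; `alloc_inum()` is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

/* Upper bound quoted in the text; the exact value is taken on trust here. */
#define INUM_LIMIT 0xffff0000ULL

/*
 * Hypothetical allocator: hand out the next inode number.
 * Numbers only ever grow, so a freed number is never handed out again.
 * Returns 0 when the number space is exhausted.
 */
static uint64_t alloc_inum(uint64_t *highest_inum)
{
    if (*highest_inum >= INUM_LIMIT)
        return 0;
    *highest_inum += 1;
    return *highest_inum;
}
```

Storing the counter in the master node means the next mount can continue from the last committed value instead of rescanning the index.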

log area

The log is fairly complex, so it is skipped for now.

LPT area - LEB Properties Tree
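The LPT area contains the LEB properties tree, the LPT area eraseblock table (ltab), and the table of saved LEB numbers (lsave). The LPT area lies between the log area and the orphan area.

The size of the LPT area is fixed when the file system is created: the number of LEBs it occupies is computed automatically from the LEB size and the maximum LEB count of the file system.

The LPT area is like a small self-contained file system. It has its own LEB properties, that is, the LEB properties of the LEB properties area itself (ltab). The LPT area must never run out of its own space, must be fast to access and update, and must scale algorithmically. The LEB properties tree is implemented as a wandering tree, and the LPT area has its own garbage collector.

The LPT comes in two slightly different forms: the small model and the big model.

The small model is used when the entire LEB properties table fits in a single LEB. Garbage collection then writes out the whole table, which makes all the other eraseblocks reusable. In the big model, garbage collection selects only dirty eraseblocks: it marks the clean nodes in the selected LEB as dirty, and then only the dirty nodes are written out. In addition, the big model saves the numbers of empty LEBs, so that UBIFS does not have to scan the entire LPT for empty eraseblocks when it is first mounted.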

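Each LEB in the main area is described by three LEB properties:
1. free space
The number of bytes at the end of the eraseblock that have not been written yet.
2. dirty space
The number of bytes taken up by obsolete nodes and padding, i.e. the part that GC may be able to reclaim.
3. whether the eraseblock is an index eraseblock
Index nodes and non-index nodes are never placed in the same eraseblock: an index eraseblock contains only index nodes, and a non-index eraseblock contains only non-index nodes.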
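
The three properties are enough to derive how much valid data an LEB still holds. The following is a small illustrative sketch; `struct leb_props` and both helpers are hypothetical names rather than the kernel's own types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative in-memory view of the properties of one main-area LEB. */
struct leb_props {
    uint32_t free;      /* unwritten bytes at the end of the LEB */
    uint32_t dirty;     /* bytes held by obsolete nodes and padding */
    bool     is_index;  /* true if the LEB holds index nodes only */
};

/* Bytes still occupied by valid nodes. */
static uint32_t leb_used(const struct leb_props *lp, uint32_t leb_size)
{
    return leb_size - lp->free - lp->dirty;
}

/* An LEB with no valid data left can be reclaimed without moving any node. */
static bool leb_reclaimable(const struct leb_props *lp, uint32_t leb_size)
{
    return lp->free + lp->dirty == leb_size;
}
```

A garbage collector would typically prefer LEBs with a large dirty count, since they yield the most space for the least copying.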

pcnt: the number of pnodes

lprops on-flash structure
| free space | dirty space | flags |
flags indicates whether the LEB contains index nodes

nbranch on-flash structure
| LEB number | LEB offs |

ltab-n on-flash structure
| free space | dirty space |

lsave - LPT save table, used to find empty eraseblocks quickly
| CRC | node type | lsave-1 | lsave-2 | ... | lsave-n |

orphan area
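The orphan area records the inode numbers of deleted files. It exists to handle an unclean unmount that happens during deletion: the orphan inodes that were already deleted must still be removed afterwards. That requires either scanning the entire index to find them or keeping a list somewhere; UBIFS keeps such a list in the orphan area.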
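
The idea of keeping a list instead of scanning the index can be sketched with a tiny in-memory model. Everything here is hypothetical (names, the fixed capacity, the callback); it only illustrates the add, delete and recover steps, not the on-flash orphan node format.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ORPHANS 64  /* arbitrary capacity for this sketch */

/* Hypothetical in-memory stand-in for the on-flash orphan list. */
struct orphan_list {
    uint64_t inum[MAX_ORPHANS];
    size_t   cnt;
};

/* Record an inode whose last link was removed. Returns 0 on success. */
static int orphan_add(struct orphan_list *ol, uint64_t inum)
{
    if (ol->cnt == MAX_ORPHANS)
        return -1;
    ol->inum[ol->cnt++] = inum;
    return 0;
}

/* Forget an inode once its nodes have really been removed. */
static void orphan_del(struct orphan_list *ol, uint64_t inum)
{
    for (size_t i = 0; i < ol->cnt; i++) {
        if (ol->inum[i] == inum) {
            ol->inum[i] = ol->inum[--ol->cnt];
            return;
        }
    }
}

/*
 * Recovery after an unclean unmount: every inode still listed was never
 * fully deleted, so delete it now. Returns the number of inodes processed.
 */
static size_t orphan_recover(struct orphan_list *ol,
                             void (*delete_inode)(uint64_t inum))
{
    size_t n = ol->cnt;

    for (size_t i = 0; i < n; i++)
        if (delete_inode)
            delete_inode(ol->inum[i]);
    ol->cnt = 0;
    return n;
}
```

Recovery cost is proportional to the number of orphans rather than to the size of the index, which is the point of keeping the list.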

main area
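The main area stores the file system's index nodes and non-index nodes.
UBIFS has several types of non-index nodes: file inode, directory entry, extended attribute entry, and file data node.
UBIFS maintains a wandering tree. The leaf nodes hold the file information; they are the valid nodes of the file system. The internal nodes of the tree are index nodes, which hold references to their children. The wandering tree can therefore be viewed as two parts: the top part consists of index nodes that hold the tree structure, and the bottom part consists of the leaf nodes that hold the actual file data.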

On-flash structure of the file inode node:
/**
* struct ubifs_ino_node - inode node.
* @ch: common header
* @key: node key
* @creat_sqnum: sequence number at time of creation
* @size: inode size in bytes (amount of uncompressed data)
* @atime_sec: access time seconds
* @ctime_sec: creation time seconds
* @mtime_sec: modification time seconds
* @atime_nsec: access time nanoseconds
* @ctime_nsec: creation time nanoseconds
* @mtime_nsec: modification time nanoseconds
* @nlink: number of hard links
* @uid: owner ID
* @gid: group ID
* @mode: access flags
* @flags: per-inode flags (%UBIFS_COMPR_FL, %UBIFS_SYNC_FL, etc)
* @data_len: inode data length
* @xattr_cnt: count of extended attributes this inode has
* @xattr_size: summarized size of all extended attributes in bytes
* @padding1: reserved for future, zeroes
* @xattr_names: sum of lengths of all extended attribute names belonging to
*               this inode
* @compr_type: compression type used for this inode
* @padding2: reserved for future, zeroes
* @data: data attached to the inode
*
* Note, even though inode compression type is defined by @compr_type, some
* nodes of this inode may be compressed with different compressor - this
* happens if compression type is changed while the inode already has data
* nodes. But @compr_type will be use for further writes to the inode.
*
* Note, do not forget to amend 'zero_ino_node_unused()' function when changing
* the padding fields.
*/
struct ubifs_ino_node {
    struct ubifs_ch ch;
    __u8 key[UBIFS_MAX_KEY_LEN];
    __le64 creat_sqnum;
    __le64 size;
    __le64 atime_sec;
    __le64 ctime_sec;
    __le64 mtime_sec;
    __le32 atime_nsec;
    __le32 ctime_nsec;
    __le32 mtime_nsec;
    __le32 nlink;
    __le32 uid;
    __le32 gid;
    __le32 mode;
    __le32 flags;
    __le32 data_len;
    __le32 xattr_cnt;
    __le32 xattr_size;
    __u8 padding1[4]; /* Watch 'zero_ino_node_unused()' if changing! */
    __le32 xattr_names;
    __le16 compr_type;
    __u8 padding2[26]; /* Watch 'zero_ino_node_unused()' if changing! */
    __u8 data[];
} __attribute__ ((packed));

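@size: length of the file data (before compression)
@data_len: length of the inode data
@data:
Data attached to the on-flash inode node. This field stores different things for different file types: for a REG file it can hold xattr data; for a SYMLINK it stores the symbolic link target; for a character or block device node it stores the major and minor device numbers. Note that, unlike ext2/ext3, the UBIFS on-flash inode itself does not carry the index information for regular file data.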

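Since the index information for file data cannot be found through ubifs_ino_node, how does UBIFS read a file's data? This is what the UBIFS index tree is for: every piece of a file's data is a data node, and the exact location of that data node can be found in the UBIFS index tree through the key that corresponds to that piece of data.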

On-flash structure of the directory entry node:
/**
* struct ubifs_dent_node - directory entry node.
* @ch: common header
* @key: node key
* @inum: target inode number
* @padding1: reserved for future, zeroes
* @type: type of the target inode (%UBIFS_ITYPE_REG, %UBIFS_ITYPE_DIR, etc)
* @nlen: name length
* @padding2: reserved for future, zeroes
* @name: zero-terminated name
*
* Note, do not forget to amend 'zero_dent_node_unused()' function when
* changing the padding fields.
*/
struct ubifs_dent_node {
    struct ubifs_ch ch;
    __u8 key[UBIFS_MAX_KEY_LEN];
    __le64 inum;
    __u8 padding1;
    __u8 type;
    __le16 nlen;
    __u8 padding2[4]; /* Watch 'zero_dent_node_unused()' if changing! */
    __u8 name[];
} __attribute__ ((packed));

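@key: node key, | parent ino(32 bits) | key type(3 bits) | hash value(29 bits) |
@inum: inode number of the file the directory entry refers to
@type: type of the file the directory entry refers to
@nlen: length of the directory entry name
@name: directory entry name
A UBIFS directory entry looks no different from those of other file systems, but directory entry operations are implemented in an unusual way, because the entries of a directory are not stored together. UBIFS manages directory entries with the index tree: a lookup first searches the index tree by key (built from the parent inode number plus the hash of the entry name) and then compares names to resolve hash collisions.
The UBIFS readdir implementation is also rather clever: it first finds the directory entry with the smallest key, then the next one, and so on until all entries have been visited. This is completely different from readdir in traditional file systems.
The comment on ubifs_readdir also notes that UBIFS uses the directory entry hash value as the directory offset, so seekdir and telldir may not always work correctly; in most cases, though, this is not a problem.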
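
The key layout and the collision handling described above can be sketched as follows. This is an illustration under stated assumptions: the type code 2 for directory entries is assumed, `name_hash()` is a placeholder and NOT the hash function UBIFS actually uses, and the flat table stands in for the index tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_TYPE_DENT  2u   /* assumed type code for directory entries */
#define KEY_HASH_BITS  29
#define KEY_HASH_MASK  ((1u << KEY_HASH_BITS) - 1)

struct dent_key {
    uint32_t parent_ino;  /* 32 bits: inode number of the directory */
    uint32_t type_hash;   /* 3 bits of key type, 29 bits of name hash */
};

/* Placeholder name hash (not the real UBIFS hash), truncated to 29 bits. */
static uint32_t name_hash(const char *name)
{
    uint32_t h = 5381;

    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h & KEY_HASH_MASK;
}

static struct dent_key make_dent_key(uint32_t parent_ino, const char *name)
{
    struct dent_key k;

    k.parent_ino = parent_ino;
    k.type_hash = (KEY_TYPE_DENT << KEY_HASH_BITS) | name_hash(name);
    return k;
}

/* Hypothetical flat table standing in for the index tree. */
struct dent {
    struct dent_key key;
    const char *name;
    uint64_t inum;
};

/* Entries may share a key; the full name decides. Returns inum, or 0 if absent. */
static uint64_t lookup_dent(const struct dent *tbl, size_t n,
                            uint32_t parent_ino, const char *name)
{
    struct dent_key want = make_dent_key(parent_ino, name);

    for (size_t i = 0; i < n; i++)
        if (tbl[i].key.parent_ino == want.parent_ino &&
            tbl[i].key.type_hash == want.type_hash &&
            strcmp(tbl[i].name, name) == 0)
            return tbl[i].inum;
    return 0;
}
```

Because only 29 bits of hash go into the key, two names in the same directory can map to the same key, which is why the final name comparison is needed.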

On-flash structure of the data node:
/**
* struct ubifs_data_node - data node.
* @ch: common header
* @key: node key
* @size: uncompressed data size in bytes
* @compr_type: compression type (%UBIFS_COMPR_NONE, %UBIFS_COMPR_LZO, etc)
* @padding: reserved for future, zeroes
* @data: data
*
* Note, do not forget to amend 'zero_data_node_unused()' function when
* changing the padding fields.
*/
struct ubifs_data_node {
    struct ubifs_ch ch;
    __u8 key[UBIFS_MAX_KEY_LEN];
    __le32 size;
    __le16 compr_type;
    __u8 padding[2]; /* Watch 'zero_data_node_unused()' if changing! */
    __u8 data[];
} __attribute__ ((packed));

@key: | ino(32 bits) | key type(3 bits) | block number(29 bits) |
@size: uncompressed size of the data in this node (the size actually stored is ch.len - sizeof(struct ubifs_data_node))
@data: the compressed data
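
The stored payload length follows directly from the node header. A minimal sketch, assuming `struct ubifs_ch` is 24 bytes and UBIFS_MAX_KEY_LEN is 16 (neither value appears in the text above); `data_payload_len()` is a hypothetical helper.

```c
#include <stdint.h>

#define CH_SZ        24u  /* assumed size of struct ubifs_ch */
#define MAX_KEY_LEN  16u  /* assumed value of UBIFS_MAX_KEY_LEN */

/* Common header + key + size (4) + compr_type (2) + padding (2). */
#define DATA_NODE_SZ (CH_SZ + MAX_KEY_LEN + 4u + 2u + 2u)

/*
 * Bytes of stored (possibly compressed) payload in a data node.
 * ch_len is the total node length from the common header.
 * Returns 0 for a length too short to hold any payload.
 */
static uint32_t data_payload_len(uint32_t ch_len)
{
    return ch_len > DATA_NODE_SZ ? ch_len - DATA_NODE_SZ : 0;
}
```

The @size field then tells the reader how large a buffer the decompressed data needs.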

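ubifs_data_node is the carrier of UBIFS file data. UBIFS has no per-file data index as traditional file systems do; to access data, UBIFS first builds the key of the node that holds the data and then uses that key to find the ubifs_data_node in the UBIFS wandering tree.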
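
Building the key for a given file offset can be sketched like this. The sketch assumes 4 KiB data blocks and a type code of 1 for data keys; both are assumptions not stated above, and `make_data_key()` is a hypothetical name.

```c
#include <stdint.h>

#define BLOCK_SHIFT     12u  /* assumes 4 KiB data blocks */
#define KEY_TYPE_DATA   1u   /* assumed type code for data keys */
#define KEY_BLOCK_BITS  29
#define KEY_BLOCK_MASK  ((1u << KEY_BLOCK_BITS) - 1)

struct data_key {
    uint32_t ino;         /* 32 bits: inode number of the file */
    uint32_t type_block;  /* 3 bits of key type, 29 bits of block number */
};

/* Build the key of the data node that covers the given file offset. */
static struct data_key make_data_key(uint32_t ino, uint64_t file_offs)
{
    struct data_key k;
    uint32_t block = (uint32_t)(file_offs >> BLOCK_SHIFT) & KEY_BLOCK_MASK;

    k.ino = ino;
    k.type_block = (KEY_TYPE_DATA << KEY_BLOCK_BITS) | block;
    return k;
}
```

With this layout, all data keys of one inode sort by block number, so consecutive blocks of a file sit next to each other in key order.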
