Overview

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.

The implementation of HDFS Snapshots is efficient:

  • Snapshot creation is instantaneous: the cost is O(1) excluding the inode lookup time.
  • Additional memory is used only when modifications are made relative to a snapshot: memory usage is O(M), where M is the number of modified files/directories.
  • Blocks in datanodes are not copied: the snapshot files record the block list and the file size. There is no data copying.
  • Snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order so that the current data can be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.
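The bookkeeping described in the list above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not how the NameNode is implemented: the class `ToySnapshotDir` and all of its methods are hypothetical names invented for this illustration. It shows that creating a snapshot copies nothing, that extra memory grows only with modified paths, and that a snapshot view is computed by undoing recorded modifications starting from the current data.

```python
_ABSENT = object()  # marks "this path did not exist when the snapshot was taken"


class ToySnapshotDir:
    """Hypothetical toy model of a snapshottable directory (not HDFS code)."""

    def __init__(self):
        self.current = {}  # path -> content; always directly readable
        self.diffs = []    # [(snapshot name, {path: value before first change})]

    def create_snapshot(self, name):
        # O(1): only an empty diff record is appended, no data is copied.
        self.diffs.append((name, {}))

    def write(self, path, content):
        self._record_old(path)
        self.current[path] = content

    def delete(self, path):
        self._record_old(path)
        self.current.pop(path, None)

    def _record_old(self, path):
        # Remember the pre-modification value once, in the newest snapshot's diff.
        if self.diffs:
            _, diff = self.diffs[-1]
            diff.setdefault(path, self.current.get(path, _ABSENT))

    def read_snapshot(self, name):
        # Start from the current data and undo diffs, newest snapshot first.
        view = dict(self.current)
        for snap_name, diff in reversed(self.diffs):
            for path, old in diff.items():
                if old is _ABSENT:
                    view.pop(path, None)
                else:
                    view[path] = old
            if snap_name == name:
                return view
        raise KeyError(name)


d = ToySnapshotDir()
d.write("/foo/bar", "v1")
d.create_snapshot("s0")
d.write("/foo/bar", "v2")
d.write("/foo/new", "x")
d.create_snapshot("s1")
d.delete("/foo/bar")
print(d.read_snapshot("s0"))
print(d.read_snapshot("s1"))
print(d.current)
```

Reads of the current data never touch the diff records, which is the point of recording modifications in reverse chronological order.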

Snapshottable Directories

Snapshots can be taken on any directory once the directory has been set as snapshottable. A snapshottable directory is able to accommodate 65,536 simultaneous snapshots. There is no limit on the number of snapshottable directories. Administrators may set any directory to be snapshottable. If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.

Nested snapshottable directories are currently not allowed. In other words, a directory cannot be set to snapshottable if one of its ancestors/descendants is a snapshottable directory.
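The nesting rule can be expressed as a small path check. The following is a sketch only; the function name `can_allow_snapshot` and the idea of passing in the existing snapshottable directories as a list are assumptions made for illustration and are not part of HDFS.

```python
from pathlib import PurePosixPath


def can_allow_snapshot(candidate, snapshottable_dirs):
    """Return False if an ancestor or descendant of candidate is snapshottable."""
    cand = PurePosixPath(candidate)
    for existing in map(PurePosixPath, snapshottable_dirs):
        if cand == existing:
            # Assumption of this sketch: re-allowing the same directory is harmless.
            continue
        if existing in cand.parents or cand in existing.parents:
            return False
    return True


print(can_allow_snapshot("/data/a", ["/data/b"]))    # sibling directory
print(can_allow_snapshot("/data/a/x", ["/data/a"]))  # ancestor is snapshottable
print(can_allow_snapshot("/data", ["/data/a"]))      # descendant is snapshottable
```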

Snapshot Paths

For a snapshottable directory, the path component ".snapshot" is used for accessing its snapshots. Suppose /foo is a snapshottable directory, /foo/bar is a file/directory in /foo, and /foo has a snapshot s0. Then, the path

/foo/.snapshot/s0/bar

refers to the snapshot copy of /foo/bar. The usual API and CLI can work with the ".snapshot" paths. The following are some examples.

  • Listing all the snapshots under a snapshottable directory:

hdfs dfs -ls /foo/.snapshot

  • Listing the files in snapshot s0:

hdfs dfs -ls /foo/.snapshot/s0

  • Copying a file from snapshot s0:

hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp

Note that this example uses the preserve option to preserve timestamps, ownership, permission, ACLs and XAttrs.
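The mapping from a live path to its snapshot copy, as used in the examples above, is mechanical. Here is a minimal sketch; the helper name `snapshot_path` is hypothetical, and it only manipulates strings without contacting a cluster.

```python
import posixpath


def snapshot_path(snapshottable_dir, snapshot_name, path):
    """Build the ".snapshot" path for `path` under `snapshottable_dir`."""
    rel = posixpath.relpath(path, snapshottable_dir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{path} is not under {snapshottable_dir}")
    root = posixpath.join(snapshottable_dir, ".snapshot", snapshot_name)
    # The snapshottable directory itself maps to the snapshot root.
    return root if rel == "." else posixpath.join(root, rel)


print(snapshot_path("/foo", "s0", "/foo/bar"))
```

The resulting string can be passed to the usual commands, for example as the source argument of `hdfs dfs -cp`.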

Upgrading to a version of HDFS with snapshots

The HDFS snapshot feature introduces a new reserved path name used to interact with snapshots: .snapshot. When upgrading from an older version of HDFS, existing paths named .snapshot need to first be renamed or deleted to avoid conflicting with the reserved path. See the upgrade section in the HDFS user guide for more information.

Snapshot Operations

Administrator Operations

The operations described in this section require superuser privilege.

Allow Snapshots

Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.

  • Command:

hdfs dfsadmin -allowSnapshot <path>

  • Arguments:

path

The path of the snapshottable directory.

See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.

Disallow Snapshots

Disallow snapshots of a directory from being created. All snapshots of the directory must be deleted before disallowing snapshots.

  • Command:

hdfs dfsadmin -disallowSnapshot <path>

  • Arguments:

path

The path of the snapshottable directory.

See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.

User Operations

This section describes user operations. Note that the HDFS superuser can perform all of these operations without satisfying the permission requirements of the individual operations.

Create Snapshots

Create a snapshot of a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command:

hdfs dfs -createSnapshot <path> [<snapshotName>]

  • Arguments:

path

The path of the snapshottable directory.

snapshotName

The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".

See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned in these methods.
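The default name format given above, "'s'yyyyMMdd-HHmmss.SSS", is a Java date pattern. As a rough illustration, the same layout can be reproduced in Python; the function `default_snapshot_name` is a hypothetical helper for this sketch, and which clock or time zone HDFS uses when generating the name is not stated here and is left as an assumption (the local clock is used).

```python
from datetime import datetime


def default_snapshot_name(now=None):
    """Format a timestamp like the default snapshot name, e.g. s20130412-151029.033."""
    now = now or datetime.now()
    millis = now.microsecond // 1000  # the SSS field is milliseconds
    return now.strftime("s%Y%m%d-%H%M%S.") + f"{millis:03d}"


# Reproduces the example name from the text for the same instant.
print(default_snapshot_name(datetime(2013, 4, 12, 15, 10, 29, 33000)))
```

Because names in this format sort lexicographically in chronological order, the oldest and newest default-named snapshots can be found with a plain string sort.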

Delete Snapshots

Delete a snapshot from a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command:

hdfs dfs -deleteSnapshot <path> <snapshotName>

  • Arguments:

path

The path of the snapshottable directory.

snapshotName

The snapshot name.

See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.

Rename Snapshots

Rename a snapshot. This operation requires owner privilege of the snapshottable directory.

  • Command:

hdfs dfs -renameSnapshot <path> <oldName> <newName>

  • Arguments:

path

The path of the snapshottable directory.

oldName

The old snapshot name.

newName

The new snapshot name.

See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.

Get Snapshottable Directory Listing

Get all the snapshottable directories where the current user has permission to take snapshots.

  • Command:

hdfs lsSnapshottableDir

  • Arguments: none

See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing() in DistributedFileSystem.

Get Snapshots Difference Report

Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.

  • Command:

hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>

  • Arguments:

path

The path of the snapshottable directory.

fromSnapshot

The name of the starting snapshot.

toSnapshot

The name of the ending snapshot.

  • Results:

+

The file/directory has been created.

-

The file/directory has been deleted.

M

The file/directory has been modified.

R

The file/directory has been renamed.

A RENAME entry indicates that a file/directory has been renamed but is still under the same snapshottable directory. A file/directory is reported as deleted if it was renamed to a location outside of the snapshottable directory. A file/directory renamed from outside of the snapshottable directory is reported as newly created.
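The three rename cases can be summarized as a small decision function. This is a sketch of the rule as stated in the text, not of the actual diff computation; `classify_rename` and `_inside` are hypothetical names, and the return values reuse the report symbols listed under Results.

```python
from pathlib import PurePosixPath


def _inside(path, root):
    """True if path lies strictly below root."""
    return PurePosixPath(root) in PurePosixPath(path).parents


def classify_rename(src, dst, snapshottable_dir):
    """Return the symbol a rename src -> dst would be reported with, or None."""
    src_in = _inside(src, snapshottable_dir)
    dst_in = _inside(dst, snapshottable_dir)
    if src_in and dst_in:
        return "R"  # renamed within the snapshottable directory
    if src_in:
        return "-"  # moved out: reported as deleted
    if dst_in:
        return "+"  # moved in: reported as newly created
    return None     # not visible in this directory's report


print(classify_rename("/foo/a", "/foo/b", "/foo"))
print(classify_rename("/foo/a", "/tmp/a", "/foo"))
print(classify_rename("/tmp/a", "/foo/a", "/foo"))
```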

The snapshot difference report does not guarantee the same operation sequence. For example, if we rename the directory "/foo" to "/foo2", and then append new data to the file "/foo2/bar", the difference report will be:

R. /foo -> /foo2

M. /foo/bar

That is, changes to files/directories under a renamed directory are reported using the original path before the rename ("/foo/bar" in the above example).
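When post-processing a report, it can be useful to translate such pre-rename paths into their current locations. The following sketch assumes the rename entries have already been extracted into (old, new) pairs; the function name `current_path` is hypothetical, and parsing of the report text is deliberately left out.

```python
def current_path(reported_path, renames):
    """Map a path reported under its pre-rename name to its post-rename name.

    `renames` is a list of (old, new) pairs taken from the R entries.
    """
    for old, new in renames:
        prefix = old.rstrip("/") + "/"
        if reported_path == old or reported_path.startswith(prefix):
            return new + reported_path[len(old):]
    return reported_path


# The example from the text: /foo was renamed to /foo2, and /foo/bar was modified.
print(current_path("/foo/bar", [("/foo", "/foo2")]))
```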

See also the corresponding Java API SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot) in DistributedFileSystem.
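To tie the operations in this section together, the sketch below assembles one possible end-to-end command sequence as argument lists and prints them. It does not execute anything; the function `snapshot_workflow` is a hypothetical helper, the ordering is just one plausible workflow, and the commands themselves are the ones documented above. The two `dfsadmin` steps require superuser privilege.

```python
import shlex


def snapshot_workflow(path, first_snap, second_snap):
    """Return the CLI invocations for a full snapshot lifecycle, in order."""
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        ["hdfs", "dfs", "-createSnapshot", path, first_snap],
        ["hdfs", "dfs", "-createSnapshot", path, second_snap],
        ["hdfs", "snapshotDiff", path, first_snap, second_snap],
        # All snapshots must be deleted before snapshots can be disallowed.
        ["hdfs", "dfs", "-deleteSnapshot", path, first_snap],
        ["hdfs", "dfs", "-deleteSnapshot", path, second_snap],
        ["hdfs", "dfsadmin", "-disallowSnapshot", path],
    ]


for argv in snapshot_workflow("/foo", "s0", "s1"):
    print(shlex.join(argv))
```

Each list could be handed to `subprocess.run` on a machine with a configured HDFS client.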

