A file in HDFS is made up of blocks: a file contains one or more blocks. When a file is created, a block is allocated for it, and datanodes are requested to store the block according to the configured replication factor (3 by default), i.e. 3 datanodes for that block.

The hdfs fsck command shows a file's blocks and the datanodes and racks that hold them, for example:

hdfs fsck /tmp/test.sql -files -blocks -locations -racks
Connecting to namenode via http://name_node:50070
FSCK started by hadoop (auth:SIMPLE) from /client for path /tmp/test.sql at Thu Dec 13 15:44:12 CST 2018
/tmp/test.sql 16 bytes, 1 block(s): OK
0. BP-436366437-name_node-1493982655699:blk_1449692331_378721485 len=16 repl=3 [/DEFAULT/server111:50010, /DEFAULT/server121:50010, /DEFAULT/server43:50010]

Status: HEALTHY
Total size: 16 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 16 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 193
Number of racks: 1
FSCK ended at Thu Dec 13 15:44:12 CST 2018 in 1 milliseconds

The filesystem under path '/tmp/test.sql' is HEALTHY

How are those 3 datanodes chosen? There is a priority order:

1. the local rack (relative to the HDFS client);

2. a remote rack (relative to the HDFS client);

3. another rack;

4. fully random.
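The priority above can be sketched as a toy selection routine. This is illustrative code only, not HDFS's implementation; node and rack names are made up, and it follows one common reading of the default policy (first replica on the writer's node if it is a datanode, second on a remote rack, third on that same remote rack, the rest random):

```java
import java.util.*;

// Toy sketch of rack-aware replica target selection (NOT HDFS code).
public class RackAwarePicker {
    // rackOf maps node name -> rack name; returns the chosen nodes in order.
    static List<String> pick(Map<String, String> rackOf, String writer, int replicas) {
        List<String> chosen = new ArrayList<>();
        List<String> nodes = new ArrayList<>(rackOf.keySet());
        Collections.sort(nodes);                 // deterministic for the demo
        // 1) the writer's own node, if it is a datanode
        if (rackOf.containsKey(writer)) chosen.add(writer);
        String localRack = rackOf.get(writer);
        // 2) a node on a different (remote) rack
        String remoteRack = null;
        for (String n : nodes) {
            if (chosen.size() >= 2) break;
            if (!Objects.equals(rackOf.get(n), localRack)) {
                chosen.add(n);
                remoteRack = rackOf.get(n);
                break;
            }
        }
        // 3) another node on that same remote rack
        for (String n : nodes) {
            if (chosen.size() >= 3) break;
            if (!chosen.contains(n) && Objects.equals(rackOf.get(n), remoteRack)) {
                chosen.add(n);
                break;
            }
        }
        // 4) anything left, at random
        Collections.shuffle(nodes, new Random(0));
        for (String n : nodes) {
            if (chosen.size() >= replicas) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        return chosen.subList(0, Math.min(replicas, chosen.size()));
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("server1", "/rack1"); rackOf.put("server2", "/rack1");
        rackOf.put("server3", "/rack2"); rackOf.put("server4", "/rack2");
        // writer is server1 on /rack1 -> expect one local node + two on /rack2
        System.out.println(pick(rackOf, "server1", 3));  // [server1, server3, server4]
    }
}
```

The real selection (shown in the source below) additionally enforces maxNodesPerRack, excluded and stale nodes, and storage-type constraints.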

How many datanodes may be chosen from each rack (maxNodesPerRack) is determined by a formula; see the code:

org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer

    private int findNewDatanode(final DatanodeInfo[] original
        ) throws IOException {
      if (nodes.length != original.length + 1) {
        throw new IOException(
            new StringBuilder()
            .append("Failed to replace a bad datanode on the existing pipeline ")
            .append("due to no more good datanodes being available to try. ")
            .append("(Nodes: current=").append(Arrays.asList(nodes))
            .append(", original=").append(Arrays.asList(original)).append("). ")
            .append("The current failed datanode replacement policy is ")
            .append(dfsClient.dtpReplaceDatanodeOnFailure).append(", and ")
            .append("a client may configure this via '")
            .append(DFSConfigKeys.DFS_CLIENT_WRITE_REPLACE_DATANODE_ON_FAILURE_POLICY_KEY)
            .append("' in its configuration.")
            .toString());
      }

Note: when no new datanode can be found, an exception is thrown, and the error looks like this:

Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[server82:50010], original=[server.82:50010]).
The current failed datanode replacement policy is ALWAYS, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
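The error message itself names the relevant client-side setting. As a hedged example (the values below are illustrative; choose them according to your own failure-tolerance requirements), the replacement behavior can be configured in the client's hdfs-site.xml:

```xml
<!-- client-side hdfs-site.xml; illustrative values -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <!-- NEVER / DEFAULT / ALWAYS; the log above shows ALWAYS -->
  <value>DEFAULT</value>
</property>
```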

    private void addDatanode2ExistingPipeline() throws IOException {
...
      final DatanodeInfo[] original = nodes;
      final LocatedBlock lb = dfsClient.namenode.getAdditionalDatanode(
          src, fileId, block, nodes, storageIDs,
          failed.toArray(new DatanodeInfo[failed.size()]),
          1, dfsClient.clientName);
      setPipeline(lb);
      //find the new datanode
      final int d = findNewDatanode(original);

Note: getAdditionalDatanode is called to obtain 1 new datanode; a long call stack is omitted here.

org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault

  private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
                                    Node writer,
                                    List<DatanodeStorageInfo> chosenStorage,
                                    boolean returnChosenNodes,
                                    Set<Node> excludedNodes,
                                    long blocksize,
                                    final BlockStoragePolicy storagePolicy) {
...
    int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
    numOfReplicas = result[0];
    int maxNodesPerRack = result[1];
...
    final Node localNode = chooseTarget(numOfReplicas, writer, excludedNodes,
        blocksize, maxNodesPerRack, results, avoidStaleNodes, storagePolicy,
        EnumSet.noneOf(StorageType.class), results.isEmpty());

Note: maxNodesPerRack is the maximum number of datanodes that may be allocated on any single rack.

  private Node chooseTarget(int numOfReplicas,
                            Node writer,
                            final Set<Node> excludedNodes,
                            final long blocksize,
                            final int maxNodesPerRack,
                            final List<DatanodeStorageInfo> results,
                            final boolean avoidStaleNodes,
                            final BlockStoragePolicy storagePolicy,
                            final EnumSet<StorageType> unavailableStorages,
                            final boolean newBlock) {
...
      if (numOfResults <= 1) {
        chooseRemoteRack(1, dn0, excludedNodes, blocksize, maxNodesPerRack,
            results, avoidStaleNodes, storageTypes);
        if (--numOfReplicas == 0) {
          return writer;
        }
      }

Note: here it tries to obtain a new datanode from a remote rack (i.e. a rack different from those of the existing datanodes).

  protected void chooseRemoteRack(int numOfReplicas,
                                DatanodeDescriptor localMachine,
                                Set<Node> excludedNodes,
                                long blocksize,
                                int maxReplicasPerRack,
                                List<DatanodeStorageInfo> results,
                                boolean avoidStaleNodes,
                                EnumMap<StorageType, Integer> storageTypes)
                                    throws NotEnoughReplicasException {
...
      chooseRandom(numOfReplicas, "~" + localMachine.getNetworkLocation(),
          excludedNodes, blocksize, maxReplicasPerRack, results,
          avoidStaleNodes, storageTypes);

Note: here a datanode is chosen at random among all eligible datanodes; the scope "~" + localMachine.getNetworkLocation() restricts the random choice to nodes outside that rack.

  protected DatanodeStorageInfo chooseRandom(int numOfReplicas,
                            String scope,
                            Set<Node> excludedNodes,
                            long blocksize,
                            int maxNodesPerRack,
                            List<DatanodeStorageInfo> results,
                            boolean avoidStaleNodes,
                            EnumMap<StorageType, Integer> storageTypes)
                            throws NotEnoughReplicasException {
...
    int numOfAvailableNodes = clusterMap.countNumOfAvailableNodes(
        scope, excludedNodes);
...
    if (numOfReplicas>0) {
      String detail = enableDebugLogging;
      if (LOG.isDebugEnabled()) {
        if (badTarget && builder != null) {
          detail = builder.toString();
          builder.setLength(0);
        } else {
          detail = "";
        }
      }
      throw new NotEnoughReplicasException(detail);
    }

Note: if for some reason (e.g. node disks are full, or nodes have been decommissioned) numOfAvailableNodes comes out as 0, a NotEnoughReplicasException is thrown.
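That failure mode can be sketched as follows (toy code, not HDFS's implementation): if every candidate node in the scope is excluded, nothing is left to count and the selection fails:

```java
import java.util.*;

// Toy sketch (NOT HDFS code) of chooseRandom's failure mode: when every
// candidate in scope is excluded (full disks, decommissioned nodes, ...),
// the available count is 0 and the caller sees an exception.
public class ChooseRandomSketch {
    static class NotEnoughReplicasException extends Exception {
        NotEnoughReplicasException(String msg) { super(msg); }
    }

    static String chooseRandom(List<String> scope, Set<String> excluded, Random rnd)
            throws NotEnoughReplicasException {
        List<String> available = new ArrayList<>(scope);
        available.removeAll(excluded);   // countNumOfAvailableNodes, in effect
        if (available.isEmpty()) {
            throw new NotEnoughReplicasException("no available nodes in scope");
        }
        return available.get(rnd.nextInt(available.size()));
    }

    public static void main(String[] args) throws Exception {
        List<String> rack = Arrays.asList("server1", "server2", "server3");
        Set<String> excluded = new HashSet<>(Arrays.asList("server1", "server2"));
        // only server3 remains available
        System.out.println(chooseRandom(rack, excluded, new Random()));  // server3
    }
}
```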

The maxNodesPerRack calculation logic is as follows:

org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault

  /**
   * Calculate the maximum number of replicas to allocate per rack. It also
   * limits the total number of replicas to the total number of nodes in the
   * cluster. Caller should adjust the replica count to the return value.
   *
   * @param numOfChosen The number of already chosen nodes.
   * @param numOfReplicas The number of additional nodes to allocate.
   * @return integer array. Index 0: The number of nodes allowed to allocate
   *         in addition to already chosen nodes.
   *         Index 1: The maximum allowed number of nodes per rack. This
   *         is independent of the number of chosen nodes, as it is calculated
   *         using the target number of replicas.
   */
  private int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
    int clusterSize = clusterMap.getNumOfLeaves();
    int totalNumOfReplicas = numOfChosen + numOfReplicas;
    if (totalNumOfReplicas > clusterSize) {
      numOfReplicas -= (totalNumOfReplicas-clusterSize);
      totalNumOfReplicas = clusterSize;
    }
    // No calculation needed when there is only one rack or picking one node.
    int numOfRacks = clusterMap.getNumOfRacks();
    if (numOfRacks == 1 || totalNumOfReplicas <= 1) {
      return new int[] {numOfReplicas, totalNumOfReplicas};
    }
    int maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2;
    // At this point, there are more than one racks and more than one replicas
    // to store. Avoid all replicas being in the same rack.
    //
    // maxNodesPerRack has the following properties at this stage.
    //   1) maxNodesPerRack >= 2
    //   2) (maxNodesPerRack-1) * numOfRacks > totalNumOfReplicas
    //          when numOfRacks > 1
    //
    // Thus, the following adjustment will still result in a value that forces
    // multi-rack allocation and gives enough number of total nodes.
    if (maxNodesPerRack == totalNumOfReplicas) {
      maxNodesPerRack--;
    }
    return new int[] {numOfReplicas, maxNodesPerRack};
  }

Note: the key lines are

int maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2;
if (maxNodesPerRack == totalNumOfReplicas) {
  maxNodesPerRack--;
}

For example, with totalNumOfReplicas = 3 and 2 racks: (3-1)/2 + 2 = 3, which equals totalNumOfReplicas, so maxNodesPerRack is decremented to 2. At most 2 of the 3 replicas may then land on one rack, which forces placement across at least two racks.
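The formula can be replayed outside HDFS. Below is a standalone re-implementation of the getMaxNodesPerRack logic for experimentation; it is a sketch, not HDFS code, and the cluster size and rack count are passed in as parameters instead of being read from the NetworkTopology:

```java
import java.util.Arrays;

// Standalone replay of BlockPlacementPolicyDefault.getMaxNodesPerRack (a
// sketch for experimenting with the formula; clusterSize/numOfRacks are
// parameters here rather than clusterMap lookups).
public class MaxNodesPerRackCalc {
    static int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas,
                                    int clusterSize, int numOfRacks) {
        int totalNumOfReplicas = numOfChosen + numOfReplicas;
        if (totalNumOfReplicas > clusterSize) {
            // cannot place more replicas than there are nodes
            numOfReplicas -= (totalNumOfReplicas - clusterSize);
            totalNumOfReplicas = clusterSize;
        }
        if (numOfRacks == 1 || totalNumOfReplicas <= 1) {
            return new int[] {numOfReplicas, totalNumOfReplicas};
        }
        int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 2;
        if (maxNodesPerRack == totalNumOfReplicas) {
            maxNodesPerRack--;           // force multi-rack allocation
        }
        return new int[] {numOfReplicas, maxNodesPerRack};
    }

    public static void main(String[] args) {
        // 3 replicas, 193 nodes, 1 rack (the cluster from the fsck output):
        // single-rack branch, so maxNodesPerRack == totalNumOfReplicas == 3
        System.out.println(Arrays.toString(getMaxNodesPerRack(0, 3, 193, 1)));  // [3, 3]
        // 3 replicas, 2 racks: (3-1)/2 + 2 = 3 == total, decremented to 2
        System.out.println(Arrays.toString(getMaxNodesPerRack(0, 3, 193, 2)));  // [3, 2]
    }
}
```

With the cluster from the fsck output above (193 datanodes but only 1 rack), the single-rack branch returns maxNodesPerRack = 3, so all three replicas legitimately sit on one rack; with 2 or more racks, the cap drops to 2 and at least two racks must be used.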
