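How does the cluster start after the start-dfs.sh script is executed?

This article reads through and annotates the start-dfs.sh script and the source code of the main DataNode startup flow.

DataNode startup flow

Script analysis

The code in start-dfs.sh that starts the datanodes: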

#---------------------------------------------------------

# datanodes (using default workers file)

echo "Starting datanodes"

hadoop_uservar_su hdfs datanode "${HADOOP_HDFS_HOME}/bin/hdfs" \

    --workers \

    --config "${HADOOP_CONF_DIR}" \

    --daemon start \

    datanode ${dataStartOpt}

(( HADOOP_JUMBO_RETCOUNTER=HADOOP_JUMBO_RETCOUNTER + $? ))

Next, look at the datanode subcommand in hadoop-hdfs > src > main > bin > hdfs:

# Subcommand description: runs a DFS datanode

  hadoop_add_subcommand "datanode" daemon "run a DFS datanode"

# Subcommand handler

    datanode)

      HADOOP_SUBCMD_SUPPORTDAEMONIZATION="true"

      HADOOP_SECURE_CLASSNAME="org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter"

      HADOOP_CLASSNAME='org.apache.hadoop.hdfs.server.datanode.DataNode'

      hadoop_deprecate_envvar HADOOP_SECURE_DN_PID_DIR HADOOP_SECURE_PID_DIR

      hadoop_deprecate_envvar HADOOP_SECURE_DN_LOG_DIR HADOOP_SECURE_LOG_DIR

    ;;

This identifies the two candidate entry classes: org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter and org.apache.hadoop.hdfs.server.datanode.DataNode.

Following the script into the hadoop_generic_java_subcmd_handler function in hadoop-functions.sh, we find the following code:

  # do the hard work of launching a daemon or just executing our interactive

  # java class

  if [[ "${HADOOP_SUBCMD_SUPPORTDAEMONIZATION}" = true ]]; then

    if [[ "${HADOOP_SUBCMD_SECURESERVICE}" = true ]]; then

      hadoop_secure_daemon_handler \

        "${HADOOP_DAEMON_MODE}" \

        "${HADOOP_SUBCMD}" \

        "${HADOOP_SECURE_CLASSNAME}" \

        "${daemon_pidfile}" \

        "${daemon_outfile}" \

        "${priv_pidfile}" \

        "${priv_outfile}" \

        "${priv_errfile}" \

        "${HADOOP_SUBCMD_ARGS[@]}"

    else

      hadoop_daemon_handler \

        "${HADOOP_DAEMON_MODE}" \

        "${HADOOP_SUBCMD}" \

        "${HADOOP_CLASSNAME}" \

        "${daemon_pidfile}" \

        "${daemon_outfile}" \

        "${HADOOP_SUBCMD_ARGS[@]}"

    fi

    exit $?

  else

    hadoop_java_exec "${HADOOP_SUBCMD}" "${HADOOP_CLASSNAME}" "${HADOOP_SUBCMD_ARGS[@]}"

  fi

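We now need to work out whether hadoop_secure_daemon_handler or hadoop_daemon_handler is ultimately called.

The secure startup path is taken only when both conditions hold: HADOOP_SUBCMD_SUPPORTDAEMONIZATION = true and HADOOP_SUBCMD_SECURESERVICE = true.

HADOOP_SUBCMD_SUPPORTDAEMONIZATION is assigned in the datanode subcommand handler: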

# In the hdfs script

datanode)

      HADOOP_SUBCMD_SUPPORTDAEMONIZATION="true"

      HADOOP_SECURE_CLASSNAME="org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter"

# ......

    ;;

HADOOP_SUBCMD_SECURESERVICE is given the following default value in hadoop-functions.sh:

  HADOOP_SUBCMD_SECURESERVICE=false

Inside hadoop_generic_java_subcmd_handler (the function that executes our subcommand), a conditional decides whether it is set to true:

## @description Handle subcommands from main program entries

## @audience private

## @stability evolving

## @replaceable yes

function hadoop_generic_java_subcmd_handler

{

# ......

  # The default/expected way to determine if a daemon is going to run in secure

  # mode is defined by hadoop_detect_priv_subcmd.  If this returns true

  # then setup the secure user var and tell the world we're in secure mode

  if hadoop_detect_priv_subcmd "${HADOOP_SHELL_EXECNAME}" "${HADOOP_SUBCMD}"; then

    HADOOP_SUBCMD_SECURESERVICE=true

# ......

Stepping into hadoop_detect_priv_subcmd:

## @description autodetect whether this is a priv subcmd

## @description by whether or not a priv user var exists

## @description and if HADOOP_SECURE_CLASSNAME is defined

## @audience     public

## @stability    stable

## @replaceable  yes

## @param        command

## @param        subcommand

## @return       1 = not priv

## @return       0 = priv

function hadoop_detect_priv_subcmd

{

  declare program=$1

  declare command=$2

  #

  if [[ -z "${HADOOP_SECURE_CLASSNAME}" ]]; then

    hadoop_debug "No secure classname defined."

    return 1

  fi

  uvar=$(hadoop_build_custom_subcmd_var "${program}" "${command}" SECURE_USER)

  if [[ -z "${!uvar}" ]]; then

    hadoop_debug "No secure user defined."

    return 1

  fi

  return 0

}

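The function returns 0 only when two conditions hold: HADOOP_SECURE_CLASSNAME is non-empty, and the variable whose name hadoop_build_custom_subcmd_var builds from the two arguments (HADOOP_SHELL_EXECNAME and HADOOP_SUBCMD) plus the suffix SECURE_USER is also non-empty. ${!uvar} is bash indirect expansion, so it reads the value of the variable named by uvar; for hdfs and datanode that variable is HDFS_DATANODE_SECURE_USER. (In a shell if function; then construct, the then branch runs when the function returns 0.)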
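The two checks in hadoop_detect_priv_subcmd can also be modelled outside of bash. The following is a minimal Java sketch rather than Hadoop code: the class PrivSubcmdSketch and its method names are hypothetical, and the environment is passed in as a Map (an assumption made so the example is self-contained). It follows the shell logic: the variable name is the upper-cased program and command joined with SECURE_USER, and both the secure class name and that variable must be non-empty.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical model of hadoop_detect_priv_subcmd; not part of Hadoop.
final class PrivSubcmdSketch {

  // Models hadoop_build_custom_subcmd_var: upper-case the program and the
  // command, then join them with the custom suffix.
  static String buildCustomSubcmdVar(String program, String command, String custom) {
    return program.toUpperCase(Locale.ROOT) + "_"
        + command.toUpperCase(Locale.ROOT) + "_" + custom;
  }

  // Returns true when the subcommand would be treated as privileged
  // (the shell function returns 0 in that case).
  static boolean detectPrivSubcmd(Map<String, String> env, String secureClassName,
      String program, String command) {
    if (secureClassName == null || secureClassName.isEmpty()) {
      return false; // shell: "No secure classname defined."
    }
    String uvar = buildCustomSubcmdVar(program, command, "SECURE_USER");
    String value = env.get(uvar); // plays the role of ${!uvar}
    return value != null && !value.isEmpty(); // empty means "No secure user defined."
  }

  public static void main(String[] args) {
    String secureClass = "org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter";
    System.out.println(buildCustomSubcmdVar("hdfs", "datanode", "SECURE_USER"));
    // No secure user in the environment: the regular handler would be used.
    System.out.println(detectPrivSubcmd(Map.of(), secureClass, "hdfs", "datanode"));
    // Secure user defined: the secure handler would be used.
    System.out.println(detectPrivSubcmd(
        Map.of("HDFS_DATANODE_SECURE_USER", "hdfs"), secureClass, "hdfs", "datanode"));
  }
}
```

Passing the environment as a parameter keeps the decision a pure function, which makes both branches easy to exercise without touching the real process environment.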

Like HADOOP_SUBCMD_SUPPORTDAEMONIZATION, HADOOP_SECURE_CLASSNAME is assigned in the datanode subcommand handler of the hdfs script.

HADOOP_SHELL_EXECNAME is given a default value in the hdfs script:

# The name of the script being executed.

HADOOP_SHELL_EXECNAME="hdfs"

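HADOOP_SUBCMD is defined in the hdfs script as HADOOP_SUBCMD=$1. By the time this assignment runs, the generic options (--workers, --config, --daemon) have already been parsed and shifted off, so $1 is the subcommand itself. Returning to start-dfs.sh, the complete invoking command is: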

#---------------------------------------------------------

# datanodes (using default workers file)

echo "Starting datanodes"

hadoop_uservar_su hdfs datanode "${HADOOP_HDFS_HOME}/bin/hdfs" \

    --workers \

    --config "${HADOOP_CONF_DIR}" \

    --daemon start \

    datanode ${dataStartOpt}

(( HADOOP_JUMBO_RETCOUNTER=HADOOP_JUMBO_RETCOUNTER + $? ))

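Here the subcommand is datanode, so HADOOP_SUBCMD is datanode and the variable that hadoop_detect_priv_subcmd tests is HDFS_DATANODE_SECURE_USER.

So when start-dfs.sh is run with HDFS_DATANODE_SECURE_USER set, hadoop_secure_daemon_handler is called and the datanode is started in secure mode through the SecureDataNodeStarter class. When that variable is unset, hadoop_daemon_handler launches the DataNode class directly. Both paths end up in DataNode#secureMain, so the secure path is followed first.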

SecureDataNodeStarter

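The official Javadoc comment:

Utility class to start a datanode in a secure cluster, first obtaining privileged resources before main startup and handing them to the datanode.

SecureDataNodeStarter implements Daemon so that it can run as a daemon process. We first look at the methods it implements from Daemon: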

  @Override

  public void init(DaemonContext context) throws Exception {

    System.err.println("Initializing secure datanode resources");

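    // Create a new HdfsConfiguration object so that the settings in hdfs-site.xml are picked up.

    Configuration conf = new HdfsConfiguration();

    // Stash the command-line arguments for the regular datanode

    args = context.getArguments();

    // Obtain the datanode's privileged resources (i.e. the privileged ports).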

    resources = getSecureResources(conf);

  }

  @Override

  public void start() throws Exception {

    System.err.println("Starting regular datanode initialization");

    // Regular DataNode initialization

    DataNode.secureMain(args, resources);

  }

  @Override public void destroy() {}

  @Override public void stop() throws Exception { /* Nothing to do */ }
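The split between init() and start() is easier to see in isolation. Below is a minimal Java sketch of such a two-phase starter, assuming a launcher that calls init() before start() (the comment in secureMain further down mentions Jsvc). TwoPhaseStarterSketch and runService are hypothetical names, runService merely stands in for DataNode.secureMain, and the sketch binds an OS-chosen loopback port so that it runs without special privileges.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical two-phase starter; the names are illustrative, not Hadoop or commons-daemon API.
final class TwoPhaseStarterSketch {
  private String[] args;
  private ServerSocket resource;

  // Phase 1: remember the arguments and acquire the socket up front.
  void init(String[] arguments) {
    this.args = arguments.clone();
    try {
      this.resource = new ServerSocket();
      // Port 0 asks the OS for any free port, so no privileges are needed here.
      this.resource.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
    } catch (IOException e) {
      throw new UncheckedIOException("could not acquire the socket", e);
    }
  }

  // Phase 2: hand the stored arguments and the already-bound socket to the service.
  int start() {
    return runService(args, resource);
  }

  // Stand-in for the real service entry point: it only reports the bound port.
  private static int runService(String[] args, ServerSocket socket) {
    System.out.println("service started with " + args.length
        + " argument(s) on port " + socket.getLocalPort());
    return socket.getLocalPort();
  }

  // Releases the socket acquired in init().
  void stop() {
    try {
      if (resource != null) {
        resource.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] argv) {
    TwoPhaseStarterSketch starter = new TwoPhaseStarterSketch();
    starter.init(new String[] {"-regular"});
    starter.start();
    starter.stop();
  }
}
```

Keeping acquisition and use in separate methods lets a launcher change its own state between the two calls without the service code needing to know about it.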

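Member variables

As shown above, the main job of SecureDataNodeStarter is to obtain the configuration and the privileged resources, store them, and pass them on as arguments when the DataNode is initialized in the regular way. Besides the command-line arguments, the following fields are initialized: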

  // Command-line arguments

  private String [] args;

  private SecureResources resources;

  // Stores the resources a datanode needs to operate in a secure environment

  public static class SecureResources {

    // Whether SASL is enabled

    private final boolean isSaslEnabled;

    // Whether the RPC port is a privileged port (port number below 1024, on which ordinary users may not run servers)

    // See https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html

    private final boolean isRpcPortPrivileged;

    // Whether the HTTP port is a privileged port

    private final boolean isHttpPortPrivileged;

    // Server socket listening on the port configured by dfs.datanode.address

    private final ServerSocket streamingSocket;

    // Server socket channel listening on the port configured by dfs.datanode.http.address

    private final ServerSocketChannel httpServerSocket;

    public SecureResources(ServerSocket streamingSocket, ServerSocketChannel

        httpServerSocket, boolean saslEnabled, boolean rpcPortPrivileged,

        boolean httpPortPrivileged) {

      this.streamingSocket = streamingSocket;

      this.httpServerSocket = httpServerSocket;

      this.isSaslEnabled = saslEnabled;

      this.isRpcPortPrivileged = rpcPortPrivileged;

      this.isHttpPortPrivileged = httpPortPrivileged;

    }

    // getters omitted

  }

getSecureResources(conf)

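Next, look at getSecureResources(conf), which init() calls, to see where the values held in SecureResources come from.

  //  Acquire the datanode's privileged resources (i.e. the privileged ports).

  //  The privileged resources consist of the port of the RPC server and the port of the HTTP (not HTTPS) server.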

  @VisibleForTesting

  public static SecureResources getSecureResources(Configuration conf)

      throws Exception {

    // Get the HTTP policy: HTTP_ONLY, HTTPS_ONLY or HTTP_AND_HTTPS

    HttpConfig.Policy policy = DFSUtil.getHttpPolicy(conf);

    // Try to build a SaslPropertiesResolver; SASL is enabled if one is returned

    boolean isSaslEnabled =

        DataTransferSaslUtil.getSaslPropertiesResolver(conf) != null;

    boolean isRpcPrivileged;

    boolean isHttpPrivileged = false;

    System.err.println("isSaslEnabled:" + isSaslEnabled);

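    // Get the secure port for data streaming to the datanode and build the IP socket address

    // It is built from the dfs.datanode.address setting, whose default is 0.0.0.0:9866

    InetSocketAddress streamingAddr  = DataNode.getStreamingAddr(conf);

    // Get the socket write timeout

    // Config key: dfs.datanode.socket.write.timeout, default: 8 * 60 seconds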

    int socketWriteTimeout = conf.getInt(

        DFSConfigKeys.DFS_DATANODE_SOCKET_WRITE_TIMEOUT_KEY,

        HdfsConstants.WRITE_TIMEOUT);

    // Get the maximum length of the queue of incoming connection requests (the listen backlog).

    // Config key: ipc.server.listen.queue.size, default: 256

    int backlogLength = conf.getInt(

        CommonConfigurationKeysPublic.IPC_SERVER_LISTEN_QUEUE_SIZE_KEY,

        CommonConfigurationKeysPublic.IPC_SERVER_LISTEN_QUEUE_SIZE_DEFAULT);

    // With the default (positive) write timeout, the listening socket is taken from a ServerSocketChannel

    ServerSocket ss = (socketWriteTimeout > 0) ?

        ServerSocketChannel.open().socket() : new ServerSocket();

    try {

      // Bind the port and set the maximum length of the incoming connection queue

      ss.bind(streamingAddr, backlogLength);

    } catch (BindException e) {

      BindException newBe = appendMessageToBindException(e,

          streamingAddr.toString());

      throw newBe;

    }

    // Check that the socket is bound to the expected port

    if (ss.getLocalPort() != streamingAddr.getPort()) {

      throw new RuntimeException(

          "Unable to bind on specified streaming port in secure "

              + "context. Needed " + streamingAddr.getPort() + ", got "

              + ss.getLocalPort());

    }

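    // Check whether the given port is privileged.

    // On unix/linux systems, ports below 1024 are treated as privileged ports.

    // For other operating systems, use this method with care.

    // For example, Windows has no concept of privileged ports.

    // However, it can be used on a Windows client to check the port of a linux server.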

    isRpcPrivileged = SecurityUtil.isPrivilegedPort(ss.getLocalPort());

    System.err.println("Opened streaming server at " + streamingAddr);

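    //  Bind a port for the web server.

    //  The code intends to bind only the HTTP server to a privileged port, because if the server communicates over SSL the client can authenticate the server using its certificate.

    final ServerSocketChannel httpChannel;

    // Check whether HTTP access is enabled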

    if (policy.isHttpEnabled()) {

      httpChannel = ServerSocketChannel.open();

      // Determine the effective address of the HTTP server

      // It is built from the dfs.datanode.http.address setting, whose default is 0.0.0.0:9864

      InetSocketAddress infoSocAddr = DataNode.getInfoAddr(conf);

      try {

        httpChannel.socket().bind(infoSocAddr);

      } catch (BindException e) {

        BindException newBe = appendMessageToBindException(e,

            infoSocAddr.toString());

        throw newBe;

      }

      InetSocketAddress localAddr = (InetSocketAddress) httpChannel.socket()

        .getLocalSocketAddress();

      // Verify that httpChannel is bound to the expected port

      if (localAddr.getPort() != infoSocAddr.getPort()) {

        throw new RuntimeException("Unable to bind on specified info port in " +

            "secure context. Needed " + infoSocAddr.getPort() + ", got " +

             ss.getLocalPort());

      }

      System.err.println("Successfully obtained privileged resources (streaming port = "

          + ss + " ) (http listener port = " + localAddr.getPort() +")");

      // Check whether the port number is privileged (below 1024)

      isHttpPrivileged = SecurityUtil.isPrivilegedPort(localAddr.getPort());

      System.err.println("Opened info server at " + infoSocAddr);

    } else {

      httpChannel = null;

    }

    // Wrap the acquired privileged resources in a SecureResources object

    return new SecureResources(ss, httpChannel, isSaslEnabled,

        isRpcPrivileged, isHttpPrivileged);

  }
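The bind-then-verify pattern and the privileged-port rule used in getSecureResources can be tried in isolation. The following is a small Java sketch under stated assumptions: PortCheckSketch and bindAndVerify are hypothetical names, isPrivilegedPort is a local re-implementation of the below-1024 rule described in the comments above, and the sketch binds to the loopback address so it can run unprivileged.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative sketch only; PortCheckSketch is a hypothetical name, not a Hadoop class.
final class PortCheckSketch {

  // Ports below 1024 count as privileged on unix/linux systems.
  static boolean isPrivilegedPort(int port) {
    return port < 1024;
  }

  // Binds a server socket with the given backlog and verifies the resulting port.
  // requestedPort == 0 lets the OS choose, in which case any port is accepted.
  static ServerSocket bindAndVerify(int requestedPort, int backlog) {
    try {
      ServerSocket ss = new ServerSocket();
      ss.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), requestedPort), backlog);
      int actual = ss.getLocalPort();
      if (requestedPort != 0 && actual != requestedPort) {
        ss.close();
        throw new IllegalStateException("Needed " + requestedPort + ", got " + actual);
      }
      return ss;
    } catch (IOException e) {
      throw new UncheckedIOException("Could not bind port " + requestedPort, e);
    }
  }

  public static void main(String[] args) {
    try (ServerSocket ss = bindAndVerify(0, 256)) {
      int port = ss.getLocalPort();
      System.out.println("bound port " + port + ", privileged = " + isPrivilegedPort(port));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Binding a real port below 1024 would normally require elevated privileges, which is why the sketch only classifies port numbers and leaves the choice of port to the operating system.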

This completes the init() method of SecureDataNodeStarter.

Next comes start(), which simply passes on the arguments and resources that init() prepared.

  @Override

  public void start() throws Exception {

    System.err.println("Starting regular datanode initialization");

    DataNode.secureMain(args, resources);

  }

How the resources argument is used inside the datanode is covered in the DataNode code analysis below.

DataNode

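The official Javadoc for DataNode reads as follows:

DataNode is a class (and program) that stores a set of blocks for a DFS deployment.

A single deployment can have one or many DataNodes.

Each DataNode communicates regularly with a single NameNode.

It also communicates with client code and other DataNodes from time to time.

DataNodes store a series of named blocks.

The DataNode allows client code to read these blocks, or to write new block data.

The DataNode may also, in response to instructions from its NameNode, delete blocks or copy blocks to/from other DataNodes.

The DataNode maintains just one critical table: block -> stream of bytes (of BLOCK_SIZE or less). This info is stored on a local disk.

The DataNode reports the table's contents to the NameNode upon startup and every so often afterwards.

DataNodes spend their lives in an endless loop of asking the NameNode for something to do.

A NameNode cannot connect to a DataNode directly; a NameNode simply returns values from functions invoked by a DataNode.

DataNodes maintain an open server socket so that client code or other DataNodes can read/write data.

The host/port for this server is reported to the NameNode, which then sends that information to clients or other DataNodes that might be interested.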

Static initializer block

DataNode's static initializer block is the same as the one in NameNode; it loads the default configuration files.

  static{

    HdfsConfiguration.init();

  }

main method

As SecureDataNodeStarter#start shows, the secure path starts the datanode by calling DataNode#secureMain. The regular main method also calls DataNode#secureMain. Here is the code of main and secureMain:

  public static void main(String args[]) {

    // Check whether the arguments ask for help

    if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {

      System.exit(0);

    }

    // Delegate to secureMain without secure resources

    secureMain(args, null);

  }

  public static void secureMain(String args[], SecureResources resources) {

    int errorCode = 0;

    try {

      // Log the startup banner

      StringUtils.startupShutdownMessage(DataNode.class, args, LOG);

      // Create the datanode

      DataNode datanode = createDataNode(args, null, resources);

      if (datanode != null) {

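        // Join the service threads and wait for them to finish

        // blockPoolManager.joinAll(); -> BPOfferService#join -> BPServiceActor#join

        // BPServiceActor: a thread per active or standby namenode that performs:

        // the pre-registration handshake with the namenode, registration, periodic heartbeats to the namenode, and handling of commands received from the namenode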

        datanode.join();

      } else {

        errorCode = 1;

      }

    } catch (Throwable e) {

      LOG.error("Exception in secureMain", e);

      terminate(1, e);

    } finally {

      // We need to terminate the process here because either shutdown was called

      // or some disk related conditions like volumes tolerated or volumes required

      // condition was not met. Also, In secure mode, control will go to Jsvc

      // and Datanode process hangs if it does not exit.

      LOG.warn("Exiting Datanode");

      terminate(errorCode);

    }

  }

DataNode#createDataNode

Instantiate and start a single datanode daemon and wait for it to finish.

  @VisibleForTesting

  @InterfaceAudience.Private

  public static DataNode createDataNode(String args[], Configuration conf,

      SecureResources resources) throws IOException {

    // Instantiate the datanode

    DataNode dn = instantiateDataNode(args, conf, resources);

    if (dn != null) {

      // Start the datanode daemon threads

      dn.runDatanodeDaemon();

    }

    return dn;

  }

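First, the datanode instantiation flow:

DataNode#instantiateDataNode

// Instantiate a single datanode object, along with its secure resources. This must be run by invoking runDatanodeDaemon() subsequently.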

public static DataNode instantiateDataNode(String args [], Configuration conf,

    SecureResources resources) throws IOException {

  if (conf == null)

    conf = new HdfsConfiguration();

  if (args != null) {

    // Parse the generic Hadoop options

    GenericOptionsParser hParser = new GenericOptionsParser(conf, args);

    args = hParser.getRemainingArgs();

  }

  // Parse and verify the command-line arguments and set configuration parameters.

  if (!parseArguments(args, conf)) {

    printUsage(System.err);

    return null;

  }

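  // Resolve the actual set of storage locations from dfs.datanode.data.dir

  // StorageLocation: encapsulates the URI and storage medium that together describe a storage directory. If no storage medium is specified, DISK is assumed.

  // For a detailed walk-through of how storage directories are resolved, see this blog post: https://blog.csdn.net/Androidlushangderen/article/details/51105876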

  Collection<StorageLocation> dataLocations = getStorageLocations(conf);

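  // UserGroupInformation: user and group information for Hadoop.

  // This class wraps a JAAS Subject and provides methods to determine the user's username and groups.

  // It supports the Windows, Unix and Kerberos login modules.

  // UserGroupInformation#setConfiguration: sets the static configuration for UGI, in particular the security authentication mechanism and the group lookup service.

  UserGroupInformation.setConfiguration(conf);

  // Log in as the principal specified in the config. $host in the user's Kerberos principal name is replaced with the hostname. In non-secure mode it simply returns.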

  SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,

      DFS_DATANODE_KERBEROS_PRINCIPAL_KEY, getHostName(conf));

  // Create the DataNode instance

  return makeInstance(dataLocations, conf, resources);

}

DataNode#makeInstance

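// Make an instance of DataNode after ensuring that at least one of the given data directories (and their parent directories, if necessary) can be created.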

static DataNode makeInstance(Collection<StorageLocation> dataDirs,

    Configuration conf, SecureResources resources) throws IOException {

  List<StorageLocation> locations;

  //  StorageLocationChecker: a utility class that encapsulates checking storage locations during DataNode startup. Some of its code was extracted from the DataNode class.

  StorageLocationChecker storageLocationChecker =

      new StorageLocationChecker(conf, new Timer());

  try {

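    // Initiate a check on the supplied storage volumes and return a list of healthy volumes.

    // StorageLocations are returned in the same order as the input, for compatibility with existing unit tests.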

    locations = storageLocationChecker.check(conf, dataDirs);

  } catch (InterruptedException ie) {

    throw new IOException("Failed to instantiate DataNode", ie);

  }

  // Initialize the metrics system

  DefaultMetricsSystem.initialize("DataNode");

  // At least one healthy data directory must remain

  assert locations.size() > 0 : "number of data directories should be > 0";

  // Create the DataNode

  return new DataNode(conf, locations, storageLocationChecker, resources);

}

StorageLocationChecker#check

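Let's look at exactly which checks are performed:

  // Initiate a check on the supplied storage volumes and return a list of healthy volumes.

  // StorageLocations are returned in the same order as the input, for compatibility with existing unit tests.

  // Returns the list of healthy volumes, or an empty list if there are no healthy volumes.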

  public List<StorageLocation> check(

      final Configuration conf,

      final Collection<StorageLocation> dataDirs)

      throws InterruptedException, IOException {

    final HashMap<StorageLocation, Boolean> goodLocations =

        new LinkedHashMap<>();

    final Set<StorageLocation> failedLocations = new HashSet<>();

    final Map<StorageLocation, ListenableFuture<VolumeCheckResult>> futures =

        Maps.newHashMap();

    // Get the local file system, creating a new one if none exists yet

    final LocalFileSystem localFS = FileSystem.getLocal(conf);

    final CheckContext context = new CheckContext(localFS, expectedPermission);

    // Start parallel disk check operations on all StorageLocations.

    for (StorageLocation location : dataDirs) {

      goodLocations.put(location, true);

      // Schedule an asynchronous check for the given Checkable. A ListenableFuture is returned if the check was scheduled successfully.

      Optional<ListenableFuture<VolumeCheckResult>> olf =

          delegateChecker.schedule(location, context);

      if (olf.isPresent()) {

        futures.put(location, olf.get());

      }

    }

    if (maxVolumeFailuresTolerated >= dataDirs.size()) {

      throw new HadoopIllegalArgumentException("Invalid value configured for "

          + DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY + " - "

          + maxVolumeFailuresTolerated + ". Value configured is >= "

          + "to the number of configured volumes (" + dataDirs.size() + ").");

    }

    final long checkStartTimeMs = timer.monotonicNow();

    // Retrieve the results of the disk checks.

    for (Map.Entry<StorageLocation,

             ListenableFuture<VolumeCheckResult>> entry : futures.entrySet()) {

      // Determine how much time we can allow for this check to complete.

      // The cumulative wait time cannot exceed maxAllowedTimeForCheck.

      final long waitSoFarMs = (timer.monotonicNow() - checkStartTimeMs);

      final long timeLeftMs = Math.max(0,

          maxAllowedTimeForCheckMs - waitSoFarMs);

      final StorageLocation location = entry.getKey();

      try {

        final VolumeCheckResult result =

            entry.getValue().get(timeLeftMs, TimeUnit.MILLISECONDS);

        switch (result) {

        case HEALTHY:

          break;

        case DEGRADED:

          LOG.warn("StorageLocation {} appears to be degraded.", location);

          break;

        case FAILED:

          LOG.warn("StorageLocation {} detected as failed.", location);

          failedLocations.add(location);

          goodLocations.remove(location);

          break;

        default:

          LOG.error("Unexpected health check result {} for StorageLocation {}",

              result, location);

        }

      } catch (ExecutionException|TimeoutException e) {

        LOG.warn("Exception checking StorageLocation " + location,

            e.getCause());

        failedLocations.add(location);

        goodLocations.remove(location);

      }

    }

    if (maxVolumeFailuresTolerated == DataNode.MAX_VOLUME_FAILURE_TOLERATED_LIMIT) {

      if (dataDirs.size() == failedLocations.size()) {

        throw new DiskErrorException("Too many failed volumes - "

            + "current valid volumes: " + goodLocations.size()

            + ", volumes configured: " + dataDirs.size()

            + ", volumes failed: " + failedLocations.size()

            + ", volume failures tolerated: " + maxVolumeFailuresTolerated);

      }

    } else {

      if (failedLocations.size() > maxVolumeFailuresTolerated) {

        throw new DiskErrorException("Too many failed volumes - "

            + "current valid volumes: " + goodLocations.size()

            + ", volumes configured: " + dataDirs.size()

            + ", volumes failed: " + failedLocations.size()

            + ", volume failures tolerated: " + maxVolumeFailuresTolerated);

      }

    }

    if (goodLocations.size() == 0) {

      throw new DiskErrorException("All directories in "

          + DFS_DATANODE_DATA_DIR_KEY + " are invalid: "

          + failedLocations);

    }

    return new ArrayList<>(goodLocations.keySet());

  }
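The core of check() is a set of parallel checks collected under one shared time budget, followed by a comparison against the tolerated number of failures. The Java sketch below illustrates that pattern with the standard library only. It is a simplified, hypothetical model (VolumeCheckSketch, string volume names and boolean check results are all assumptions), and it leaves out the special handling of the tolerated value -1 and the DEGRADED state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified, hypothetical model of the check loop; it is not the Hadoop implementation.
final class VolumeCheckSketch {

  // Runs all checks in parallel and returns the healthy volume names in input order.
  // A check that returns false, throws, or exceeds the shared time budget counts as failed.
  static List<String> check(Map<String, Callable<Boolean>> checks,
      int failuresTolerated, long maxAllowedMs) {
    if (failuresTolerated >= checks.size()) {
      throw new IllegalArgumentException("tolerated failures must be < number of volumes");
    }
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
      for (Map.Entry<String, Callable<Boolean>> e : checks.entrySet()) {
        futures.put(e.getKey(), pool.submit(e.getValue()));
      }
      Set<String> failed = new HashSet<>();
      List<String> good = new ArrayList<>();
      long startNs = System.nanoTime();
      for (Map.Entry<String, Future<Boolean>> e : futures.entrySet()) {
        // The waits are cumulative: each get() only receives what is left of the budget.
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
        long leftMs = Math.max(0, maxAllowedMs - waitedMs);
        try {
          if (e.getValue().get(leftMs, TimeUnit.MILLISECONDS)) {
            good.add(e.getKey());
          } else {
            failed.add(e.getKey());
          }
        } catch (ExecutionException | TimeoutException ex) {
          failed.add(e.getKey());
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while checking volumes", ex);
        }
      }
      if (failed.size() > failuresTolerated) {
        throw new IllegalStateException("Too many failed volumes: " + failed);
      }
      return good;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Map<String, Callable<Boolean>> checks = new LinkedHashMap<>();
    checks.put("/data1", () -> true);
    checks.put("/data2", () -> { throw new IllegalStateException("disk error"); });
    checks.put("/data3", () -> true);
    System.out.println(check(checks, 1, 1000));
  }
}
```

Because every wait draws from the same budget, one slow volume cannot delay startup by more than the configured maximum, at the cost of marking later volumes as failed once the budget is used up.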

DataNode constructor

// Create the DataNode given a configuration, an array of dataDirs, and a namenode proxy.

  DataNode(final Configuration conf,

           final List<StorageLocation> dataDirs,

           final StorageLocationChecker storageLocationChecker,

           final SecureResources resources) throws IOException {

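    // Hand the configuration to the parent class, which stores it

    super(conf);

    // Initialize the Tracer; compared with the same step in NameNode, only the arguments differ

    this.tracer = createTracer(conf);

    // TracerConfigurationManager provides functions for managing the tracer configuration at runtime via the RPC protocol.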

    this.tracerConfigurationManager =

        new TracerConfigurationManager(DATANODE_HTRACE_PREFIX, conf);

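    // FileIoProvider abstracts the various file IO operations performed by the DataNode,

    // and invokes profiling (for collecting stats) and fault injection (for testing) event hooks before and after each file IO.

    // Behaviour can be injected into these events by enabling the profiling and/or fault injection event hooks through DFSConfigKeys.

    this.fileIoProvider = new FileIoProvider(conf, this);

    // Initialize volume scanning; BlockScanner manages all the VolumeScanners

    this.blockScanner = new BlockScanner(this);

    // Initialize assorted configuration parameters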

    this.lastDiskErrorCheck = 0;

    this.maxNumberOfBlocksToLog = conf.getLong(DFS_MAX_NUM_BLOCKS_TO_LOG_KEY,

        DFS_MAX_NUM_BLOCKS_TO_LOG_DEFAULT);

    this.usersWithLocalPathAccess = Arrays.asList(

        conf.getTrimmedStrings(DFSConfigKeys.DFS_BLOCK_LOCAL_PATH_ACCESS_USER_KEY));

    this.connectToDnViaHostname = conf.getBoolean(

        DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,

        DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);

    this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY,

        DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT);

    this.isPermissionEnabled = conf.getBoolean(

        DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY,

        DFSConfigKeys.DFS_PERMISSIONS_ENABLED_DEFAULT);

    this.pipelineSupportECN = conf.getBoolean(

        DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED,

        DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED_DEFAULT);

    confVersion = "core-" +

        conf.get("hadoop.common.configuration.version", "UNSPECIFIED") +

        ",hdfs-" +

        conf.get("hadoop.hdfs.configuration.version", "UNSPECIFIED");

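    // DatasetVolumeChecker: a class that encapsulates running disk checks against each volume of an FsDatasetSpi and allows retrieving a list of failed volumes.

    // This splits out behaviour that was originally implemented across DataNode, FsDatasetImpl and FsVolumeList.

    this.volumeChecker = new DatasetVolumeChecker(conf, new Timer());

    // Create an ExecutorService for running DataTransfer tasks

    // HadoopExecutors: factory methods for ExecutorService and ScheduledExecutorService instances. These executor service instances provide additional functionality (e.g. logging uncaught exceptions).

    // DataTransfer: an inner class of DataNode used for transferring a block of data. It sends a piece of data to another DataNode.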

    this.xferService =

        HadoopExecutors.newCachedThreadPool(new Daemon.DaemonFactory());

    // Determine whether we should try to pass file descriptors to clients.

    if (conf.getBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY,

              HdfsClientConfigKeys.Read.ShortCircuit.DEFAULT)) {

      String reason = DomainSocket.getLoadingFailureReason();

      if (reason != null) {

        LOG.warn("File descriptor passing is disabled because {}", reason);

        this.fileDescriptorPassingDisabledReason = reason;

      } else {

        LOG.info("File descriptor passing is enabled.");

        this.fileDescriptorPassingDisabledReason = null;

      }

    } else {

      this.fileDescriptorPassingDisabledReason =

          "File descriptor passing was not configured.";

      LOG.debug(this.fileDescriptorPassingDisabledReason);

    }

    // Get the socket factory; config key: hadoop.rpc.socket.factory.class.default,

    // default: org.apache.hadoop.net.StandardSocketFactory

    this.socketFactory = NetUtils.getDefaultSocketFactory(conf);

    try {

      // Get this datanode's hostname

      hostName = getHostName(conf);

      LOG.info("Configured hostname is {}", hostName);

      // Start the datanode

      startDataNode(dataDirs, resources);

    } catch (IOException ie) {

      shutdown();

      throw ie;

    }

    final int dncCacheMaxSize =

        conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,

            DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;

    datanodeNetworkCounts =

        CacheBuilder.newBuilder()

            .maximumSize(dncCacheMaxSize)

            .build(new CacheLoader<String, Map<String, Long>>() {

              @Override

              public Map<String, Long> load(String key) throws Exception {

                final Map<String, Long> ret = new HashMap<String, Long>();

                ret.put("networkErrors", 0L);

                return ret;

              }

            });

    initOOBTimeout();

    this.storageLocationChecker = storageLocationChecker;

  }

DataNode#startDataNode

// This method starts the data node with the specified conf. If conf's CONFIG_PROPERTY_SIMULATED property is set, a simulated storage based data node is created.

void startDataNode(List<StorageLocation> dataDirectories,

                   SecureResources resources

                   ) throws IOException {

  // settings global for all BPs in the Data Node

  this.secureResources = resources;

  synchronized (this) {

    this.dataDirs = dataDirectories;

  }

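  // DNConf: a simple class encapsulating all of the configuration that the DataNode loads at startup time.

  this.dnConf = new DNConf(this);

  // Check the configuration for secure mode

  checkSecureConfig(dnConf, getConf(), resources);

  // Check that the maximum amount of memory the DataNode may lock for caching is within the allowed range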

  if (dnConf.maxLockedMemory > 0) {

    if (!NativeIO.POSIX.getCacheManipulator().verifyCanMlock()) {

      throw new RuntimeException(String.format(

          "Cannot start datanode because the configured max locked memory" +

          " size (%s) is greater than zero and native code is not available.",

          DFS_DATANODE_MAX_LOCKED_MEMORY_KEY));

    }

    if (Path.WINDOWS) {

      NativeIO.Windows.extendWorkingSetSize(dnConf.maxLockedMemory);

    } else {

      long ulimit = NativeIO.POSIX.getCacheManipulator().getMemlockLimit();

      if (dnConf.maxLockedMemory > ulimit) {

        throw new RuntimeException(String.format(

          "Cannot start datanode because the configured max locked memory" +

          " size (%s) of %d bytes is more than the datanode's available" +

          " RLIMIT_MEMLOCK ulimit of %d bytes.",

          DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,

          dnConf.maxLockedMemory,

          ulimit));

      }

    }

  }

  LOG.info("Starting DataNode with maxLockedMemory = {}",

      dnConf.maxLockedMemory);

  int volFailuresTolerated = dnConf.getVolFailuresTolerated();

  int volsConfigured = dnConf.getVolsConfigured();

  if (volFailuresTolerated < MAX_VOLUME_FAILURE_TOLERATED_LIMIT

      || volFailuresTolerated >= volsConfigured) {

    throw new HadoopIllegalArgumentException("Invalid value configured for "

        + "dfs.datanode.failed.volumes.tolerated - " + volFailuresTolerated

        + ". Value configured is either less than -1 or >= "

        + "to the number of configured volumes (" + volsConfigured + ").");

  }

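  // Initialize DataStorage: the data storage information file.

  // Local storage information is stored in a separate file named VERSION.

  // It contains the type of the node, the storage layout version, the namespace id, and the fs state creation time.

  // Local storage can reside in multiple directories. Each directory should contain the same VERSION file as the others.

  // During startup, Hadoop servers (name-node and data-nodes) read their local storage information from them.

  // The servers hold a lock for each storage directory while they run, so that other nodes cannot start up sharing the same storage.

  // The locks are released when the servers stop (normally or abnormally).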

  storage = new DataStorage();

  // global DN settings

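  // Register the JMX MXBean; for an introduction to JMX see: https://www.liaoxuefeng.com/wiki/1252599548343744/1282385687609378

  registerMXBean();

  // Initialize DataXceiver (streaming data transfer); it is started in DataNode#runDatanodeDaemon()

  initDataXceiver();

  // Start the InfoServer

  startInfoServer();

  // Start the JvmPauseMonitor (detects JVM pauses; its results can be queried through JMX)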

  pauseMonitor = new JvmPauseMonitor();

  pauseMonitor.init(getConf());

  pauseMonitor.start();

  // BlockPoolTokenSecretManager is required to create ipc server.

  //  BlockPoolTokenSecretManager: manages a BlockTokenSecretManager per block pool and routes requests for a given block pool id to the corresponding BlockTokenSecretManager

  this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();

  // Login is done by now. Set the DN user name.

  dnUserName = UserGroupInformation.getCurrentUser().getUserName();

  LOG.info("dnUserName = {}", dnUserName);

  LOG.info("supergroup = {}", supergroup);

  // Initialize the IpcServer (RPC communication); it is started in DataNode#runDatanodeDaemon()

  initIpcServer();

  metrics = DataNodeMetrics.create(getConf(), getDisplayName());

  peerMetrics = dnConf.peerStatsEnabled ?

      DataNodePeerMetrics.create(getDisplayName(), getConf()) : null;

  metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);

  ecWorker = new ErasureCodingWorker(getConf(), this);

  blockRecoveryWorker = new BlockRecoveryWorker(this);

  // Initialize following the nameservice -> namenode structure

  blockPoolManager = new BlockPoolManager(this);

  // Heartbeat management: refresh the set of namenodes this datanode reports to

  blockPoolManager.refreshNamenodes(getConf());

  // Create the ReadaheadPool from the DataNode context so we can

  // exit without having to explicitly shutdown its thread pool.

  readaheadPool = ReadaheadPool.getInstance();

  saslClient = new SaslDataTransferClient(dnConf.getConf(),

      dnConf.saslPropsResolver, dnConf.trustedChannelResolver);

  saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);

  startMetricsLogger();

  if (dnConf.diskStatsEnabled) {

    diskMetrics = new DataNodeDiskMetrics(this,

        dnConf.outliersReportIntervalMs);

  }

}
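The range check on dfs.datanode.failed.volumes.tolerated in startDataNode is compact enough to restate as a predicate. The Java sketch below is a hypothetical helper (VolumeToleranceSketch and isValid are illustrative names), with the limit taken as -1 as stated in the error message above.

```java
// Hypothetical helper restating the range check; not part of Hadoop.
final class VolumeToleranceSketch {

  // The lowest accepted value, as named in the error message ("less than -1").
  static final int MAX_VOLUME_FAILURE_TOLERATED_LIMIT = -1;

  // A value is accepted when it is at least -1 and strictly below the
  // number of configured volumes.
  static boolean isValid(int volFailuresTolerated, int volsConfigured) {
    return volFailuresTolerated >= MAX_VOLUME_FAILURE_TOLERATED_LIMIT
        && volFailuresTolerated < volsConfigured;
  }

  public static void main(String[] args) {
    int volsConfigured = 3;
    for (int tolerated = -2; tolerated <= 3; tolerated++) {
      System.out.println("tolerated=" + tolerated + " valid=" + isValid(tolerated, volsConfigured));
    }
  }
}
```

With three configured volumes, the accepted values are -1, 0, 1 and 2; anything else makes startDataNode throw HadoopIllegalArgumentException.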

DataNode#checkSecureConfig

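First, look at what checkSecureConfig(dnConf, getConf(), resources) actually checks and how it uses the resources argument passed in:

// Checks if the DataNode has a secure configuration if security is enabled. There are two possible configurations that are considered secure:

// 1. The server has bound to privileged ports for RPC and HTTP via SecureDataNodeStarter.

// 2. The configuration enables SASL on DataTransferProtocol and HTTPS (no plain HTTP) for the HTTP server.

// The SASL handshake guarantees authentication of the RPC server before a client transmits a secret, such as a block access token.

// Similarly, SSL guarantees authentication of the HTTP server before a client transmits a secret, such as a delegation token.

// It is not possible to run with both privileged ports and SASL on DataTransferProtocol.

// For backwards compatibility, the connection logic must check whether the target port is a privileged port and, if so, skip the SASL handshake.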

private static void checkSecureConfig(DNConf dnConf, Configuration conf,

    SecureResources resources) throws RuntimeException {

  if (!UserGroupInformation.isSecurityEnabled()) {

    return;

  }

  // Abort out of inconsistent state if Kerberos is enabled but block access tokens are not enabled.

  boolean isEnabled = conf.getBoolean(

      DFSConfigKeys.DFS_BLOCK_ACCESS_TOKEN_ENABLE_KEY,

      DFSConfigKeys.DFS_BLOCK_ACCESS_TOKEN_ENABLE_DEFAULT);

  if (!isEnabled) {

    String errMessage = "Security is enabled but block access tokens " +

        "(via " + DFSConfigKeys.DFS_BLOCK_ACCESS_TOKEN_ENABLE_KEY + ") " +

        "aren't enabled. This may cause issues " +

        "when clients attempt to connect to a DataNode. Aborting DataNode";

    throw new RuntimeException(errMessage);

  }

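  // getIgnoreSecurePortsForTesting returns true if the configuration is set to skip the check for proper port configuration in a secure cluster. This is only intended for dev testing.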

  if (dnConf.getIgnoreSecurePortsForTesting()) {

    return;

  }

  if (resources != null) {

    // Privileged port, or HTTPS_ONLY is configured

    final boolean httpSecured = resources.isHttpPortPrivileged()

        || DFSUtil.getHttpPolicy(conf) == HttpConfig.Policy.HTTPS_ONLY;

    // Privileged port, or SASL is enabled

    final boolean rpcSecured = resources.isRpcPortPrivileged()

        || resources.isSaslEnabled();

    // Allow secure DataNode to startup if:

    // 1. Http is secure.

    // 2. Rpc is secure

    if (rpcSecured && httpSecured) {

      return;

    }

  } else {

    // Handle cases when SecureDataNodeStarter#getSecureResources is not invoked

    SaslPropertiesResolver saslPropsResolver = dnConf.getSaslPropsResolver();

    if (saslPropsResolver != null &&

        DFSUtil.getHttpPolicy(conf) == HttpConfig.Policy.HTTPS_ONLY) {

      return;

    }

  }

  throw new RuntimeException("Cannot start secure DataNode due to incorrect "

      + "config. See https://cwiki.apache.org/confluence/display/HADOOP/"

      + "Secure+DataNode for details.");

}
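Once the early returns are set aside, the decision in checkSecureConfig reduces to a small boolean function. The Java sketch below models only that final part, assuming security is enabled, block access tokens are enabled, and the testing override is off. SecureConfigSketch and isAcceptable are hypothetical names, and the boolean parameters stand in for the values read from SecureResources, DNConf and the HTTP policy.

```java
// Hypothetical model of the final decision in checkSecureConfig; not Hadoop code.
final class SecureConfigSketch {

  // resourcesPresent models "resources != null"; httpsOnly models
  // "DFSUtil.getHttpPolicy(conf) == HttpConfig.Policy.HTTPS_ONLY".
  static boolean isAcceptable(boolean resourcesPresent, boolean httpPortPrivileged,
      boolean rpcPortPrivileged, boolean saslEnabled, boolean httpsOnly) {
    if (resourcesPresent) {
      // Each channel may be secured either by a privileged port or by its protocol.
      boolean httpSecured = httpPortPrivileged || httpsOnly;
      boolean rpcSecured = rpcPortPrivileged || saslEnabled;
      return rpcSecured && httpSecured;
    }
    // Without SecureDataNodeStarter only the SASL plus HTTPS_ONLY combination is accepted.
    return saslEnabled && httpsOnly;
  }

  public static void main(String[] args) {
    // Started via SecureDataNodeStarter with both ports privileged.
    System.out.println(isAcceptable(true, true, true, false, false));
    // Started via plain main() with SASL and HTTPS_ONLY.
    System.out.println(isAcceptable(false, false, false, true, true));
    // Started via plain main() with SASL but plain HTTP still allowed.
    System.out.println(isAcceptable(false, false, false, true, false));
  }
}
```

When the function returns false, the real method falls through to the RuntimeException at its end, so the datanode refuses to start.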

DataNode#initDataXceiver

private void initDataXceiver() throws IOException {

    // find free port or use privileged port provided

    TcpPeerServer tcpPeerServer;

    if (secureResources != null) {

      // Create the TcpPeerServer from the streamingSocket held in secureResources

      tcpPeerServer = new TcpPeerServer(secureResources);

    } else {

      int backlogLength = getConf().getInt(

          CommonConfigurationKeysPublic.IPC_SERVER_LISTEN_QUEUE_SIZE_KEY,

          CommonConfigurationKeysPublic.IPC_SERVER_LISTEN_QUEUE_SIZE_DEFAULT);

      tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,

          DataNode.getStreamingAddr(getConf()), backlogLength);

    }

    if (dnConf.getTransferSocketRecvBufferSize() > 0) {

      tcpPeerServer.setReceiveBufferSize(

          dnConf.getTransferSocketRecvBufferSize());

    }

    streamingAddr = tcpPeerServer.getStreamingAddr();

    LOG.info("Opened streaming server at {}", streamingAddr);

    // Construct a new thread group. Its parent is the thread group of the currently running thread.

    this.threadGroup = new ThreadGroup("dataXceiverServer");

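    // DataXceiverServer: the server used to receive/send data blocks.

    // It is created to listen for requests from clients or other DataNodes. This small server does not use the Hadoop IPC mechanism.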

    xserver = new DataXceiverServer(tcpPeerServer, getConf(), this);

    // Daemon thread through which the DN accepts data requests from clients and other DNs; a worker thread is created for each request to handle it

    this.dataXceiverServer = new Daemon(threadGroup, xserver);

    this.threadGroup.setDaemon(true); // auto destroy when empty

    if (getConf().getBoolean(

        HdfsClientConfigKeys.Read.ShortCircuit.KEY,

        HdfsClientConfigKeys.Read.ShortCircuit.DEFAULT) ||

        getConf().getBoolean(

            HdfsClientConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC,

            HdfsClientConfigKeys

              .DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC_DEFAULT)) {

      DomainPeerServer domainPeerServer =

                getDomainPeerServer(getConf(), streamingAddr.getPort());

      if (domainPeerServer != null) {

        this.localDataXceiverServer = new Daemon(threadGroup,

            new DataXceiverServer(domainPeerServer, getConf(), this));

        LOG.info("Listening on UNIX domain socket: {}",

            domainPeerServer.getBindPath());

      }

    }

    this.shortCircuitRegistry = new ShortCircuitRegistry(getConf());

  }
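
The pattern set up here (one daemon acceptor thread inside a thread group, plus one worker thread per incoming connection) can be illustrated with plain JDK sockets. This is a minimal sketch under assumptions, not the DataXceiverServer implementation: XceiverSketch, startAcceptor, handle and echoOnce are hypothetical names, the "protocol" is a one-byte echo, and the backlog of 50 is an arbitrary value standing in for backlogLength.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class XceiverSketch {

    /** Starts a daemon acceptor that spawns one daemon worker per connection. */
    static Thread startAcceptor(ThreadGroup group, ServerSocket server) {
        Thread acceptor = new Thread(group, () -> {
            while (!server.isClosed()) {
                try {
                    Socket peer = server.accept();
                    Thread worker = new Thread(group, () -> handle(peer));
                    worker.setDaemon(true);
                    worker.start();
                } catch (IOException e) {
                    return; // accept() fails once the server socket is closed
                }
            }
        }, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
        return acceptor;
    }

    /** Toy request handler: reads one byte and writes it back. */
    static void handle(Socket peer) {
        try (Socket s = peer) {
            s.getOutputStream().write(s.getInputStream().read());
        } catch (IOException ignored) {
            // the peer went away; nothing to clean up in this sketch
        }
    }

    /** Sends one byte through the server on an ephemeral port and returns the reply. */
    static int echoOnce(int value) {
        ThreadGroup group = new ThreadGroup("dataXceiverServer");
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            startAcceptor(group, server);
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.getOutputStream().write(value);
                client.getOutputStream().flush();
                return client.getInputStream().read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("echoed byte: " + echoOnce(42));
    }
}
```

Because every thread is a daemon, none of them keeps the JVM alive on its own, which is the same reason the DataNode wraps its xceiver servers in Daemon threads.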

DataNode#createDataNode

Back in DataNode#createDataNode, the next step in starting the DataNode is the call to dn.runDatanodeDaemon():

  public void runDatanodeDaemon() throws IOException {

    blockPoolManager.startAll();

    // start dataXceiverServer

    dataXceiverServer.start();

    if (localDataXceiverServer != null) {

      localDataXceiverServer.start();

    }

    ipcServer.setTracer(tracer);

    ipcServer.start();

    startPlugins(getConf());

  }

