Finding the RM main class from the scripts

  In this part, we start from the scripts and work our way, step by step, into the ResourceManager source code.

  The official Hadoop documentation gives the ResourceManager startup command as:

  Usage: yarn resourcemanager [-format-state-store]

COMMAND_OPTIONS / Description
  -format-state-store
      Formats the RMStateStore. This will clear the RMStateStore and is useful if past applications are no longer needed. This should be run only when the ResourceManager is not running.
  -remove-application-from-state-store <appId>
      Remove the application from RMStateStore. This should be run only when the ResourceManager is not running.

In the source tree, open hadoop-yarn-project > hadoop-yarn > bin > start-yarn.sh:

  # start resourceManager
  # Check the configuration: is ResourceManager HA enabled?
  HARM=$("${HADOOP_HDFS_HOME}/bin/hdfs" getconf -confKey yarn.resourcemanager.ha.enabled 2>&-)
  if [[ ${HARM} = "false" ]]; then
    # ResourceManager HA is not enabled
    echo "Starting resourcemanager"
    hadoop_uservar_su yarn resourcemanager "${HADOOP_YARN_HOME}/bin/yarn" \
        --config "${HADOOP_CONF_DIR}" \
        --daemon start \
        resourcemanager
    (( HADOOP_JUMBO_RETCOUNTER=HADOOP_JUMBO_RETCOUNTER + $? ))
  else
    # ResourceManager HA is enabled
    # yarn.resourcemanager.ha.rm-ids holds the logical RM ids, separated by commas
    logicals=$("${HADOOP_HDFS_HOME}/bin/hdfs" getconf -confKey yarn.resourcemanager.ha.rm-ids 2>&-)
    logicals=${logicals//,/ }   # split on commas into individual RM ids
    for id in ${logicals}
    do
        rmhost=$("${HADOOP_HDFS_HOME}/bin/hdfs" getconf -confKey "yarn.resourcemanager.hostname.${id}" 2>&-)
        # In the end, RMHOSTS is a space-separated string of hostnames
        RMHOSTS="${RMHOSTS} ${rmhost}"
    done
    echo "Starting resourcemanagers on [${RMHOSTS}]"
    # Run the yarn command
    hadoop_uservar_su yarn resourcemanager "${HADOOP_YARN_HOME}/bin/yarn" \
        --config "${HADOOP_CONF_DIR}" \
        --daemon start \
        --workers \
        --hostnames "${RMHOSTS}" \
        resourcemanager
    # Add the previous command's exit status to the running total
    (( HADOOP_JUMBO_RETCOUNTER=HADOOP_JUMBO_RETCOUNTER + $? ))
  fi

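First, the shell syntax for splitting a string: ${var//,/ } replaces every comma with a space, and the unquoted expansion in the for loop then splits the result on whitespace: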

  $ aa='1,2,3';for i in ${aa//,/ }; do echo $i; done;
  1
  2
  3

The official sample configuration makes this easier to follow. It enables HA with the RM ids rm1 and rm2, where rm1's hostname is master1 and rm2's hostname is master2:

  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>
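To make the lookup concrete, here is a minimal Java sketch that applies the same logic as the HA branch of start-yarn.sh to a configuration like the sample above. It is only an illustration: the configuration is modeled as a plain Map<String, String> rather than Hadoop's Configuration, and the names RmHostResolver and resolveRmHosts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors the HA branch of start-yarn.sh.
// The configuration is modeled as a plain Map, not Hadoop's Configuration.
class RmHostResolver {
    static List<String> resolveRmHosts(Map<String, String> conf) {
        List<String> hosts = new ArrayList<>();
        // Like the script, only the literal value "false" selects the non-HA branch
        if ("false".equals(conf.get("yarn.resourcemanager.ha.enabled"))) {
            return hosts; // non-HA: a single local ResourceManager is started instead
        }
        String ids = conf.getOrDefault("yarn.resourcemanager.ha.rm-ids", "");
        // Same effect as ${logicals//,/ } followed by word splitting in the for loop
        for (String id : ids.split(",")) {
            String rmId = id.trim();
            if (rmId.isEmpty()) {
                continue;
            }
            // Look up yarn.resourcemanager.hostname.<id>, e.g. ...hostname.rm1
            String host = conf.get("yarn.resourcemanager.hostname." + rmId);
            if (host != null) {
                hosts.add(host); // ids without a hostname are skipped in this sketch
            }
        }
        return hosts;
    }
}
```

With the sample configuration the result is [master1, master2]; joining it with spaces gives the same value as RMHOSTS, except for the leading space that the shell loop produces.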

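Combining this with the yarn script, the entry class for resourcemanager is org.apache.hadoop.yarn.server.resourcemanager.ResourceManager. The options --config "${HADOOP_CONF_DIR}" --daemon start --workers --hostnames "${RMHOSTS}", along with the values passed through the shell functions, are handled by those shell functions themselves (not analyzed in detail here).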

Analyzing the RM service initialization process

Analyzing the ResourceManager class hierarchy

Now we finally reach the entry class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager, which lives in the hadoop-yarn-server-resourcemanager submodule.

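First, look at the class declaration. ResourceManager extends CompositeService, which means the RM is a composite service that manages a list of child services. It also implements the ResourceManagerMXBean interface, so it can be managed through JMX: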

  public class ResourceManager extends CompositeService
      implements Recoverable, ResourceManagerMXBean

Analyzing the ResourceManager entry point

Next, find the main method:

  public static void main(String argv[]) {
    Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler());
    StringUtils.startupShutdownMessage(ResourceManager.class, argv, LOG);
    try {
      Configuration conf = new YarnConfiguration();
      // Parse the generic Hadoop command-line options
      GenericOptionsParser hParser = new GenericOptionsParser(conf, argv);
      // Arguments left over after the generic options are removed;
      // on a normal first start there are none
      argv = hParser.getRemainingArgs();
      // If -format-state-store, then delete RMStateStore; else startup normally
      if (argv.length >= 1) {
        if (argv[0].equals("-format-state-store")) {
          deleteRMStateStore(conf);
        } else if (argv[0].equals("-remove-application-from-state-store")
            && argv.length == 2) {
          removeApplication(conf, argv[1]);
        } else {
          printUsage(System.err);
        }
      } else {
        // Create the RM instance. The superclass constructor sets the service
        // name to "ResourceManager" and creates the stateModel field, whose
        // initial state is Service.STATE.NOTINITED (covered in detail later)
        ResourceManager resourceManager = new ResourceManager();
        // Register a shutdown hook that stops the composite service
        ShutdownHookManager.get().addShutdownHook(
            new CompositeServiceShutdownHook(resourceManager),
            SHUTDOWN_HOOK_PRIORITY);
        resourceManager.init(conf);  // initialize the RM service
        resourceManager.start();     // start the RM service
      }
    } catch (Throwable t) {
      LOG.fatal("Error starting ResourceManager", t);
      System.exit(-1);
    }
  }
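The branching on the leftover arguments can be isolated into a small pure function, which makes the possible outcomes easy to see. This is a hypothetical sketch (RmStartupAction, RmArgs, and decide are invented names); the real main performs each action directly instead of returning a value.

```java
// Hypothetical enum naming the possible outcomes of ResourceManager.main.
enum RmStartupAction { START, FORMAT_STATE_STORE, REMOVE_APPLICATION, PRINT_USAGE }

class RmArgs {
    // Mirrors the if/else chain on the arguments left by GenericOptionsParser
    static RmStartupAction decide(String[] remainingArgs) {
        if (remainingArgs.length == 0) {
            return RmStartupAction.START; // normal startup: init() then start()
        }
        if (remainingArgs[0].equals("-format-state-store")) {
            return RmStartupAction.FORMAT_STATE_STORE;
        }
        if (remainingArgs[0].equals("-remove-application-from-state-store")
                && remainingArgs.length == 2) {
            return RmStartupAction.REMOVE_APPLICATION;
        }
        return RmStartupAction.PRINT_USAGE; // anything else is a usage error
    }
}
```

Note that -remove-application-from-state-store without an application id falls through to the usage message, just as in the real code.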

Analyzing ResourceManager initialization

  @Override // defined in the parent class AbstractService
  public void init(Configuration conf) {
    if (conf == null) {
      throw new ServiceStateException("Cannot initialize service "
          + getName() + ": null configuration");
    }
    if (isInState(STATE.INITED)) {
      return;
    }
    synchronized (stateChangeLock) {
      if (enterState(STATE.INITED) != STATE.INITED) { // the service was not initialized before
        setConfig(conf); // store the conf object
        try {
          serviceInit(config); // initialize the service
          if (isInState(STATE.INITED)) { // the service is still in the INITED state
            //if the service ended up here during init,
            //notify the listeners
            notifyListeners(); // notify the listeners
          }
        } catch (Exception e) {
          noteFailure(e);
          ServiceOperations.stopQuietly(LOG, this);
          throw ServiceStateException.convert(e);
        }
      }
    }
  }
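The pattern in init() (an unlocked fast-path state check, then a locked state transition so that only one caller runs serviceInit) can be reduced to a small self-contained sketch. MiniService and its simplified state handling are hypothetical: the configuration is a String, and the sketch skips what Hadoop's AbstractService also does, such as validating transitions through a ServiceStateModel, notifying listeners, and stopping the service on failure.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified version of the AbstractService.init() guard.
class MiniService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private final Object stateChangeLock = new Object();
    private volatile State state = State.NOTINITED;
    final AtomicInteger initCalls = new AtomicInteger();

    // Moves to newState and returns the previous state
    // (the real ServiceStateModel.enterState also validates the transition)
    private State enterState(State newState) {
        State old = state;
        state = newState;
        return old;
    }

    public void init(String conf) {
        if (conf == null) {
            throw new IllegalStateException("Cannot initialize service: null configuration");
        }
        if (state == State.INITED) {
            return; // fast path: already initialized
        }
        synchronized (stateChangeLock) {
            // Only the caller that actually performs the transition runs serviceInit
            if (enterState(State.INITED) != State.INITED) {
                serviceInit(conf);
            }
        }
    }

    // Subclasses would put their real initialization here
    protected void serviceInit(String conf) {
        initCalls.incrementAndGet();
    }

    State getState() {
        return state;
    }
}
```

A second init() call returns early, so serviceInit runs exactly once.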

ResourceManager overrides serviceInit:

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    this.conf = conf;
    // 1. Initialize the service context.
    // RMContextImpl holds two kinds of service context:
    // - serviceContext: "Always On" services, which keep running regardless of HA state
    // - activeServiceContext: services that must run only on the active RM node
    this.rmContext = new RMContextImpl();
    rmContext.setResourceManager(this);

    // 2. Set up the configuration provider
    this.configurationProvider =
        ConfigurationProviderFactory.getConfigurationProvider(conf);
    this.configurationProvider.init(this.conf);
    rmContext.setConfigurationProvider(configurationProvider);

    // 3. Load core-site.xml
    loadConfigurationXml(YarnConfiguration.CORE_SITE_CONFIGURATION_FILE);

    // Do refreshSuperUserGroupsConfiguration with loaded core-site.xml
    // Or use RM specific configurations to overwrite the common ones first
    // if they exist
    RMServerUtils.processRMProxyUsersConf(conf);
    ProxyUsers.refreshSuperUserGroupsConfiguration(this.conf);

    // 4. Load yarn-site.xml
    loadConfigurationXml(YarnConfiguration.YARN_SITE_CONFIGURATION_FILE);
    // 5. Validate the configuration
    validateConfigs(this.conf);

    // 6. Login
    // Set HA configuration should be done before login
    this.rmContext.setHAEnabled(HAUtil.isHAEnabled(this.conf));
    if (this.rmContext.isHAEnabled()) { // if RM HA is enabled, verify and set the HA configuration
      HAUtil.verifyAndSetConfiguration(this.conf);
    }

    // Set UGI and do login
    // If security is enabled, use login user
    // If security is not enabled, use current user
    // (e.g. with Kerberos enabled, the Kerberos login user; otherwise the current user)
    this.rmLoginUGI = UserGroupInformation.getCurrentUser();
    try {
      doSecureLogin();
    } catch(IOException ie) {
      throw new YarnRuntimeException("Failed to login", ie);
    }

    // register the handlers for all AlwaysOn services using setupDispatcher().
    // 7. Set up the dispatcher and the event handlers for the Always On services
    rmDispatcher = setupDispatcher();
    addIfService(rmDispatcher);
    rmContext.setDispatcher(rmDispatcher);

    // The order of services below should not be changed as services will be
    // started in same order
    // As elector service needs admin service to be initialized and started,
    // first we add admin service then elector service
    // 8. Create the AdminService
    adminService = createAdminService();
    addService(adminService);
    rmContext.setRMAdminService(adminService);

    // elector must be added post adminservice
    if (this.rmContext.isHAEnabled()) {
      // If the RM is configured to use an embedded leader elector,
      // initialize the leader elector.
      if (HAUtil.isAutomaticFailoverEnabled(conf)
          && HAUtil.isAutomaticFailoverEmbedded(conf)) {
        EmbeddedElector elector = createEmbeddedElector();
        addIfService(elector);
        rmContext.setLeaderElectorService(elector);
      }
    }

    // 9. Set the YARN configuration
    rmContext.setYarnConfiguration(conf);
    // 10. Create and initialize the active services
    createAndInitActiveServices(false);

    // 11. Get the YARN web app address
    webAppAddress = WebAppUtils.getWebAppBindURL(this.conf,
        YarnConfiguration.RM_BIND_HOST,
        WebAppUtils.getRMWebAppURLWithoutScheme(this.conf));

    // 12. Create the RMApplicationHistoryWriter service
    RMApplicationHistoryWriter rmApplicationHistoryWriter =
        createRMApplicationHistoryWriter();
    addService(rmApplicationHistoryWriter);
    rmContext.setRMApplicationHistoryWriter(rmApplicationHistoryWriter);

    // initialize the RM timeline collector first so that the system metrics
    // publisher can bind to it
    // 13. Create the RM timeline collector
    if (YarnConfiguration.timelineServiceV2Enabled(this.conf)) {
      RMTimelineCollectorManager timelineCollectorManager =
          createRMTimelineCollectorManager();
      addService(timelineCollectorManager);
      rmContext.setRMTimelineCollectorManager(timelineCollectorManager);
    }

    // 14. Set up the SystemMetricsPublisher
    SystemMetricsPublisher systemMetricsPublisher =
        createSystemMetricsPublisher();
    addIfService(systemMetricsPublisher);
    rmContext.setSystemMetricsPublisher(systemMetricsPublisher);

    // 15. Register with JMX
    registerMXBean();
    // 16. Call the parent class's serviceInit
    super.serviceInit(this.conf);
  }

Next, let's go through each initialization step in turn.

Initializing the service context

  public RMContextImpl() {
    // Context for the Always On services
    this.serviceContext = new RMServiceContext();
    // Context for services that run only on the active RM node
    this.activeServiceContext = new RMActiveServiceContext();
  }
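A minimal sketch of why splitting the context in two is useful: the always-on part can be kept when the RM leaves the active role, while the active part is discarded and rebuilt. All names here (MiniRmContext, AlwaysOnContext, ActiveContext, resetActive) are hypothetical. The real RMContextImpl exposes many typed getters and setters that delegate to RMServiceContext or RMActiveServiceContext, as setYarnConfiguration does with serviceContext.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two-part context behind RMContextImpl.
class MiniRmContext {
    // State needed regardless of HA role (e.g. the YARN configuration)
    static class AlwaysOnContext {
        final Map<String, Object> values = new HashMap<>();
    }

    // State that only makes sense on the active RM (e.g. the scheduler)
    static class ActiveContext {
        final Map<String, Object> values = new HashMap<>();
    }

    private final AlwaysOnContext serviceContext = new AlwaysOnContext();
    private ActiveContext activeServiceContext = new ActiveContext();

    // Delegates to the always-on part, like setYarnConfiguration()
    void setYarnConfiguration(Object conf) {
        serviceContext.values.put("yarnConfiguration", conf);
    }

    Object getYarnConfiguration() {
        return serviceContext.values.get("yarnConfiguration");
    }

    // Delegates to the active part
    void setScheduler(Object scheduler) {
        activeServiceContext.values.put("scheduler", scheduler);
    }

    Object getScheduler() {
        return activeServiceContext.values.get("scheduler");
    }

    // Drop everything tied to the active role, keep the always-on state
    void resetActive() {
        activeServiceContext = new ActiveContext();
    }
}
```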

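Setting up the configuration provider

This uses the factory pattern: YarnConfiguration names a default ConfigurationProvider, and users can implement ConfigurationProvider to supply a custom provider.

Providers are a common pattern in other codebases as well. Here the provider performs its own internal initialization, returns an InputStream for a configuration file, and closes its resources when done. The class that parses the configuration only needs an input stream.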

  // ConfigurationProviderFactory is a factory class
  /**
   * Creates an instance of {@link ConfigurationProvider} using given
   * configuration.
   * @param bootstrapConf
   * @return configurationProvider
   */
  @SuppressWarnings("unchecked")
  public static ConfigurationProvider
      getConfigurationProvider(Configuration bootstrapConf) {
    Class<? extends ConfigurationProvider> defaultProviderClass;
    try {
      // The default provider class is org.apache.hadoop.yarn.LocalConfigurationProvider
      defaultProviderClass = (Class<? extends ConfigurationProvider>)
          Class.forName(
              YarnConfiguration.DEFAULT_RM_CONFIGURATION_PROVIDER_CLASS);
    } catch (Exception e) {
      throw new YarnRuntimeException(
          "Invalid default configuration provider class"
              + YarnConfiguration.DEFAULT_RM_CONFIGURATION_PROVIDER_CLASS, e);
    }
    // The provider can be set with yarn.resourcemanager.configuration.provider-class.
    // ReflectionUtils looks up the class's constructor in its cache and
    // instantiates the provider reflectively.
    ConfigurationProvider configurationProvider =
        ReflectionUtils.newInstance(bootstrapConf.getClass(
            YarnConfiguration.RM_CONFIGURATION_PROVIDER_CLASS,
            defaultProviderClass, ConfigurationProvider.class),
            bootstrapConf);
    return configurationProvider;
  }
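The same pattern (load a default class by name, let a configuration key override it, then instantiate it reflectively) can be shown with the JDK alone. In this sketch, ConfProvider, LocalConfProvider, ProviderFactory, the key name "provider-class", and the Map-based configuration are hypothetical stand-ins. Hadoop's ReflectionUtils.newInstance additionally caches constructors and passes the Configuration to instances that implement Configurable.

```java
import java.util.Map;

// Hypothetical provider contract standing in for ConfigurationProvider.
interface ConfProvider {
    String name();
}

// Hypothetical default implementation, like LocalConfigurationProvider
class LocalConfProvider implements ConfProvider {
    public LocalConfProvider() { }

    public String name() {
        return "local";
    }
}

class ProviderFactory {
    static final String PROVIDER_KEY = "provider-class"; // hypothetical key

    static ConfProvider getProvider(Map<String, String> conf, String defaultClassName) {
        // A configured class name overrides the default one
        String className = conf.getOrDefault(PROVIDER_KEY, defaultClassName);
        try {
            Class<?> clazz = Class.forName(className);
            // Reject classes that do not implement the expected interface
            Class<? extends ConfProvider> providerClass = clazz.asSubclass(ConfProvider.class);
            return providerClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Invalid provider class " + className, e);
        }
    }
}
```

The asSubclass call plays the role of the third argument to Configuration.getClass (ConfigurationProvider.class), which rejects a configured class that does not implement the interface.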

Loading core-site.xml

  private void loadConfigurationXml(String configurationFile)
      throws YarnException, IOException {
    InputStream configurationInputStream =
        this.configurationProvider.getConfigurationInputStream(this.conf,
            configurationFile);
    if (configurationInputStream != null) {
      this.conf.addResource(configurationInputStream, configurationFile);
    }
  }

Loading yarn-site.xml

This works the same way as loading core-site.xml.

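Validating the configuration

This mainly checks that the global maximum number of AM attempts is a positive integer, and that the NodeManager expiry interval is no less than the NodeManager heartbeat interval.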

  protected static void validateConfigs(Configuration conf) {
    // validate max-attempts
    int globalMaxAppAttempts =
        conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
            YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
    if (globalMaxAppAttempts <= 0) {
      throw new YarnRuntimeException("Invalid global max attempts configuration"
          + ", " + YarnConfiguration.RM_AM_MAX_ATTEMPTS
          + "=" + globalMaxAppAttempts + ", it should be a positive integer.");
    }

    // validate expireIntvl >= heartbeatIntvl
    long expireIntvl = conf.getLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
    long heartbeatIntvl =
        conf.getLong(YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS,
            YarnConfiguration.DEFAULT_RM_NM_HEARTBEAT_INTERVAL_MS);
    if (expireIntvl < heartbeatIntvl) {
      throw new YarnRuntimeException("Nodemanager expiry interval should be no"
          + " less than heartbeat interval, "
          + YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS + "=" + expireIntvl
          + ", " + YarnConfiguration.RM_NM_HEARTBEAT_INTERVAL_MS + "="
          + heartbeatIntvl);
    }
  }
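The two rules can be checked in isolation. This sketch takes the three values directly as parameters instead of reading them from a Configuration; the class name RmConfigValidator is hypothetical, and it throws IllegalArgumentException rather than YarnRuntimeException.

```java
// Hypothetical standalone version of the two checks in validateConfigs().
class RmConfigValidator {
    static void validate(int globalMaxAppAttempts, long expiryIntervalMs,
                         long heartbeatIntervalMs) {
        // Every application needs at least one AM attempt
        if (globalMaxAppAttempts <= 0) {
            throw new IllegalArgumentException("Invalid global max attempts configuration: "
                + globalMaxAppAttempts + ", it should be a positive integer.");
        }
        // A NodeManager must be able to heartbeat at least once before it is considered lost
        if (expiryIntervalMs < heartbeatIntervalMs) {
            throw new IllegalArgumentException(
                "Nodemanager expiry interval should be no less than heartbeat interval: "
                + expiryIntervalMs + " < " + heartbeatIntervalMs);
        }
    }
}
```

With the yarn-default.xml defaults (2 attempts, a 600000 ms expiry interval, and a 1000 ms heartbeat interval) both checks pass.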

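User login

Step 1: check whether HA is enabled. If it is, the HA configuration must be verified and set before logging in, because every RM node performs its own login, and the login below uses this node's own address, which is set up from the per-RM HA settings.

Step 2: get the current user. With Kerberos enabled, this is the user logged in through Kerberos; otherwise it is the current user.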

  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  public static UserGroupInformation getCurrentUser() throws IOException {
    AccessControlContext context = AccessController.getContext();
    Subject subject = Subject.getSubject(context);
    if (subject == null || subject.getPrincipals(User.class).isEmpty()) {
      return getLoginUser();
    } else {
      return new UserGroupInformation(subject);
    }
  }

Step 3: log in through the security API and obtain the login user.

  protected void doSecureLogin() throws IOException {
    InetSocketAddress socAddr = getBindAddress(conf);
    SecurityUtil.login(this.conf, YarnConfiguration.RM_KEYTAB,
        YarnConfiguration.RM_PRINCIPAL, socAddr.getHostName());

    // if security is enable, set rmLoginUGI as UGI of loginUser
    if (UserGroupInformation.isSecurityEnabled()) {
      this.rmLoginUGI = UserGroupInformation.getLoginUser();
    }
  }

Registering the event handlers for the Always On services

  private Dispatcher setupDispatcher() {
    // Create the dispatcher
    Dispatcher dispatcher = createDispatcher();
    // Register RMFatalEventDispatcher with the dispatcher
    // as the handler for RMFatalEventType events
    dispatcher.register(RMFatalEventType.class,
        new ResourceManager.RMFatalEventDispatcher());
    return dispatcher;
  }

  protected Dispatcher createDispatcher() {
    return new AsyncDispatcher("RM Event dispatcher");
  }

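  Internally, AsyncDispatcher has a blocking event queue and a long-running handler thread. When an event is put into the queue, the thread takes it out, gets the event's type, looks up the corresponding EventHandler in its registry (a Map<Class<? extends Enum>, EventHandler>), and calls that handler's handle method. This completes one asynchronous event dispatch.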
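Here is a minimal sketch of that mechanism using only java.util.concurrent. MiniAsyncDispatcher and its nested Event and EventHandler types are hypothetical simplifications; the real AsyncDispatcher also supports several handlers per event type and a more careful shutdown.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified AsyncDispatcher: one queue, one handler thread.
class MiniAsyncDispatcher {
    interface Event { Enum<?> getType(); }
    interface EventHandler { void handle(Event event); }

    private final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>();
    // Registry keyed by the event type's enum class
    private final Map<Class<?>, EventHandler> handlers = new ConcurrentHashMap<>();
    private final Thread handlerThread;

    MiniAsyncDispatcher() {
        handlerThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Event event = eventQueue.take(); // blocks until an event arrives
                    dispatch(event);
                }
            } catch (InterruptedException e) {
                // stop() interrupts the thread to end the loop
            }
        }, "MiniAsyncDispatcher event handler");
        handlerThread.setDaemon(true);
    }

    void register(Class<? extends Enum<?>> eventType, EventHandler handler) {
        handlers.put(eventType, handler);
    }

    void start() { handlerThread.start(); }

    void stop() { handlerThread.interrupt(); }

    // Callers only enqueue; the handler runs later on the dispatcher thread
    void dispatchAsync(Event event) { eventQueue.add(event); }

    private void dispatch(Event event) {
        // getDeclaringClass() maps an enum constant back to its enum type
        Class<?> type = event.getType().getDeclaringClass();
        EventHandler handler = handlers.get(type);
        if (handler != null) {
            handler.handle(event);
        }
    }
}
```

Because dispatchAsync only enqueues, the caller returns immediately and the handler runs later on the dispatcher thread; that is what makes the call asynchronous.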

Creating the AdminService

  protected AdminService createAdminService() {
    return new AdminService(this);
  }

Setting the YARN configuration

  rmContext.setYarnConfiguration(conf);
  // which calls RMContextImpl.setYarnConfiguration:
  public void setYarnConfiguration(Configuration yarnConfiguration) {
    serviceContext.setYarnConfiguration(yarnConfiguration);
  }

Creating and initializing the active services

  protected void createAndInitActiveServices(boolean fromActive) {
    activeServices = new RMActiveServices(this);
    activeServices.fromActive = fromActive;
    activeServices.init(conf);
  }
  // The init method it calls (inherited from AbstractService) is:
  @Override
  public void init(Configuration conf) {
    if (conf == null) {
      throw new ServiceStateException("Cannot initialize service "
          + getName() + ": null configuration");
    }
    if (isInState(STATE.INITED)) {
      return;
    }
    synchronized (stateChangeLock) {
      if (enterState(STATE.INITED) != STATE.INITED) {
        setConfig(conf);
        try {
          serviceInit(config);
          if (isInState(STATE.INITED)) {
            //if the service ended up here during init,
            //notify the listeners
            notifyListeners();
          }
        } catch (Exception e) {
          noteFailure(e);
          ServiceOperations.stopQuietly(LOG, this);
          throw ServiceStateException.convert(e);
        }
      }
    }
  }
  // The serviceInit method it calls is shown below; it is analyzed in detail later

  @Override
  protected void serviceInit(Configuration configuration) throws Exception {
    standByTransitionRunnable = new StandByTransitionRunnable();

    rmSecretManagerService = createRMSecretManagerService();
    addService(rmSecretManagerService);

    containerAllocationExpirer = new ContainerAllocationExpirer(rmDispatcher);
    addService(containerAllocationExpirer);
    rmContext.setContainerAllocationExpirer(containerAllocationExpirer);

    AMLivelinessMonitor amLivelinessMonitor = createAMLivelinessMonitor();
    addService(amLivelinessMonitor);
    rmContext.setAMLivelinessMonitor(amLivelinessMonitor);

    AMLivelinessMonitor amFinishingMonitor = createAMLivelinessMonitor();
    addService(amFinishingMonitor);
    rmContext.setAMFinishingMonitor(amFinishingMonitor);

    RMAppLifetimeMonitor rmAppLifetimeMonitor = createRMAppLifetimeMonitor();
    addService(rmAppLifetimeMonitor);
    rmContext.setRMAppLifetimeMonitor(rmAppLifetimeMonitor);

    RMNodeLabelsManager nlm = createNodeLabelManager();
    nlm.setRMContext(rmContext);
    addService(nlm);
    rmContext.setNodeLabelManager(nlm);

    AllocationTagsManager allocationTagsManager =
        createAllocationTagsManager();
    rmContext.setAllocationTagsManager(allocationTagsManager);

    PlacementConstraintManagerService placementConstraintManager =
        createPlacementConstraintManager();
    addService(placementConstraintManager);
    rmContext.setPlacementConstraintManager(placementConstraintManager);

    // add resource profiles here because it's used by AbstractYarnScheduler
    ResourceProfilesManager resourceProfilesManager =
        createResourceProfileManager();
    resourceProfilesManager.init(conf);
    rmContext.setResourceProfilesManager(resourceProfilesManager);

    RMDelegatedNodeLabelsUpdater delegatedNodeLabelsUpdater =
        createRMDelegatedNodeLabelsUpdater();
    if (delegatedNodeLabelsUpdater != null) {
      addService(delegatedNodeLabelsUpdater);
      rmContext.setRMDelegatedNodeLabelsUpdater(delegatedNodeLabelsUpdater);
    }

    recoveryEnabled = conf.getBoolean(YarnConfiguration.RECOVERY_ENABLED,
        YarnConfiguration.DEFAULT_RM_RECOVERY_ENABLED);

    RMStateStore rmStore = null;
    if (recoveryEnabled) {
      rmStore = RMStateStoreFactory.getStore(conf);
      boolean isWorkPreservingRecoveryEnabled =
          conf.getBoolean(
              YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
              YarnConfiguration.DEFAULT_RM_WORK_PRESERVING_RECOVERY_ENABLED);
      rmContext
          .setWorkPreservingRecoveryEnabled(isWorkPreservingRecoveryEnabled);
    } else {
      rmStore = new NullRMStateStore();
    }

    try {
      rmStore.setResourceManager(rm);
      rmStore.init(conf);
      rmStore.setRMDispatcher(rmDispatcher);
    } catch (Exception e) {
      // the Exception from stateStore.init() needs to be handled for
      // HA and we need to give up master status if we got fenced
      LOG.error("Failed to init state store", e);
      throw e;
    }
    rmContext.setStateStore(rmStore);

    if (UserGroupInformation.isSecurityEnabled()) {
      delegationTokenRenewer = createDelegationTokenRenewer();
      rmContext.setDelegationTokenRenewer(delegationTokenRenewer);
    }

    // Register event handler for NodesListManager
    nodesListManager = new NodesListManager(rmContext);
    rmDispatcher.register(NodesListManagerEventType.class, nodesListManager);
    addService(nodesListManager);
    rmContext.setNodesListManager(nodesListManager);

    // Initialize the scheduler
    scheduler = createScheduler();
    scheduler.setRMContext(rmContext);
    addIfService(scheduler);
    rmContext.setScheduler(scheduler);

    schedulerDispatcher = createSchedulerEventDispatcher();
    addIfService(schedulerDispatcher);
    rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);

    // Register event handler for RmAppEvents
    rmDispatcher.register(RMAppEventType.class,
        new ApplicationEventDispatcher(rmContext));

    // Register event handler for RmAppAttemptEvents
    rmDispatcher.register(RMAppAttemptEventType.class,
        new ApplicationAttemptEventDispatcher(rmContext));

    // Register event handler for RmNodes
    rmDispatcher.register(
        RMNodeEventType.class, new NodeEventDispatcher(rmContext));

    nmLivelinessMonitor = createNMLivelinessMonitor();
    addService(nmLivelinessMonitor);

    resourceTracker = createResourceTrackerService();
    addService(resourceTracker);
    rmContext.setResourceTrackerService(resourceTracker);

    MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager");
    if (fromActive) {
      JvmMetrics.reattach(ms, jvmMetrics);
      UserGroupInformation.reattachMetrics();
    } else {
      jvmMetrics = JvmMetrics.initSingleton("ResourceManager", null);
    }

    JvmPauseMonitor pauseMonitor = new JvmPauseMonitor();
    addService(pauseMonitor);
    jvmMetrics.setPauseMonitor(pauseMonitor);

    // Initialize the Reservation system
    if (conf.getBoolean(YarnConfiguration.RM_RESERVATION_SYSTEM_ENABLE,
        YarnConfiguration.DEFAULT_RM_RESERVATION_SYSTEM_ENABLE)) {
      reservationSystem = createReservationSystem();
      if (reservationSystem != null) {
        reservationSystem.setRMContext(rmContext);
        addIfService(reservationSystem);
        rmContext.setReservationSystem(reservationSystem);
        LOG.info("Initialized Reservation system");
      }
    }

    masterService = createApplicationMasterService();
    createAndRegisterOpportunisticDispatcher(masterService);
    addService(masterService);
    rmContext.setApplicationMasterService(masterService);

    applicationACLsManager = new ApplicationACLsManager(conf);

    queueACLsManager = createQueueACLsManager(scheduler, conf);

    rmAppManager = createRMAppManager();
    // Register event handler for RMAppManagerEvents
    rmDispatcher.register(RMAppManagerEventType.class, rmAppManager);

    clientRM = createClientRMService();
    addService(clientRM);
    rmContext.setClientRMService(clientRM);

    applicationMasterLauncher = createAMLauncher();
    rmDispatcher.register(AMLauncherEventType.class,
        applicationMasterLauncher);

    addService(applicationMasterLauncher);
    if (UserGroupInformation.isSecurityEnabled()) {
      addService(delegationTokenRenewer);
      delegationTokenRenewer.setRMContext(rmContext);
    }

    if(HAUtil.isFederationEnabled(conf)) {
      String cId = YarnConfiguration.getClusterId(conf);
      if (cId.isEmpty()) {
        String errMsg =
            "Cannot initialize RM as Federation is enabled"
                + " but cluster id is not configured.";
        LOG.error(errMsg);
        throw new YarnRuntimeException(errMsg);
      }
      federationStateStoreService = createFederationStateStoreService();
      addIfService(federationStateStoreService);
      LOG.info("Initialized Federation membership.");
    }

    new RMNMInfo(rmContext, scheduler);

    if (conf.getBoolean(YarnConfiguration.YARN_API_SERVICES_ENABLE,
        false)) {
      SystemServiceManager systemServiceManager = createServiceManager();
      addIfService(systemServiceManager);
    }

    super.serviceInit(conf);
  }

Getting the YARN web app address

  // yarn.resourcemanager.bind-host, if set, overrides the host the RM web app binds to
  webAppAddress = WebAppUtils.getWebAppBindURL(this.conf,
      YarnConfiguration.RM_BIND_HOST,
      WebAppUtils.getRMWebAppURLWithoutScheme(this.conf));
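As I understand it, when yarn.resourcemanager.bind-host is set, only the host part of the "host:port" web app address is replaced and the configured port is kept. The sketch below is an approximation of that behavior, not a copy of WebAppUtils. BindUrl and bindUrl are hypothetical names, the bind host is passed in directly instead of being read from the Configuration, and the example addresses are illustrative values only.

```java
// Hypothetical approximation of applying a bind-host override to "host:port".
class BindUrl {
    static String bindUrl(String bindHost, String webAppAddress) {
        if (bindHost == null || bindHost.trim().isEmpty()) {
            return webAppAddress; // no override configured
        }
        int colon = webAppAddress.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException(
                "web app address must include a port: " + webAppAddress);
        }
        // Keep the configured port, swap in the bind host
        return bindHost.trim() + webAppAddress.substring(colon);
    }
}
```

For example, a bind host of 0.0.0.0 turns master1:8088 into 0.0.0.0:8088, so the web app listens on all interfaces while still using the configured port.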

Creating the RMApplicationHistoryWriter service

  protected RMApplicationHistoryWriter createRMApplicationHistoryWriter() {
    return new RMApplicationHistoryWriter();
  }

  RMApplicationHistoryWriter rmApplicationHistoryWriter =
      createRMApplicationHistoryWriter();
  addService(rmApplicationHistoryWriter);
  rmContext.setRMApplicationHistoryWriter(rmApplicationHistoryWriter);

Creating the RM timeline collector

  private RMTimelineCollectorManager createRMTimelineCollectorManager() {
    return new RMTimelineCollectorManager(this);
  }

  if (YarnConfiguration.timelineServiceV2Enabled(this.conf)) {
    RMTimelineCollectorManager timelineCollectorManager =
        createRMTimelineCollectorManager();
    addService(timelineCollectorManager);
    rmContext.setRMTimelineCollectorManager(timelineCollectorManager);
  }

Setting up the SystemMetricsPublisher

  protected SystemMetricsPublisher createSystemMetricsPublisher() {
    List<SystemMetricsPublisher> publishers =
        new ArrayList<SystemMetricsPublisher>();
    // Timeline service v1 is enabled
    if (YarnConfiguration.timelineServiceV1Enabled(conf)) {
      SystemMetricsPublisher publisherV1 = new TimelineServiceV1Publisher();
      publishers.add(publisherV1);
    }
    // Timeline service v2 is enabled
    if (YarnConfiguration.timelineServiceV2Enabled(conf)) {
      // we're dealing with the v.2.x publisher
      LOG.info("system metrics publisher with the timeline service V2 is "
          + "configured");
      SystemMetricsPublisher publisherV2 = new TimelineServiceV2Publisher(
          rmContext.getRMTimelineCollectorManager());
      publishers.add(publisherV2);
    }
    // If no publisher is configured, fall back to a no-op publisher.
    // This is the null object pattern: callers never have to check for null.
    if (publishers.isEmpty()) {
      LOG.info("TimelineServicePublisher is not configured");
      SystemMetricsPublisher noopPublisher = new NoOpSystemMetricPublisher();
      publishers.add(noopPublisher);
    }

    for (SystemMetricsPublisher publisher : publishers) {
      addIfService(publisher);
    }

    SystemMetricsPublisher combinedPublisher =
        new CombinedSystemMetricsPublisher(publishers);
    return combinedPublisher;
  }
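The combination of a composite publisher and a no-op fallback can be sketched as follows. MetricsPublisher, NoOpPublisher, CombinedPublisher, and PublisherFactory are hypothetical, simplified names; the real SystemMetricsPublisher interface has many event-specific methods (appCreated, appFinished, and so on) rather than the single publish method used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified publisher contract.
interface MetricsPublisher {
    void publish(String event);
}

// Null object: accepts events and does nothing, so callers never check for null
class NoOpPublisher implements MetricsPublisher {
    public void publish(String event) { }
}

// Composite: forwards each event to every configured publisher
class CombinedPublisher implements MetricsPublisher {
    private final List<MetricsPublisher> publishers;

    CombinedPublisher(List<MetricsPublisher> publishers) {
        this.publishers = new ArrayList<>(publishers);
    }

    public void publish(String event) {
        for (MetricsPublisher p : publishers) {
            p.publish(event);
        }
    }

    int size() {
        return publishers.size();
    }
}

class PublisherFactory {
    // "configured" holds the publishers for the enabled timeline service versions
    static CombinedPublisher buildPublisher(List<MetricsPublisher> configured) {
        List<MetricsPublisher> publishers = new ArrayList<>(configured);
        if (publishers.isEmpty()) {
            publishers.add(new NoOpPublisher()); // fall back to the null object
        }
        return new CombinedPublisher(publishers);
    }
}
```

Callers always receive a single publisher object and simply call publish, whether zero, one, or two timeline service versions are enabled.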

Registering with JMX

  /**
   * Register ResourceManagerMXBean.
   */
  private void registerMXBean() {
    MBeans.register("ResourceManager", "ResourceManager", this);
  }

Calling the parent class's serviceInit

  // Every service added to the service list during the initialization above
  // gets initialized here, in the order in which it was added.
  protected void serviceInit(Configuration conf) throws Exception {
    List<Service> services = getServices();
    if (LOG.isDebugEnabled()) {
      LOG.debug(getName() + ": initing services, size=" + services.size());
    }
    for (Service service : services) {
      service.init(conf);
    }
    super.serviceInit(conf);
  }
  // Why not return serviceList directly? The ArrayList copy constructor
  // essentially does an Arrays.copyOf (a shallow copy), so callers cannot add
  // to or remove from the internal service list, and copying inside
  // synchronized (serviceList) yields a consistent snapshot. This is a
  // recommended practice; returning an iterator over such a copy would be
  // another option.
  public List<Service> getServices() {
    synchronized (serviceList) {
      return new ArrayList<Service>(serviceList);
    }
  }
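The ordering guarantee that serviceInit relies on ("services will be started in same order") and the defensive copy in getServices can both be seen in a small sketch. MiniComposite and Child are hypothetical names, and the configuration is a String; the real CompositeService also stops its children in reverse order and handles failures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical child service that records when it is initialized.
class Child {
    private final String name;
    private final List<String> log;

    Child(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    void init(String conf) {
        log.add(name);
    }
}

class MiniComposite {
    private final List<Child> serviceList = new ArrayList<>();

    void addService(Child child) {
        synchronized (serviceList) {
            serviceList.add(child);
        }
    }

    // Snapshot copy: callers cannot modify the internal list
    List<Child> getServices() {
        synchronized (serviceList) {
            return new ArrayList<>(serviceList);
        }
    }

    // Children are initialized in the order they were added
    void serviceInit(String conf) {
        for (Child child : getServices()) {
            child.init(conf);
        }
    }
}
```

Clearing the list returned by getServices() has no effect on the composite, and the children still initialize in insertion order, which is why the RM adds the admin service before the elector.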

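That essentially covers the overall initialization code. When later parts touch specific pieces of it, we will come back and look at them in detail.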
