Flume-NG Startup Process Source Code Analysis (Part 3) (Original)

The previous article analyzed how Flume loads its configuration file; dynamic reloading simply runs getConfiguration() again.

This article analyzes how the individual components run once the configuration file has been loaded.

After the configuration file is loaded, the subscriber (the Application class) receives the event and executes:

  @Subscribe

  public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) {

    stopAllComponents();

    startAllComponents(conf);

  }

The MaterializedConfiguration conf argument is the configuration returned by getConfiguration(); it is an instance of SimpleMaterializedConfiguration.

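The handleConfigurationEvent method was outlined in Part 1. It consists of two calls: stopAllComponents() and startAllComponents(conf). The materializedConfiguration field of Application holds a MaterializedConfiguration. When stopAllComponents() runs, that field still holds the old configuration, so the old components are stopped first; startAllComponents(conf) then assigns the new configuration to materializedConfiguration and starts each component in turn.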
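
The stop-then-start pattern described above can be sketched in plain Java without Flume or Guava. The class name ReconfigurableApp and the use of strings in place of MaterializedConfiguration are assumptions made purely for illustration; this is not Flume's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Application: it only records what it would stop and start.
final class ReconfigurableApp {
    private String current;                       // null until the first configuration arrives
    final List<String> log = new ArrayList<>();

    // Mirrors handleConfigurationEvent: stop the old configuration, then start the new one.
    synchronized void handleConfigurationEvent(String conf) {
        stopAllComponents();
        startAllComponents(conf);
    }

    private void stopAllComponents() {
        if (current != null) {                    // nothing to stop on the first event
            log.add("stop " + current);
        }
    }

    private void startAllComponents(String conf) {
        current = conf;                           // the new configuration becomes the current one
        log.add("start " + conf);
    }

    public static void main(String[] args) {
        ReconfigurableApp app = new ReconfigurableApp();
        app.handleConfigurationEvent("conf-v1");
        app.handleConfigurationEvent("conf-v2");
        System.out.println(app.log);              // [start conf-v1, stop conf-v1, start conf-v2]
    }
}
```

The first event finds nothing to stop, which matches the null check at the top of stopAllComponents() in the real class.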

1. First, the startAllComponents(conf) method. The code is as follows:

private void startAllComponents(MaterializedConfiguration materializedConfiguration) { // start all components, i.e. the three basic kinds: channels, sinks, and sources

    logger.info("Starting new configuration:{}", materializedConfiguration);

    this.materializedConfiguration = materializedConfiguration;

    for (Entry<String, Channel> entry :

      materializedConfiguration.getChannels().entrySet()) {

      try{

        logger.info("Starting Channel " + entry.getKey());

        supervisor.supervise(entry.getValue(),

            new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);

      } catch (Exception e){

        logger.error("Error while starting {}", entry.getValue(), e);

      }

    }

    /*

     * Wait for all channels to start.

     */

    for(Channel ch: materializedConfiguration.getChannels().values()){

      while(ch.getLifecycleState() != LifecycleState.START

          && !supervisor.isComponentInErrorState(ch)){

        try {

          logger.info("Waiting for channel: " + ch.getName() +

              " to start. Sleeping for 500 ms");

          Thread.sleep(500);

        } catch (InterruptedException e) {

          logger.error("Interrupted while waiting for channel to start.", e);

          Throwables.propagate(e);

        }

      }

    }

    for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners()

        .entrySet()) {        // start all sinks

      try{

        logger.info("Starting Sink " + entry.getKey());

        supervisor.supervise(entry.getValue(),

          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);

      } catch (Exception e) {

        logger.error("Error while starting {}", entry.getValue(), e);

      }

    }

    for (Entry<String, SourceRunner> entry : materializedConfiguration

        .getSourceRunners().entrySet()) { // start all sources

      try{

        logger.info("Starting Source " + entry.getKey());

        supervisor.supervise(entry.getValue(),

          new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);

      } catch (Exception e) {

        logger.error("Error while starting {}", entry.getValue(), e);

      }

    }

    this.loadMonitoring();

  }

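All three kinds of components are started through supervisor.supervise(entry.getValue(), new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START). After the channels have been submitted, the method waits until every channel has fully started (or has been flagged as being in an error state) before it starts the sinks and sources. Starting the other two kinds of components before the channels are ready would cause errors, because a sink or source begins exchanging data with its channel as soon as it is up. The startup of sinks and sources is analyzed separately later.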
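
The ordering constraint can be illustrated with a small, self-contained sketch. FakeChannel and the phase names are hypothetical; the real method supervises Channel, SinkRunner, and SourceRunner objects and polls every 500 ms rather than every 10 ms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class StartOrderDemo {
    // Hypothetical channel whose start completes asynchronously on another thread.
    static final class FakeChannel {
        private final AtomicBoolean started = new AtomicBoolean(false);
        void startAsync() { new Thread(() -> started.set(true)).start(); }
        boolean isStarted() { return started.get(); }
    }

    // Returns the startup phases in the order they were reached.
    static List<String> startAll(List<FakeChannel> channels) {
        List<String> phases = new ArrayList<>();
        for (FakeChannel ch : channels) {
            ch.startAsync();                       // submit every channel first
        }
        phases.add("channels submitted");
        for (FakeChannel ch : channels) {
            while (!ch.isStarted()) {              // block until this channel is up
                try {
                    Thread.sleep(10);              // the original polls every 500 ms
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        }
        phases.add("channels started");
        phases.add("sinks submitted");             // consumers are started only now
        phases.add("sources submitted");           // and producers last
        return phases;
    }

    public static void main(String[] args) {
        List<FakeChannel> channels = Arrays.asList(new FakeChannel(), new FakeChannel());
        System.out.println(startAll(channels));
    }
}
```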

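supervisor is an instance of LifecycleSupervisor. Its constructor creates a scheduled thread pool with a core size of 10 threads and a maximum size set to 20, which is shared by all components. The constructor is as follows: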

public LifecycleSupervisor() {

    lifecycleState = LifecycleState.IDLE;

    supervisedProcesses = new HashMap<LifecycleAware, Supervisoree>(); // supervised components and their monitoring information

    monitorFutures = new HashMap<LifecycleAware, ScheduledFuture<?>>();

    monitorService = new ScheduledThreadPoolExecutor(10,

        new ThreadFactoryBuilder().setNameFormat(

            "lifecycleSupervisor-" + Thread.currentThread().getId() + "-%d")

            .build());

    monitorService.setMaximumPoolSize(20);

    monitorService.setKeepAliveTime(30, TimeUnit.SECONDS);

    purger = new Purger();

    needToPurge = false;

  }
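
The pool settings can be checked with the JDK alone. The sketch below is an approximation that leaves out the Guava ThreadFactoryBuilder used in the constructor above; the class and method names are hypothetical. One point worth knowing: the ScheduledThreadPoolExecutor Javadoc states that it acts as a fixed-size pool of corePoolSize threads with an unbounded queue, so adjusting maximumPoolSize has no useful effect. The value 20 is stored, but the pool should not be expected to grow beyond 10 threads.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolSizeDemo {
    // Builds a pool with the same sizes as LifecycleSupervisor's monitorService
    // (thread naming omitted; that part needs Guava in the original).
    static ScheduledThreadPoolExecutor newMonitorService() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(10);
        pool.setMaximumPoolSize(20);
        pool.setKeepAliveTime(30, TimeUnit.SECONDS);
        return pool;
    }

    public static void main(String[] args) {
        ScheduledThreadPoolExecutor pool = newMonitorService();
        // prints "core=10 max=20"
        System.out.println("core=" + pool.getCorePoolSize() + " max=" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```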

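The supervise(LifecycleAware lifecycleAware, SupervisorPolicy policy, LifecycleState desiredState) method is what actually starts each component. All Flume components implement the LifecycleAware interface, which has just three methods: getLifecycleState (returns the component's running state), start (starts the component), and stop (stops the component). The supervise method is as follows: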

public synchronized void supervise(LifecycleAware lifecycleAware,

      SupervisorPolicy policy, LifecycleState desiredState) {
    // check the state of the thread pool

    if(this.monitorService.isShutdown()

        || this.monitorService.isTerminated()

        || this.monitorService.isTerminating()){

      throw new FlumeException("Supervise called on " + lifecycleAware + " " +

          "after shutdown has been initiated. " + lifecycleAware + " will not" +

          " be started");

    }

    // refuse to supervise a component that is already being supervised

    Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),

        "Refusing to supervise " + lifecycleAware + " more than once");

    if (logger.isDebugEnabled()) {

      logger.debug("Supervising service:{} policy:{} desiredState:{}",

          new Object[] { lifecycleAware, policy, desiredState });

    }

    // set up the supervision record for the new component

    Supervisoree process = new Supervisoree();

    process.status = new Status();

    process.policy = policy;

    process.status.desiredState = desiredState;

    process.status.error = false;

    MonitorRunnable monitorRunnable = new MonitorRunnable();

    monitorRunnable.lifecycleAware = lifecycleAware; // the component

    monitorRunnable.supervisoree = process;

    monitorRunnable.monitorService = monitorService;

    supervisedProcesses.put(lifecycleAware, process);

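    // Creates and executes a periodic action that first runs after the given initial delay,
    // and then with the given delay between the end of one execution and the start of the next.
    // If any execution of the task throws an exception, subsequent executions are suppressed.

    ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(

        monitorRunnable, 0, 3, TimeUnit.SECONDS);  // run MonitorRunnable now, then again 3 seconds after each run ends; this allows retries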

    monitorFutures.put(lifecycleAware, future);

  }

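This method first checks that monitorService is still running and that the component is not already supervised. It then creates a Supervisoree (Supervisoree process = new Supervisoree()), fills in its fields, wraps it together with the component in a MonitorRunnable monitoring task, and submits that task to the thread pool.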
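
To see why a fixed-delay schedule doubles as a retry mechanism, here is a minimal sketch using only the JDK. The Lifecycle interface, State enum, and FlakyComponent are hypothetical simplifications of LifecycleAware, LifecycleState, and a real component; the 20 ms delay stands in for the 3 seconds used by Flume.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SuperviseDemo {
    enum State { IDLE, START, STOP }

    // Hypothetical, simplified version of the LifecycleAware contract.
    interface Lifecycle {
        void start();
        void stop();
        State getState();
    }

    // A component whose first start attempt fails and whose second succeeds.
    static final class FlakyComponent implements Lifecycle {
        private volatile State state = State.IDLE;
        final AtomicInteger startAttempts = new AtomicInteger();

        @Override public void start() {
            if (startAttempts.incrementAndGet() == 1) {
                throw new IllegalStateException("not ready yet");
            }
            state = State.START;
        }
        @Override public void stop() { state = State.STOP; }
        @Override public State getState() { return state; }
    }

    // Schedules a periodic check that drives the component towards START.
    // Returns true if the component reached START within five seconds.
    static boolean superviseUntilStarted(Lifecycle component, long delayMillis) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        CountDownLatch started = new CountDownLatch(1);
        ScheduledFuture<?> future = pool.scheduleWithFixedDelay(() -> {
            try {
                if (component.getState() != State.START) {
                    component.start();
                }
                if (component.getState() == State.START) {
                    started.countDown();
                }
            } catch (Throwable t) {
                // Swallow the failure: an exception escaping the task would
                // suppress all later executions, and with them the retry.
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        try {
            return started.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            future.cancel(false);
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        FlakyComponent component = new FlakyComponent();
        boolean ok = superviseUntilStarted(component, 20);
        System.out.println(ok + " after " + component.startAttempts.get() + " attempts");
    }
}
```

The catch block is the important design point: MonitorRunnable.run() likewise wraps its whole body in try/catch(Throwable), so one failed check does not cancel the schedule.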

The MonitorRunnable.run() method:

public void run() {

      logger.debug("checking process:{} supervisoree:{}", lifecycleAware,

          supervisoree);

      long now = System.currentTimeMillis(); // current timestamp

      try {

        if (supervisoree.status.firstSeen == null) {

          logger.debug("first time seeing {}", lifecycleAware);

          // this is the first time the component is being checked

          supervisoree.status.firstSeen = now;

        }

        // record the time of this check (updated on every run)

        supervisoree.status.lastSeen = now;

        synchronized (lifecycleAware) { // lock the component

          if (supervisoree.status.discard) { // supervision of this component has been cancelled

            // Unsupervise has already been called on this.

            logger.info("Component has already been stopped {}", lifecycleAware);

            return; // return immediately

          } else if (supervisoree.status.error) { // the component is in an error state

            logger.info("Component {} is in error state, and Flume will not"

                + "attempt to change its state", lifecycleAware);

            return; // return immediately

          }

          supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState(); // latest state of the component; before start() has run it is LifecycleState.IDLE

          if (!lifecycleAware.getLifecycleState().equals(

              supervisoree.status.desiredState)) { // the component's current state differs from the desired state

            logger.debug("Want to transition {} from {} to {} (failures:{})",

                new Object[] { lifecycleAware, supervisoree.status.lastSeenState,

                    supervisoree.status.desiredState,

                    supervisoree.status.failures });

            switch (supervisoree.status.desiredState) { // act according to the desired state

              case START:

                try {

                  lifecycleAware.start();   // start the component; its state then becomes LifecycleState.START

                } catch (Throwable e) {

                  logger.error("Unable to start " + lifecycleAware

                      + " - Exception follows.", e);

                  if (e instanceof Error) {

                    // This component can never recover, shut it down.

                    supervisoree.status.desiredState = LifecycleState.STOP;

                    try {

                      lifecycleAware.stop();

                      logger.warn("Component {} stopped, since it could not be"

                          + "successfully started due to missing dependencies",

                          lifecycleAware);

                    } catch (Throwable e1) {

                      logger.error("Unsuccessful attempt to "

                          + "shutdown component: {} due to missing dependencies."

                          + " Please shutdown the agent"

                          + "or disable this component, or the agent will be"

                          + "in an undefined state.", e1);

                      supervisoree.status.error = true;

                      if (e1 instanceof Error) {

                        throw (Error) e1;

                      }

                      // Set the state to stop, so that the conf poller can

                      // proceed.

                    }

                  }

                  supervisoree.status.failures++; // starting failed: increment the failure count

                }

                break;

              case STOP:

                try {

                  lifecycleAware.stop();    // stop the component

                } catch (Throwable e) {

                  logger.error("Unable to stop " + lifecycleAware

                      + " - Exception follows.", e);

                  if (e instanceof Error) {

                    throw (Error) e;

                  }

                  supervisoree.status.failures++;  // stopping failed: increment the failure count

                }

                break;

              default:

                logger.warn("I refuse to acknowledge {} as a desired state",

                    supervisoree.status.desiredState);

            }

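            // There are two SupervisorPolicy implementations: AlwaysRestartPolicy and OnceOnlyPolicy. The former is for components that may be restarted, the latter for components that may run only once; the latter is not used yet.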

            if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {

              logger.error(

                  "Policy {} of {} has been violated - supervisor should exit!",

                  supervisoree.policy, lifecycleAware);

            }

          }

        }

      } catch(Throwable t) {

        logger.error("Unexpected error", t);

      }

      logger.debug("Status check complete");

    }

The lifecycleAware.stop() and lifecycleAware.start() calls above invoke the corresponding methods of the sink, source, or channel being supervised.
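
A single monitoring pass can be reduced to a small, deterministic function. The following is a simplified sketch, not Flume's code: Status stands in for Supervisoree.status, Component is a hypothetical test double, and, unlike the original, the sketch does not rethrow an Error raised by stop().

```java
final class ReconcileDemo {
    enum State { IDLE, START, STOP }

    interface Lifecycle {
        void start();
        void stop();
        State getState();
    }

    // Simplified stand-in for Supervisoree.status.
    static final class Status {
        State desiredState = State.START;
        boolean discard;
        boolean error;
        int failures;
    }

    // Test double: the first start() throws the given Throwable, if any.
    static final class Component implements Lifecycle {
        private State state = State.IDLE;
        private Throwable failFirstStart;

        Component(Throwable failFirstStart) { this.failFirstStart = failFirstStart; }

        @Override public void start() {
            Throwable t = failFirstStart;
            failFirstStart = null;
            if (t instanceof Error) throw (Error) t;
            if (t instanceof RuntimeException) throw (RuntimeException) t;
            state = State.START;
        }
        @Override public void stop() { state = State.STOP; }
        @Override public State getState() { return state; }
    }

    // One monitoring pass: move the component towards its desired state.
    static void check(Lifecycle component, Status status) {
        synchronized (component) {
            if (status.discard || status.error) {
                return;                                  // unsupervised or broken: leave it alone
            }
            if (component.getState() == status.desiredState) {
                return;                                  // nothing to do
            }
            switch (status.desiredState) {
                case START:
                    try {
                        component.start();
                    } catch (Throwable t) {
                        if (t instanceof Error) {        // unrecoverable: give up and stop it
                            status.desiredState = State.STOP;
                            try {
                                component.stop();
                            } catch (Throwable t2) {
                                status.error = true;
                            }
                        }
                        status.failures++;
                    }
                    break;
                case STOP:
                    try {
                        component.stop();
                    } catch (Throwable t) {
                        status.failures++;
                    }
                    break;
                default:
                    break;
            }
        }
    }

    public static void main(String[] args) {
        Status s1 = new Status();
        Component flaky = new Component(new IllegalStateException("transient"));
        check(flaky, s1);                                // fails once
        check(flaky, s1);                                // succeeds on the retry
        System.out.println(flaky.getState() + " failures=" + s1.failures);    // START failures=1

        Status s2 = new Status();
        Component broken = new Component(new NoClassDefFoundError("missing"));
        check(broken, s2);                               // an Error flips the desired state to STOP
        System.out.println(broken.getState() + " desired=" + s2.desiredState); // STOP desired=STOP
    }
}
```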

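Note how start() behaves here. For a channel, start() is simply called directly. For a sink or a PollableSource implementation, the start() of the supervised runner (SinkRunner or SourceRunner) launches a thread that repeatedly calls process() to take data from the channel (sink) or to put data into the channel (source). An EventDrivenSource implementation has no process() method; its start() is responsible for delivering data to the channel (you can create threads there to implement the logic you need).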
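
The polling behaviour can be sketched as follows. Pollable and Runner are hypothetical names chosen for this example; they only imitate the idea of a runner thread that keeps calling process() until it is told to stop, and they leave out the backoff handling a real runner would need.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class PollingRunnerDemo {
    // Hypothetical stand-in for a sink or a pollable source.
    interface Pollable {
        void process() throws Exception;
    }

    static final class Runner {
        private final Pollable target;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);
        private Thread thread;

        Runner(Pollable target) { this.target = target; }

        // start() returns immediately; the work happens on the runner thread.
        void start() {
            thread = new Thread(() -> {
                while (!shouldStop.get()) {
                    try {
                        target.process();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } catch (Exception e) {
                        // A failed call does not end the loop; the next iteration retries.
                    }
                }
            }, "polling-runner");
            thread.start();
        }

        void stop() {
            shouldStop.set(true);
            thread.interrupt();                    // wake the thread if process() is blocked
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Runs a counter-backed Pollable until it has been called at least minCalls times.
    static int pollAtLeast(int minCalls) {
        AtomicInteger calls = new AtomicInteger();
        Runner runner = new Runner(() -> {
            calls.incrementAndGet();
            Thread.sleep(1);
        });
        runner.start();
        while (calls.get() < minCalls) {
            Thread.yield();
        }
        runner.stop();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("process() calls: " + pollAtLeast(5));
    }
}
```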

2. The stopAllComponents() method. As the name suggests, it stops all components. The code is as follows:

private void stopAllComponents() {

    if (this.materializedConfiguration != null) {

      logger.info("Shutting down configuration: {}", this.materializedConfiguration);

      for (Entry<String, SourceRunner> entry : this.materializedConfiguration

          .getSourceRunners().entrySet()) {

        try{

          logger.info("Stopping Source " + entry.getKey());

          supervisor.unsupervise(entry.getValue());

        } catch (Exception e){

          logger.error("Error while stopping {}", entry.getValue(), e);

        }

      }

      for (Entry<String, SinkRunner> entry :

        this.materializedConfiguration.getSinkRunners().entrySet()) {

        try{

          logger.info("Stopping Sink " + entry.getKey());

          supervisor.unsupervise(entry.getValue());

        } catch (Exception e){

          logger.error("Error while stopping {}", entry.getValue(), e);

        }

      }

      for (Entry<String, Channel> entry :

        this.materializedConfiguration.getChannels().entrySet()) {

        try{

          logger.info("Stopping Channel " + entry.getKey());

          supervisor.unsupervise(entry.getValue());

        } catch (Exception e){

          logger.error("Error while stopping {}", entry.getValue(), e);

        }

      }

    }

    if(monitorServer != null) {

      monitorServer.stop();

    }

  }

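First, note why stopAllComponents() is called before startAllComponents(MaterializedConfiguration materializedConfiguration): because the configuration file can be reloaded dynamically, the old components must be stopped before each reload, and only then can the configuration from the latest file be applied.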

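Second, the first time stopAllComponents() runs, materializedConfiguration has not been assigned yet, so no components are stopped, and monitorServer is not stopped either. On later reloads, the method stops supervising the sources, sinks, and channels, in that order, by calling supervisor.unsupervise(entry.getValue()), and then stops monitorServer. The supervisor.unsupervise method is as follows: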

public synchronized void unsupervise(LifecycleAware lifecycleAware) {

    Preconditions.checkState(supervisedProcesses.containsKey(lifecycleAware),

        "Unaware of " + lifecycleAware + " - can not unsupervise");

    logger.debug("Unsupervising service:{}", lifecycleAware);

    synchronized (lifecycleAware) {

    Supervisoree supervisoree = supervisedProcesses.get(lifecycleAware);

    supervisoree.status.discard = true;

      this.setDesiredState(lifecycleAware, LifecycleState.STOP);

      logger.info("Stopping component: {}", lifecycleAware);

      lifecycleAware.stop();

    }

    supervisedProcesses.remove(lifecycleAware);

    //We need to do this because a reconfiguration simply unsupervises old

    //components and supervises new ones.

    monitorFutures.get(lifecycleAware).cancel(false);

    //purges are expensive, so it is done only once every 2 hours.

    needToPurge = true;

    monitorFutures.remove(lifecycleAware);

  }

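This method first checks that the component is currently supervised (supervisedProcesses.containsKey(lifecycleAware)); if it is not, Preconditions.checkState throws an exception. If it is, the component is marked as no longer supervised (supervisoree.status.discard = true), its desired state is set to STOP, and the component is stopped with lifecycleAware.stop(). The method then deletes the component's monitoring records from two structures: supervisedProcesses, which tracks the components under supervision, and monitorFutures, which maps each component to its scheduled monitoring task. Finally, needToPurge = true tells the purge task, which runs once every two hours, to clean the cancelled tasks out of the thread pool.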
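
The bookkeeping side of supervise/unsupervise can be sketched with the JDK alone. This is an illustrative approximation: the map is keyed by the monitoring task itself rather than by a component, the class name is hypothetical, and purge() is called immediately instead of being deferred to a two-hourly task as in Flume.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class UnsuperviseDemo {
    private final ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
    private final Map<Runnable, ScheduledFuture<?>> monitorFutures = new HashMap<>();

    synchronized void supervise(Runnable check) {
        if (monitorFutures.containsKey(check)) {
            throw new IllegalStateException("Refusing to supervise the same task twice");
        }
        monitorFutures.put(check,
                pool.scheduleWithFixedDelay(check, 0, 10, TimeUnit.MILLISECONDS));
    }

    // Returns true if the periodic task was cancelled.
    synchronized boolean unsupervise(Runnable check) {
        ScheduledFuture<?> future = monitorFutures.remove(check);
        if (future == null) {
            throw new IllegalStateException("Unaware of task - can not unsupervise");
        }
        future.cancel(false);      // do not interrupt a run that is in progress
        pool.purge();              // drop cancelled tasks from the work queue
        return future.isCancelled();
    }

    synchronized int supervisedCount() { return monitorFutures.size(); }

    void shutdown() { pool.shutdown(); }

    public static void main(String[] args) {
        UnsuperviseDemo supervisor = new UnsuperviseDemo();
        Runnable check = () -> { };
        supervisor.supervise(check);
        System.out.println("cancelled=" + supervisor.unsupervise(check)
                + " remaining=" + supervisor.supervisedCount());
        supervisor.shutdown();
    }
}
```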

One remaining question: how do sinks and sources find their channels? This was covered in the earlier parts. AbstractConfigurationProvider.loadSources uses a ChannelSelector to configure the channels for each source; the source obtains its ChannelProcessor through getChannelProcessor() and sends events to the channels with channelProcessor.processEventBatch(eventList). AbstractConfigurationProvider.loadSinks calls sink.setChannel(channelComponent.channel) to set the channel for each sink; the sink implementation then retrieves it with getChannel() and uses channel.take() to fetch events for processing.
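
The wiring can be pictured with a toy model. Everything below is a hypothetical simplification: events are plain strings, the channel is a queue without Flume's transactions, and the ChannelProcessor stand-in writes to every channel instead of consulting a ChannelSelector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class WiringDemo {
    // Toy in-memory channel (no transactions).
    static final class Channel {
        private final Queue<String> events = new ArrayDeque<>();
        synchronized void put(String event) { events.add(event); }
        synchronized String take() { return events.poll(); }   // null when empty
    }

    // Stand-in for ChannelProcessor: writes a batch to every configured channel.
    static final class ChannelProcessor {
        private final List<Channel> channels;
        ChannelProcessor(List<Channel> channels) { this.channels = channels; }
        void processEventBatch(List<String> batch) {
            for (Channel channel : channels) {
                for (String event : batch) {
                    channel.put(event);
                }
            }
        }
    }

    static final class Source {
        private ChannelProcessor channelProcessor;
        void setChannelProcessor(ChannelProcessor cp) { this.channelProcessor = cp; }
        ChannelProcessor getChannelProcessor() { return channelProcessor; }
    }

    static final class Sink {
        private Channel channel;
        void setChannel(Channel channel) { this.channel = channel; }
        Channel getChannel() { return channel; }
        String process() { return getChannel().take(); }
    }

    // Wires one source and one sink to the same channel and pushes a batch through.
    static List<String> roundTrip(List<String> batch) {
        Channel channel = new Channel();
        Source source = new Source();
        source.setChannelProcessor(new ChannelProcessor(Arrays.asList(channel)));  // done at load time
        Sink sink = new Sink();
        sink.setChannel(channel);                                                  // done at load time

        source.getChannelProcessor().processEventBatch(batch);
        List<String> delivered = new ArrayList<>();
        String event;
        while ((event = sink.process()) != null) {
            delivered.add(event);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("e1", "e2", "e3")));   // [e1, e2, e3]
    }
}
```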

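These three parts cover the whole flow: Flume-NG startup, configuration loading, dynamic configuration reloading, and component execution. Please point out anything I have missed; I will keep improving this material.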

More chapters will follow.
