Heritrix 3.1.0 Source Code Analysis (37)

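Today I took another look at the thread pool source code in Heritrix 3.1.0. Heritrix does not use the concurrency framework from Java's java.util.concurrent package; instead it implements its thread pool with the ThreadGroup class. (A thread group is tree-like: a group can contain multiple child thread groups and multiple threads. The structure resembles the Composite pattern, except that the branch nodes and leaf nodes do not implement a common interface as they would in Composite.)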
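The tree structure of thread groups described above can be shown with a small standalone sketch. It is not Heritrix code: the class name `ThreadGroupTreeDemo`, the group names, and the worker names are made up for illustration. It only relies on the standard `ThreadGroup.activeCount()` and `ThreadGroup.enumerate(Thread[])` calls, the latter of which also walks child groups.

```java
import java.util.concurrent.CountDownLatch;

class ThreadGroupTreeDemo {

    /**
     * Builds a parent group with one child group, starts one thread in the
     * parent and two in the child, and returns
     * {threads seen from the child, threads seen from the parent}.
     */
    static int[] countThreads() {
        ThreadGroup parent = new ThreadGroup("parent");        // branch node
        ThreadGroup child = new ThreadGroup(parent, "child");  // nested branch node

        // All workers block on this latch so they stay alive while we count.
        CountDownLatch release = new CountDownLatch(1);
        Runnable waitForRelease = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread a = new Thread(parent, waitForRelease, "worker-a"); // leaf of parent
        Thread b = new Thread(child, waitForRelease, "worker-b");  // leaf of child
        Thread c = new Thread(child, waitForRelease, "worker-c");  // leaf of child
        a.start();
        b.start();
        c.start();

        // activeCount() is only an estimate, so oversize the array a little.
        Thread[] inChild = new Thread[child.activeCount() + 10];
        int childCount = child.enumerate(inChild);
        Thread[] inParent = new Thread[parent.activeCount() + 10];
        int parentCount = parent.enumerate(inParent); // includes threads of subgroups

        release.countDown();
        try {
            a.join();
            b.join();
            c.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { childCount, parentCount };
    }

    public static void main(String[] args) {
        int[] counts = countThreads();
        System.out.println("child=" + counts[0] + ", parent=" + counts[1]);
    }
}
```

Enumerating the parent also reports the threads of the child group, which is the tree behaviour a ThreadGroup-based pool relies on when it lists its members.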

The key classes are ToePool and ToeThread in the org.archive.crawler.framework package. The former extends ThreadGroup, the latter extends Thread.

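ToeThread is the worker thread that executes crawl tasks. Its constructor initializes the member variable CrawlController controller, which is used to obtain the Frontier object and the relevant processor chains.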

    private CrawlController controller;
    private String coreName;
    private CrawlURI currentCuri;

    /**
     * Create a ToeThread
     *
     * @param g ToeThreadGroup
     * @param sn serial number
     */
    public ToeThread(ToePool g, int sn) {
        // TODO: add crawl name?
        super(g,"ToeThread #" + sn);
        coreName="ToeThread #" + sn + ": ";
        controller = g.getController();
        serialNumber = sn;
        setPriority(DEFAULT_PRIORITY);
        int outBufferSize = controller.getRecorderOutBufferBytes();
        int inBufferSize = controller.getRecorderInBufferBytes();
        httpRecorder = new Recorder(controller.getScratchDir().getFile(),
            "tt" + sn + "http", outBufferSize, inBufferSize);
        lastFinishTime = System.currentTimeMillis();
    }

    /** (non-Javadoc)
     * @see java.lang.Thread#run()
     */
    public void run() {
        String name = controller.getMetadata().getJobName();
        logger.fine(getName()+" started for order '"+name+"'");
        Recorder.setHttpRecorder(httpRecorder);

        try {
            while ( true ) {
                ArchiveUtils.continueCheck();
                setStep(Step.ABOUT_TO_GET_URI, null);
                CrawlURI curi = controller.getFrontier().next();
                synchronized(this) {
                    ArchiveUtils.continueCheck();
                    setCurrentCuri(curi);
                    currentCuri.setThreadNumber(this.serialNumber);
                    lastStartTime = System.currentTimeMillis();
                    currentCuri.setRecorder(httpRecorder);
                }

                try {
                    KeyedProperties.loadOverridesFrom(curi);
                    controller.getFetchChain().process(curi,this);
                    controller.getFrontier().beginDisposition(curi);
                    controller.getDispositionChain().process(curi,this);
                } catch (RuntimeExceptionWrapper e) {
                    // Workaround to get cause from BDB
                    if(e.getCause() == null) {
                        e.initCause(e.getCause());
                    }
                    recoverableProblem(e);
                } catch (AssertionError ae) {
                    // This risks leaving crawl in fatally inconsistent state,
                    // but is often reasonable for per-Processor assertion problems
                    recoverableProblem(ae);
                } catch (RuntimeException e) {
                    recoverableProblem(e);
                } catch (InterruptedException e) {
                    if(currentCuri!=null) {
                        recoverableProblem(e);
                        Thread.interrupted(); // clear interrupt status
                    } else {
                        throw e;
                    }
                } catch (StackOverflowError err) {
                    recoverableProblem(err);
                } catch (Error err) {
                    // OutOfMemory and any others
                    seriousError(err);
                } finally {
                    httpRecorder.endReplays();
                    KeyedProperties.clearOverridesFrom(curi);
                }

                setStep(Step.ABOUT_TO_RETURN_URI, null);
                ArchiveUtils.continueCheck();

                synchronized(this) {
                    controller.getFrontier().finished(currentCuri);
                    controller.getFrontier().endDisposition();
                    setCurrentCuri(null);
                }
                curi = null;

                setStep(Step.FINISHING_PROCESS, null);
                lastFinishTime = System.currentTimeMillis();
                if(shouldRetire) {
                    break; // from while(true)
                }
            }
        } catch (InterruptedException e) {
            if(currentCuri!=null){
                logger.log(Level.SEVERE,"Interrupt leaving unfinished CrawlURI "+getName()+" - job may hang",e);
            }
            // thread interrupted, ok to end
            logger.log(Level.FINE,this.getName()+ " ended with Interruption");
        } catch (Exception e) {
            // everything else (including interruption)
            logger.log(Level.SEVERE,"Fatal exception in "+getName(),e);
        } catch (OutOfMemoryError err) {
            seriousError(err);
        } finally {
            controller.getFrontier().endDisposition();
        }

        setCurrentCuri(null);
        // Do cleanup so that objects can be GC.
        this.httpRecorder.closeRecorders();
        this.httpRecorder = null;
        logger.fine(getName()+" finished for order '"+name+"'");
        setStep(Step.FINISHED, null);
        controller = null;
    }
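The shape of the run() loop above (block for the next item, process it, treat per-item failures as recoverable, and check a retire flag only between items) can be reduced to a small self-contained sketch. This is not Heritrix code: `WorkerLoopSketch`, `Worker`, and the `BlockingQueue` standing in for the Frontier are assumptions made for illustration, and upper-casing a string stands in for running the processor chains.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class WorkerLoopSketch {

    /** Hypothetical worker; the queue plays the role of the Frontier. */
    static class Worker extends Thread {
        private final BlockingQueue<String> queue;
        private final List<String> processed;
        private volatile boolean shouldRetire = false;

        Worker(ThreadGroup group, int serial,
               BlockingQueue<String> queue, List<String> processed) {
            super(group, "Worker #" + serial);
            this.queue = queue;
            this.processed = processed;
        }

        /** Ask the worker to exit once it has finished its current item. */
        void retire() {
            shouldRetire = true;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String item = queue.take(); // blocks until work is available
                    try {
                        processed.add(item.toUpperCase()); // stand-in for the processor chains
                    } catch (RuntimeException e) {
                        // per-item failure: report it and keep the thread alive
                        System.err.println(getName() + " recoverable problem: " + e);
                    }
                    if (shouldRetire) {
                        break; // checked only between items, never mid-item
                    }
                }
            } catch (InterruptedException e) {
                // interrupted while waiting for work: fine to end
            }
        }
    }

    /** Queues two items, but retires the worker so it handles only the first. */
    static List<String> runOnce() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new CopyOnWriteArrayList<>();
        queue.add("a");
        queue.add("b");
        Worker worker = new Worker(new ThreadGroup("sketch"), 1, queue, processed);
        worker.retire(); // flag set before start: one item is handled, then the loop exits
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Because the flag is tested only after an item has been fully handled, a retiring worker never abandons work in progress; interruption is the separate, harsher path that ends a worker blocked while waiting.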

ToePool is the thread group that manages the worker threads above: it creates them, reports the active threads, and interrupts or kills them.

    protected CrawlController controller;
    protected int nextSerialNumber = 1;
    protected int targetSize = 0;

    /**
     * Constructor. Creates a pool of ToeThreads.
     *
     * @param c A reference to the CrawlController for the current crawl.
     */
    public ToePool(AlertThreadGroup atg, CrawlController c) {
        // pass in the parent thread group
        super(atg, "ToeThreads");
        this.controller = c;
        setDaemon(true);
    }

    public void cleanup() {
        // force all Toes waiting on queues, etc to proceed
        Thread[] toes = getToes();
        for(Thread toe : toes) {
            if(toe!=null) {
                toe.interrupt();
            }
        }
//        this.controller = null;
    }

    /**
     * @return The number of ToeThreads that are not available (Approximation).
     */
    public int getActiveToeCount() {
        Thread[] toes = getToes();
        int count = 0;
        for (int i = 0; i < toes.length; i++) {
            if((toes[i] instanceof ToeThread) &&
                    ((ToeThread)toes[i]).isActive()) {
                count++;
            }
        }
        return count;
    }

    /**
     * @return The number of ToeThreads. This may include killed ToeThreads
     *         that were not replaced.
     */
    public int getToeCount() {
        Thread[] toes = getToes();
        int count = 0;
        for (int i = 0; i<toes.length; i++) {
            if((toes[i] instanceof ToeThread)) {
                count++;
            }
        }
        return count;
    }

    // get the array of active threads in this group
    private Thread[] getToes() {
        Thread[] toes = new Thread[activeCount()+10];
        this.enumerate(toes);
        return toes;
    }

    /**
     * Change the number of ToeThreads.
     *
     * @param newsize The new number of ToeThreads.
     */
    public void setSize(int newsize)
    {
        targetSize = newsize;
        int difference = newsize - getToeCount();
        if (difference > 0) {
            // must create threads
            for(int i = 1; i <= difference; i++) {
                // start a thread
                startNewThread();
            }
        } else {
            // must retire extra threads
            int retainedToes = targetSize;
            Thread[] toes = this.getToes();
            for (int i = 0; i < toes.length ; i++) {
                if(!(toes[i] instanceof ToeThread)) {
                    continue;
                }
                retainedToes--;
                if (retainedToes>=0) {
                    continue; // this toe is spared
                }
                // otherwise:
                ToeThread tt = (ToeThread)toes[i];
                tt.retire();
            }
        }
    }

    /**
     * Kills specified thread. Killed thread can be optionally replaced with a
     * new thread.
     *
     * <p><b>WARNING:</b> This operation should be used with great care. It may
     * destabilize the crawler.
     *
     * @param threadNumber Thread to kill
     * @param replace If true then a new thread will be created to take the
     *           killed threads place. Otherwise the total number of threads
     *           will decrease by one.
     */
    public void killThread(int threadNumber, boolean replace){
        Thread[] toes = getToes();
        for (int i = 0; i< toes.length; i++) {
            if(! (toes[i] instanceof ToeThread)) {
                continue;
            }
            ToeThread toe = (ToeThread) toes[i];
            if(toe.getSerialNumber()==threadNumber) {
                toe.kill();
            }
        }

        if(replace){
            // Create a new toe thread to take its place. Replace toe
            startNewThread();
        }
    }

    // synchronized, so threads are not created concurrently
    private synchronized void startNewThread() {
        ToeThread newThread = new ToeThread(this, nextSerialNumber++);
        newThread.setPriority(DEFAULT_TOE_PRIORITY);
        newThread.start();
    }

    public void waitForAll() {
        while (true) try {
            if (isAllAlive(getToes())) {
                return;
            }
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    private static boolean isAllAlive(Thread[] threads) {
        for (Thread t: threads) {
            if ((t != null) && (!t.isAlive())) {
                return false;
            }
        }
        return true;
    }
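The resize logic of the pool above (count the live workers by enumerating the group, start the missing ones, or spare the first N and retire the rest) can be tried out in isolation with the following sketch. It is a simplified stand-in, not the Heritrix implementation: `MiniPoolSketch`, `MiniPool`, and `Worker` are hypothetical names, the workers only sleep, and, unlike ToePool.setSize, this `setSize` returns the retired workers so the caller can wait for them to exit.

```java
import java.util.ArrayList;
import java.util.List;

class MiniPoolSketch {

    /** Hypothetical worker that idles until asked to retire. */
    static class Worker extends Thread {
        private volatile boolean shouldRetire = false;

        Worker(ThreadGroup group, int serial) {
            super(group, "Worker #" + serial);
        }

        void retire() {
            shouldRetire = true;
        }

        @Override
        public void run() {
            try {
                while (!shouldRetire) {
                    Thread.sleep(10); // stand-in for one unit of work
                }
            } catch (InterruptedException e) {
                // interrupted: fine to end
            }
        }
    }

    /** A thread group that doubles as a resizable pool. */
    static class MiniPool extends ThreadGroup {
        private int nextSerialNumber = 1;

        MiniPool() {
            super("MiniPool");
        }

        /** Live workers in this group, found by enumerating the group. */
        List<Worker> getWorkers() {
            Thread[] threads = new Thread[activeCount() + 10];
            int n = enumerate(threads);
            List<Worker> workers = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (threads[i] instanceof Worker) {
                    workers.add((Worker) threads[i]);
                }
            }
            return workers;
        }

        /** Grows or shrinks the pool; returns the workers asked to retire. */
        synchronized List<Worker> setSize(int newSize) {
            List<Worker> workers = getWorkers();
            List<Worker> retired = new ArrayList<>();
            for (int i = workers.size(); i < newSize; i++) {
                new Worker(this, nextSerialNumber++).start(); // grow
            }
            for (int i = newSize; i < workers.size(); i++) {
                workers.get(i).retire(); // shrink: the first newSize workers are spared
                retired.add(workers.get(i));
            }
            return retired;
        }
    }

    /** Returns {size after growing to 3, workers retired, size after shrinking to 1}. */
    static int[] demo() {
        MiniPool pool = new MiniPool();
        pool.setSize(3);
        int grown = pool.getWorkers().size();
        List<Worker> retired = pool.setSize(1);
        joinAll(retired);
        int shrunk = pool.getWorkers().size();
        joinAll(pool.setSize(0)); // shut the remaining worker down
        return new int[] { grown, retired.size(), shrunk };
    }

    private static void joinAll(List<Worker> workers) {
        try {
            for (Worker w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Retiring is cooperative: a worker only disappears from the group once its loop notices the flag, which is why the count is read again only after joining the retired workers.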

Finally, the thread group is initialized and the worker threads are managed from the corresponding methods of the CrawlController object.

    /**
     * Maximum number of threads processing URIs at the same time.
     */
    int maxToeThreads;
    public int getMaxToeThreads() {
        return maxToeThreads;
    }
    @Value("25")
    public void setMaxToeThreads(int maxToeThreads) {
        this.maxToeThreads = maxToeThreads;
        if(toePool!=null) {
            toePool.setSize(this.maxToeThreads);
        }
    }

    private transient ToePool toePool;

    /**
     * Called when the last toethread exits.
     */
    protected void completeStop() {
        LOGGER.fine("Entered complete stop.");

        statisticsTracker.getSnapshot(); // ???

        this.reserveMemory = null;
        if (this.toePool != null) {
            this.toePool.cleanup();
        }
        this.toePool = null;

        LOGGER.fine("Finished crawl.");

        try {
            appCtx.stop();
        } catch (RuntimeException re) {
            LOGGER.log(Level.SEVERE,re.getMessage(),re);
        }

        sendCrawlStateChangeEvent(State.FINISHED, this.sExit);

        // CrawlJob needs to be sure all beans have received FINISHED signal before teardown
        this.isStopComplete = true;
        appCtx.publishEvent(new StopCompleteEvent(this));
    }

    /**
     * Operator requested for crawl to stop.
     */
    public synchronized void requestCrawlStop() {
        if(state == State.STOPPING) {
            // second stop request; nudge the threads with interrupts
            getToePool().cleanup();
        }
        requestCrawlStop(CrawlStatus.ABORTED);
    }

    /**
     * @return Active toe thread count.
     */
    public int getActiveToeCount() {
        if (toePool == null) {
            return 0;
        }
        return toePool.getActiveToeCount();
    }

    protected void setupToePool() {
        toePool = new ToePool(alertThreadGroup,this);
        // TODO: make # of toes self-optimizing
        toePool.setSize(getMaxToeThreads());
        toePool.waitForAll();
    }

    /**
     * @return The number of ToeThreads
     *
     * @see ToePool#getToeCount()
     */
    public int getToeCount() {
        return this.toePool == null? 0: this.toePool.getToeCount();
    }

    /**
     * @return The ToePool
     */
    public ToePool getToePool() {
        return toePool;
    }

    /**
     * Kills a thread. For details see
     * {@link org.archive.crawler.framework.ToePool#killThread(int, boolean)
     * ToePool.killThread(int, boolean)}.
     * @param threadNumber Thread to kill.
     * @param replace Should thread be replaced.
     * @see org.archive.crawler.framework.ToePool#killThread(int, boolean)
     */
    public void killThread(int threadNumber, boolean replace){
        toePool.killThread(threadNumber, replace);
    }

That should be clear enough.

---------------------------------------------------------------------------

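This Heritrix 3.1.0 source code analysis series is my original work.

My email: chenying998179@163#com (replace # with .)

When reposting, please credit the source: cnblogs (博客园), 刺猬的温驯

Link to this article: http://www.cnblogs.com/chenying99/p/3213556.html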
