[bigdata] Flume HDFS sink: HDFS files left unclosed

Symptom: MapReduce jobs failed when run.

Running hadoop fsck -openforwrite showed that some files had never been closed:

[root@com ~]# hadoop fsck -openforwrite /data/rc/click/mpp/15-08-05/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://com.hunantv.hadoopnamenode:50070
FSCK started by root (auth:SIMPLE) from /10.100.1.46 for path /data/rc/click/mpp/15-08-05/ at Thu Aug 06 14:05:03 CST 2015
....................................................................................................
....................................................................................................
........./data/rc/click/mpp/15-08-05/FlumeData.1438758322864 42888 bytes, 1 block(s), OPENFORWRITE:
/data/rc/click/mpp/15-08-05/FlumeData.1438758322864: Under replicated BP-1672356070-10.100.1.36-1412072991411:blk_1120646538_47162789{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-f4fff5f3-f3fd-4054-a75c-1d7da53a73af:NORMAL|FINALIZED], ReplicaUnderConstruction[[DISK]DS-26f54bc5-5026-4e6a-94ec-8435224e4aa9:NORMAL|RWR], ReplicaUnderConstruction[[DISK]DS-4ab3fffc-6468-47df-8023-79f23a330371:NORMAL|FINALIZED]]}. Target Replicas is 3 but found 2 replica(s).
..........................................................................................
............................Status: HEALTHY
Total size: 99186583 B
Total dirs: 1
Total files: 328
Total symlinks: 0
Total blocks (validated): 328 (avg. block size 302398 B)
Minimally replicated blocks: 328 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (0.30487806 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.996951
Corrupt blocks: 0
Missing replicas: 1 (0.101626016 %)
Number of data-nodes: 59
Number of racks: 6
FSCK ended at Thu Aug 06 14:05:03 CST 2015 in 36 milliseconds

The filesystem under path '/data/rc/click/mpp/15-08-05/' is HEALTHY
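When many directories need checking, the fsck report can be filtered automatically. Below is a minimal Java sketch that extracts the OPENFORWRITE paths from fsck -openforwrite output. It assumes the line format shown above: the path comes first, possibly preceded by progress dots, then the size, with the literal OPENFORWRITE on the same line. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class OpenForWriteScanner {

    // Returns the paths that fsck reported as OPENFORWRITE.
    // Assumption: each such entry is "<path> <size> bytes, ..., OPENFORWRITE:"
    // on one line, as in the report above.
    static List<String> openFiles(String fsckOutput) {
        List<String> result = new ArrayList<>();
        for (String line : fsckOutput.split("\\R")) {
            if (!line.contains("OPENFORWRITE")) {
                continue; // skip progress lines, summaries and replica details
            }
            // Strip the progress dots fsck prints in front of the entry.
            String trimmed = line.replaceFirst("^\\.+", "");
            int space = trimmed.indexOf(' ');
            if (space > 0) {
                result.add(trimmed.substring(0, space));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample =
              "....\n"
            + "........./data/rc/click/mpp/15-08-05/FlumeData.1438758322864 42888 bytes, 1 block(s), OPENFORWRITE:\n"
            + "/data/rc/click/mpp/15-08-05/FlumeData.1438758322864: Under replicated ...\n";
        // prints [/data/rc/click/mpp/15-08-05/FlumeData.1438758322864]
        System.out.println(openFiles(sample));
    }
}
```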

Checking the Flume log:

[root@10.100.1.117] out: 05 Aug 2015 11:15:19,322 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:20,493 INFO [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp at 1438744580493
[root@10.100.1.117] out: 05 Aug 2015 11:16:20,497 INFO [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:30,501 WARN [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp). Exception follows.
[root@10.100.1.117] out: java.io.IOException: Callable timed out after 10000 ms on file: hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:30,503 INFO [hdfs-sin_hdfs_201-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp to hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293

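The HDFS file close failed because it timed out. The sink still renamed the .tmp file afterwards, so the file looks finished, but HDFS still treats it as open for write.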

Looking at the source code (BucketWriter.close):

  public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
    checkAndThrowInterruptedException();
    try {
      flush();
    } catch (IOException e) {
      LOG.warn("pre-close flush failed", e);
    }
    boolean failedToClose = false;
    LOG.info("Closing {}", bucketPath);
    CallRunner<Void> closeCallRunner = createCloseCallRunner();
    if (isOpen) {
      try {
        callWithTimeout(closeCallRunner);
        sinkCounter.incrementConnectionClosedCount();
      } catch (IOException e) {
        LOG.warn(
          "failed to close() HDFSWriter for file (" + bucketPath +
            "). Exception follows.", e);
        sinkCounter.incrementConnectionFailedCount();
        failedToClose = true;
      }
      isOpen = false;
    } else {
      LOG.info("HDFSWriter is already closed: {}", bucketPath);
    }

    // NOTE: timed rolls go through this codepath as well as other roll types
    if (timedRollFuture != null && !timedRollFuture.isDone()) {
      timedRollFuture.cancel(false); // do not cancel myself if running!
      timedRollFuture = null;
    }

    if (idleFuture != null && !idleFuture.isDone()) {
      idleFuture.cancel(false); // do not cancel myself if running!
      idleFuture = null;
    }

    if (bucketPath != null && fileSystem != null) {
      // could block or throw IOException
      try {
        renameBucket(bucketPath, targetPath, fileSystem);
      } catch(Exception e) {
        LOG.warn(
          "failed to rename() file (" + bucketPath +
          "). Exception follows.", e);
        sinkCounter.incrementConnectionFailedCount();
        final Callable<Void> scheduledRename =
                createScheduledRenameCallable();
        timedRollerPool.schedule(scheduledRename, retryInterval,
                TimeUnit.SECONDS);
      }
    }

    if (callCloseCallback) {
      runCloseAction();
      closed = true;
    }
  }

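The default timeout is 10000 ms, and a failed close is never retried. The code even sets a failedToClose variable but never uses it; the developers probably forgot to handle it.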
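The timeout in the log comes from close() being wrapped in callWithTimeout. The sketch below is a simplified, self-contained stand-in for that pattern, not Flume's code, and its class and method names are illustrative. The call runs on a thread pool and the caller waits at most timeoutMs. On timeout the task is cancelled and an IOException is thrown. The caller simply stops waiting, so it cannot know whether the underlying HDFS close ever finished. That is consistent with the file still being reported as OPENFORWRITE.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutCallDemo {

    // Runs task on pool and waits at most timeoutMs. On timeout the task is
    // cancelled (interrupted) and an IOException is thrown; failures inside
    // the task are rethrown as IOException.
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMs)
            throws IOException, InterruptedException {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw new IOException("Callable timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A "close" slower than the timeout, standing in for a slow HDFS close().
            Callable<Void> slowClose = () -> { Thread.sleep(500); return null; };
            callWithTimeout(pool, slowClose, 100);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Callable timed out after 100 ms
        } finally {
            pool.shutdownNow();
        }
    }
}
```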

Solutions:

1. Increase the call timeout (hdfs.callTimeout), for example to 5 minutes (300000 ms). The Flume HDFS sink configuration is:

agent12.sinks.sin_hdfs_201.type=hdfs
agent12.sinks.sin_hdfs_201.channel=ch_hdfs_201
agent12.sinks.sin_hdfs_201.hdfs.path=hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-%{month}-%{day}
agent12.sinks.sin_hdfs_201.hdfs.round=true
agent12.sinks.sin_hdfs_201.hdfs.roundValue=10
agent12.sinks.sin_hdfs_201.hdfs.roundUnit=minute
agent12.sinks.sin_hdfs_201.hdfs.fileType=DataStream
agent12.sinks.sin_hdfs_201.hdfs.writeFormat=Text
agent12.sinks.sin_hdfs_201.hdfs.rollInterval=0
agent12.sinks.sin_hdfs_201.hdfs.rollSize=209715200
agent12.sinks.sin_hdfs_201.hdfs.rollCount=0
agent12.sinks.sin_hdfs_201.hdfs.idleTimeout=300
agent12.sinks.sin_hdfs_201.hdfs.batchSize=100
agent12.sinks.sin_hdfs_201.hdfs.minBlockReplicas=1
agent12.sinks.sin_hdfs_201.hdfs.callTimeout=300000
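You can check the setting with the sketch below. It reads a Flume properties file with java.util.Properties and returns the effective hdfs.callTimeout for a sink. If the key is absent it falls back to the 10000 ms default. The helper name is hypothetical. It only parses the file and does not reproduce how Flume itself loads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class CallTimeoutCheck {

    // Default hdfs.callTimeout in milliseconds, as discussed above.
    static final long DEFAULT_CALL_TIMEOUT_MS = 10000L;

    // Returns hdfs.callTimeout for the given agent and sink, or the default
    // when the key is absent. Key layout follows the configuration above:
    // <agent>.sinks.<sink>.hdfs.callTimeout
    static long callTimeoutMs(Properties props, String agent, String sink) {
        String key = agent + ".sinks." + sink + ".hdfs.callTimeout";
        String value = props.getProperty(key);
        return value == null ? DEFAULT_CALL_TIMEOUT_MS : Long.parseLong(value.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
              "agent12.sinks.sin_hdfs_201.type=hdfs\n"
            + "agent12.sinks.sin_hdfs_201.hdfs.callTimeout=300000\n"));
        long ms = callTimeoutMs(props, "agent12", "sin_hdfs_201");
        System.out.println(ms + " ms = " + (ms / 60000) + " min"); // 300000 ms = 5 min
    }
}
```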

2. Modify the source code to retry the close, as follows:

  public synchronized void close(boolean callCloseCallback)
      throws IOException, InterruptedException {
    checkAndThrowInterruptedException();
    try {
      flush();
    } catch (IOException e) {
      LOG.warn("pre-close flush failed", e);
    }
    LOG.info("Closing {}", bucketPath);
    CallRunner<Void> closeCallRunner = createCloseCallRunner();
    final int maxTries = 5;
    int tryTime = 0;
    if (!isOpen) {
      LOG.info("HDFSWriter is already closed: {}", bucketPath);
    }
    // Retry the close until it succeeds or maxTries attempts have been made.
    while (isOpen && tryTime < maxTries) {
      tryTime++;
      try {
        callWithTimeout(closeCallRunner);
        sinkCounter.incrementConnectionClosedCount();
        isOpen = false; // closed successfully, so stop retrying
      } catch (IOException e) {
        LOG.warn("failed to close() HDFSWriter for file (try " + tryTime + "): "
            + bucketPath + ". Exception follows.", e);
        sinkCounter.incrementConnectionFailedCount();
        if (tryTime < maxTries) {
          Thread.sleep(this.callTimeout); // wait before the next attempt
        }
      }
    }
    // The close still failed after all attempts.
    if (isOpen) {
      LOG.error("failed to close file: " + bucketPath + " after " + tryTime + " tries.");
    }

    // NOTE: timed rolls go through this codepath as well as other roll types
    if (timedRollFuture != null && !timedRollFuture.isDone()) {
      timedRollFuture.cancel(false); // do not cancel myself if running!
      timedRollFuture = null;
    }

    if (idleFuture != null && !idleFuture.isDone()) {
      idleFuture.cancel(false); // do not cancel myself if running!
      idleFuture = null;
    }

    if (bucketPath != null && fileSystem != null) {
      // could block or throw IOException
      try {
        renameBucket(bucketPath, targetPath, fileSystem);
      } catch (Exception e) {
        LOG.warn("failed to rename() file (" + bucketPath +
            "). Exception follows.", e);
        sinkCounter.incrementConnectionFailedCount();
        final Callable<Void> scheduledRename = createScheduledRenameCallable();
        timedRollerPool.schedule(scheduledRename, retryInterval, TimeUnit.SECONDS);
      }
    }

    if (callCloseCallback) {
      runCloseAction();
      closed = true;
    }
  }
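The core retry logic can be tested on its own. The helper below is hypothetical and not part of Flume. It retries a close-like Callable up to maxTries times, stops at the first success, and sleeps only when another attempt remains. The fake close in main fails twice before it succeeds.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class CloseRetry {

    // Calls closeOnce up to maxTries times, sleeping sleepMs after each failed
    // attempt that still has a successor. Returns the number of attempts used,
    // or -1 if every attempt failed.
    static int closeWithRetry(Callable<Void> closeOnce, int maxTries, long sleepMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                closeOnce.call();
                return attempt; // success: do not call close() again
            } catch (InterruptedException e) {
                throw e; // let interruption propagate instead of retrying
            } catch (Exception e) {
                System.err.println("close attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxTries) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Hypothetical close that times out twice and then succeeds.
        Callable<Void> flaky = () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Callable timed out");
            }
            return null;
        };
        int used = closeWithRetry(flaky, 5, 10);
        // prints "closed after 3 attempts, close() called 3 times"
        System.out.println("closed after " + used + " attempts, close() called " + calls[0] + " times");
    }
}
```

Returning as soon as one attempt succeeds means a successful close is never repeated. Skipping the sleep after the last failure avoids a pointless final wait.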
