来自:http://caiguangguang.blog.51cto.com/1652935/1384187

flume bucketpath的bug一例

测试的配置文件:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
agent-server1.sources= testtail
agent-server1.sinks = hdfs-sink
agent-server1.channels= hdfs-channel
agent-server1.sources.testtail.type = netcat
agent-server1.sources.testtail.bind = localhost
agent-server1.sources.testtail.port = 9999
agent-server1.sinks.hdfs-sink.hdfs.kerberosPrincipal = hdfs/_HOST@KERBEROS_HADOOP
agent-server1.sinks.hdfs-sink.hdfs.kerberosKeytab = /home/vipshop/conf/hdfs.keytab
agent-server1.channels.hdfs-channel.type = memory
agent-server1.channels.hdfs-channel.capacity = 200000000
agent-server1.channels.hdfs-channel.transactionCapacity = 10000
agent-server1.sinks.hdfs-sink.type = hdfs
agent-server1.sinks.hdfs-sink.hdfs.path = hdfs://bipcluster/tmp/flume/%Y%m%d
agent-server1.sinks.hdfs-sink.hdfs.rollInterval = 60
agent-server1.sinks.hdfs-sink.hdfs.rollSize = 0
agent-server1.sinks.hdfs-sink.hdfs.rollCount = 0
agent-server1.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
agent-server1.sinks.hdfs-sink.hdfs.round = false
agent-server1.sinks.hdfs-sink.hdfs.roundValue = 30
agent-server1.sinks.hdfs-sink.hdfs.roundUnit = minute
agent-server1.sinks.hdfs-sink.hdfs.batchSize = 100
agent-server1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent-server1.sinks.hdfs-sink.hdfs.writeFormat = Text
agent-server1.sinks.hdfs-sink.hdfs.callTimeout = 60000
agent-server1.sinks.hdfs-sink.hdfs.idleTimeout = 100
agent-server1.sinks.hdfs-sink.hdfs.filePrefix = ip
agent-server1.sinks.hdfs-sink.channel = hdfs-channel
agent-server1.sources.testtail.channels = hdfs-channel

在启动服务后,使用telnet进行测试,发现如下报错:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
14/03/24 18:03:07 ERROR hdfs.HDFSEventSink: process failed
java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing.
 Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:160)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:343)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NumberFormatException: null
        at java.lang.Long.parseLong(Long.java:375)
        at java.lang.Long.valueOf(Long.java:525)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:158)
        ... 5 more
14/03/24 18:03:07 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to
resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:160)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:343)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        ... 3 more
Caused by: java.lang.NumberFormatException: null
        at java.lang.Long.parseLong(Long.java:375)
        at java.lang.Long.valueOf(Long.java:525)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:158)
        ... 5 more

从调用栈的信息来看,错误出在org.apache.flume.formatter.output.BucketPath类的replaceShorthand方法。
在org.apache.flume.sink.hdfs.HDFSEventSink类中,使用process方法来生成hdfs的url,其中主要是调用了BucketPath类的escapeString方法来进行字符的转换,并最终调用了replaceShorthand方法。
其中replaceShorthand方法的相关代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
  public static String replaceShorthand(char c, Map<String, String> headers,
      TimeZone timeZone, boolean needRounding, int unit, int roundDown) {
    String timestampHeader = headers.get("timestamp");
    long ts;
    try {
      ts = Long.valueOf(timestampHeader);
    catch (NumberFormatException e) {
      throw new RuntimeException("Flume wasn't able to parse timestamp header"
        " in the event to resolve time based bucketing. Please check that"
        " you're correctly populating timestamp header (for example using"
        " TimestampInterceptor source interceptor).", e);
    }
    if(needRounding){
      ts = roundDown(roundDown, unit, ts);
    }
........

从代码中可以看到,timestampHeader 的值如果取不到,在向ts赋值时就会报错。。
这其实是flume的一个bug,bug id:
https://issues.apache.org/jira/browse/FLUME-1419
解决方法有3个:
1.更改配置,更新hdfs文件的路径格式

1
agent-server1.sinks.hdfs-sink.hdfs.path = hdfs://bipcluster/tmp/flume

但是这样就不能按天来存放日志了
2.通过更改相关的代码
(patch:https://issues.apache.org/jira/secure/attachment/12538891/FLUME-1419.patch)
如果在headers中获取不到timestamp的值,就给它一个当前timestamp的值。
相关代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
     String timestampHeader = headers.get("timestamp");
     long ts;
     try {
      if (timestampHeader == null) {
        ts = System.currentTimeMillis();
      else {
        ts = Long.valueOf(timestampHeader);
      }
     } catch (NumberFormatException e) {
       throw new RuntimeException("Flume wasn't able to parse timestamp header"
         " in the event to resolve time based bucketing. Please check that"
         " you're correctly populating timestamp header (for example using"
                  " TimestampInterceptor source interceptor).", e);
}

3.为source定义基于timestamp的interceptors 
在配置中增加两行即可:

1
2
agent-server1.sources.testtail.interceptors = i1
agent-server1.sources.testtail.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

一个技巧:
在debug flume的问题时,可以在flume的启动参数中设置把debug日志打到console中。

1
-Dflume.root.logger=DEBUG,console,LOGFILE

Flume wasn't able to parse timestamp header的更多相关文章

  1. 关于flume中涉及到时间戳的错误解决,Expected timestamp in the Flume even

    在搭建flume集群收集日志写入hdfs时发生了下面的错误: java.lang.NullPointerException: Expected timestamp in the Flume event ...

  2. 当我new class的时候,提示以下错误: Unable to parse template "Class" Error message: This template did not produce a Java class or an interface Error parsing file template: Unable to find resource 'Package Header.j

    你肯定修改过class的template模板,改回去就好了 #if (${PACKAGE_NAME} && ${PACKAGE_NAME} != "")packag ...

  3. Flume架构

    Flume是Cloudera提供的一个高可用的,高可靠的,分布式的海量日志采集.聚合和传输的系统: Flume 介绍 Flume是由cloudera软件公司产出的高可用.高可靠.分布式的海量日志收集系 ...

  4. netty 网关 flume 提交数据 去除透明 批处理 批提交 cat head tail 结合 管道显示行号

    D:\javaNettyAction\NettyA\src\main\java\com\test\HexDumpProxy.java package com.test; import io.netty ...

  5. TimeStamp

    private void Form1_Load(object sender, EventArgs e) { textBox1.Text= GenerateTimeStamp(System.DateTi ...

  6. Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)(一)

    Flume 1.7.0 User Guide Introduction(简介) Overview(综述) System Requirements(系统需求) Architecture(架构) Data ...

  7. c# datetime与 timeStamp(unix时间戳) 互相转换

    /// <summary> /// Unix时间戳转为C#格式时间 /// </summary> /// <param name="timeStamp" ...

  8. webMagic解析淘宝cookie 提示Invalid cookie header

    webMagic解析淘宝cookie 提示Invalid cookie header 在使用webMagic框架做爬虫爬取淘宝极又家页面时候一直提醒cookie设置不可用如下图 淘宝的验证特别严重,c ...

  9. c# datetime与 timeStamp时间戳 互相转换

    将时间格式转化为一个int类型 // ::26时间转完后为:1389675686数字 为什么使用时间戳? 关于Unix时间戳,大概是这个意思,从1970年0时0分0秒开始到现在的秒数.使用它来获得的是 ...

随机推荐

  1. 工作记录(1)- js问题

    也是好久不写博客了,确实懒了:想想应该把node.js的东西写完整比较好,在抽时间吧: 这几天在做阿里巴巴的一个页面展示,里面设计到了一些js的问题,中途也遇到了一些幼稚的问题, 算是简单记录一下,以 ...

  2. C++ Primer 学习笔记_34_STL实践与分析(8) --引言、pair类型、关联容器

    STL实践与分析 --引言.pair类型.关联容器 引言:     关联容器与顺序容器的本质差别在于:关联容器通过键[key]来存储和读取元素,而顺序容器则通过元素在容器中的位置顺序的存取元素. ma ...

  3. mysql slock

    http://www.itdks.com/dakashuo/new/dakalive/detail/3888

  4. OpenCV 机器学习之 支持向量机的使用方法实例

    用支持向量机进行文理科生的分类,根据的特征主要是 数学成绩与语文成绩,这两个特征都服从高斯分布 程序代码例如以下: 分类结果:

  5. hibernate一级缓存,二级缓存和查询缓存

    一级缓存 (必然存在)  session里共享缓存,伴随session的生命周期存在和消亡:   1. load查询实体支持一级缓存 2. get查询实体对象也支持 3. save保存的实体对象会缓存 ...

  6. ool _WebTryThreadLock(bool),

    一般的问题是这样的 “bool _WebTryThreadLock(bool), 0xxxxxx: Tried to obtain the web lock from a thread other t ...

  7. ORACLE 修改已有存储过程(plsql工具修改)

    pl/sql 修改包下存储过程步骤: 假定有如下过程:pkg_ypgl_query.PROC_KCZQUERY; 1. pl/sql 右侧objects面板中选择Package bodies>P ...

  8. Java获取电脑IP、MAC、各种版本

    Java代码获取电脑IP.MAC.各种版本 package com.rapoo.middle.action; import java.io.BufferedReader; import java.io ...

  9. android之截屏(包括截取scrollview与listview的)

    public class ScreenShot { // 获取指定Activity的截屏,保存到png文件 public static Bitmap takeScreenShot(Activity a ...

  10. Hashing图像检索源码及数据库总结

    下面的这份哈希算法小结来源于本周的周报,原本并没有打算要贴出来的,不过,考虑到这些资源属于关注利用哈希算法进行大规模图像搜索的各位看官应该很有用,所以好东西本小子就不私藏了.本资源汇总最主要的收录原则 ...