Using the Fluentd multiline Parser to Handle Multiline Logs

Reposted from: https://mp.weixin.qq.com/s?__biz=MzU4MjQ0MTU4Ng==&mid=2247500439&idx=1&sn=45e9e0e0ef4e41ed52d9b1bf81d2879d&chksm=fdbacd8acacd449c3ea56432a1e89e48441482905687c020c59af7bcf64e4edfbb8bebf945b6&cur_album_id=1837018771551485956&scene=190#rd

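Multiline logs have long been a headache in log collection. Developers are often reluctant to emit their logs as JSON, so the logs have to be given structure again at collection time.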

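Log collectors differ in implementation and conventions, so the way multiline logs are handled also differs from one collector to another. Here we use Fluentd as the log collector, which provides the multiline parser for exactly this purpose.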

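The multiline parser parses logs with the format_firstline and formatN parameters. format_firstline detects the first line of a multiline log entry; formatN, where N ranges from 1 to 20, is the list of Regexp formats for the multiline log.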
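The grouping idea behind format_firstline can be illustrated with a short Python sketch. It is an illustration of the concept under stated assumptions, not Fluentd's implementation: the function name group_events is hypothetical, and Python's re.search is used because, like Ruby's Regexp#match, it looks for the unanchored pattern anywhere in the line.

```python
import re

# Assumption: the same pattern as the article's format_firstline. It is not
# anchored, so re.search (which, like Ruby's Regexp#match, looks anywhere in
# the line) is used to mirror it.
FIRSTLINE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")


def group_events(lines):
    """Join raw lines into events: a line matching FIRSTLINE starts a new
    event, any other line is appended to the event being built."""
    events, buffer = [], []
    for line in lines:
        if FIRSTLINE.search(line):
            if buffer:
                events.append("\n".join(buffer))  # previous event is complete
            buffer = [line]
        elif buffer:
            buffer.append(line)  # continuation line
        # lines before the first matching line are dropped in this sketch
    if buffer:
        # Offline input has a known end, so the last event is flushed here.
        events.append("\n".join(buffer))
    return events


sample = [
    "2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [           main] LoadPlan(...)",
    "    - Returns",
    "       - EntityReturnImpl(...)",
    "2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] Closing connections",
]
for event in group_events(sample):
    print(repr(event))
```

One design note: because the pattern is unanchored, a continuation line that happens to contain a date would also start a new event; writing it as /^\d{4}-\d{1,2}-\d{1,2}/ is stricter.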

Test data

Suppose we have the following multiline log data:

2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [   scheduling-4] o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection.
2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [   scheduling-4] io.lettuce.core.RedisChannelHandler      : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [           main] o.h.l.p.build.spi.LoadPlanTreePrinter    : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)
    - Returns
       - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)
    - QuerySpaces
       - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)
          - SQL table alias mapping - scheduledl0_
          - alias suffix - 0_
          - suffixed key columns - {id1_51_0_}
2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS

First create a fluentd directory containing an etc subdirectory for the fluentd configuration files and a logs subdirectory for the logs, then save the test log above to logs/test.log:

$ mkdir fluentd

$ cd fluentd

# Create the etc directory for fluentd configuration files and the logs directory for log files

$ mkdir -p etc logs

Standard parsing

Next, create a fluentd configuration file for parsing the logs, etc/fluentd_basic.conf, with the following content:

<source>

  @type tail

  path /fluentd/logs/*.log

  pos_file /fluentd/logs/test.log.pos

  tag test.logs

  read_from_head true

  <parse>

    @type regexp

    expression /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/

  </parse>

</source>

<match **>

  @type stdout

</match>

Then start fluentd from the Docker image to parse our logs:

$ docker run --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/logs:/fluentd/logs fluent/fluentd:v1.14-1 -c /fluentd/etc/fluentd_basic.conf -v

fluentd -c /fluentd/etc/fluentd_basic.conf -v

2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: parsing config file is succeeded path="/fluentd/etc/fluentd_basic.conf"

2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: gem 'fluentd' version '1.14.3'

2022-06-20 12:31:17 +0000 [warn]: fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead

2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: using configuration file: <ROOT>

  <source>

    @type tail

    path "/fluentd/logs/*.log"

    pos_file "/fluentd/logs/test.log.pos"

    tag "test.logs"

    read_from_head true

    <parse>

      @type "regexp"

      expression /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/

      unmatched_lines

    </parse>

  </source>

  <match **>

    @type stdout

  </match>

</ROOT>

2022-06-20 12:36:21 +0000 [info]: fluent/log.rb:330:info: starting fluentd-1.14.3 pid=10 ruby="2.7.5"

2022-06-20 12:36:21 +0000 [info]: fluent/log.rb:330:info: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluentd_basic.conf", "-v", "--plugin", "/fluentd/plugins", "--under-supervisor"]

2022-06-20 12:36:22 +0000 [info]: fluent/log.rb:330:info: adding match pattern="**" type="stdout"

2022-06-20 12:36:22 +0000 [info]: fluent/log.rb:330:info: adding source type="tail"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead

2022-06-20 12:36:22 +0000 [info]: #0 fluent/log.rb:330:info: starting fluentd worker pid=19 ppid=10 worker=0

2022-06-20 12:36:22 +0000 [debug]: #0 fluent/log.rb:309:debug: tailing paths: target = /fluentd/logs/test.log | existing =

2022-06-20 12:36:22 +0000 [info]: #0 fluent/log.rb:330:info: following tail of /fluentd/logs/test.log

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "    - Returns"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "       - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "    - QuerySpaces"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "       - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "          - SQL table alias mapping - scheduledl0_"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "          - alias suffix - 0_"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: "          - suffixed key columns - {id1_51_0_}"

2022-06-20 12:36:22 +0000 [warn]: #0 fluent/log.rb:351:warn: pattern not matched: ""

2022-06-20 12:36:22.308970489 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":"   scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection."}

2022-06-20 12:36:22.309013403 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":"   scheduling-4","message":"io.lettuce.core.RedisChannelHandler      : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]"}

2022-06-20 12:36:22.309025559 +0000 test.logs: {"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":"           main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter    : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)"}

2022-06-20 12:36:22.309715537 +0000 test.logs: {"timestamp":"2022-06-20 19:32:47.062","level":"DEBUG","pid":"7","tid":"N/A","thread":"nection-cleaner","message":"h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS"}

2022-06-20 12:36:22 +0000 [info]: #0 fluent/log.rb:330:info: fluentd worker is now running worker=0

2022-06-20 12:36:22.305753588 +0000 fluent.info: {"pid":19,"ppid":10,"worker":0,"message":"starting fluentd worker pid=19 ppid=10 worker=0"}

2022-06-20 12:36:22.308522121 +0000 fluent.debug: {"message":"tailing paths: target = /fluentd/logs/test.log | existing = "}

2022-06-20 12:36:22.308751095 +0000 fluent.info: {"message":"following tail of /fluentd/logs/test.log"}

2022-06-20 12:36:22.309047520 +0000 fluent.warn: {"message":"pattern not matched: \"    - Returns\""}

2022-06-20 12:36:22.309180634 +0000 fluent.warn: {"message":"pattern not matched: \"       - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)\""}

2022-06-20 12:36:22.309258667 +0000 fluent.warn: {"message":"pattern not matched: \"    - QuerySpaces\""}

2022-06-20 12:36:22.309328608 +0000 fluent.warn: {"message":"pattern not matched: \"       - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)\""}

2022-06-20 12:36:22.309401309 +0000 fluent.warn: {"message":"pattern not matched: \"          - SQL table alias mapping - scheduledl0_\""}

2022-06-20 12:36:22.309468557 +0000 fluent.warn: {"message":"pattern not matched: \"          - alias suffix - 0_\""}

2022-06-20 12:36:22.309563730 +0000 fluent.warn: {"message":"pattern not matched: \"          - suffixed key columns - {id1_51_0_}\""}

2022-06-20 12:36:22.309723704 +0000 fluent.warn: {"message":"pattern not matched: \"\""}

2022-06-20 12:36:22.310086626 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}

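The output shows that the regular expression failed on some lines, while others were parsed correctly. For example, the following record is the result of parsing the first line of the test log: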

{"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":"   scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection."}

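The lines that failed to match belong to the multiline entry: fluentd processes every physical line as a separate record, which is clearly not what we want.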
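The line-by-line behaviour can be reproduced outside fluentd. This is a minimal Python sketch, assuming Python's re behaves like Ruby's regex engine for this particular pattern (named groups are spelled (?P<name>...) in Python); the function name parse_line_by_line is hypothetical. One deliberate change: the pid group uses \S+ instead of the original [^s+]+, a character class that excludes the letter s and the + sign rather than whitespace, and that only yields the right pid because the engine backtracks to the " [TID:" that follows.

```python
import re

# Python port of the article's regexp expression: Ruby's (?<name>...) becomes
# (?P<name>...). Assumption: the pid group is written \S+ here instead of the
# original [^s+]+, which excludes the letter "s" and "+", not whitespace.
LINE_RE = re.compile(
    r"^(?P<timestamp>[^ ]* [^ ]*) (?P<level>[^\s]+) (?P<pid>\S+) "
    r"\[TID:(?P<tid>[,a-z0-9A-Z./]+)\] --- \[(?P<thread>.*)\] (?P<message>[\s\S]*)"
)


def parse_line_by_line(lines):
    """Apply the regexp to each physical line on its own, as the regexp parser
    under in_tail does: matches become records, everything else is unmatched."""
    records, unmatched = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            records.append(m.groupdict())
        else:
            unmatched.append(line)
    return records, unmatched


lines = [
    "2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [           main] "
    "o.h.l.p.build.spi.LoadPlanTreePrinter    : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)",
    "    - Returns",
    "    - QuerySpaces",
]
records, unmatched = parse_line_by_line(lines)
print(len(records), unmatched)  # 1 ['    - Returns', '    - QuerySpaces']
```

The indented continuation lines fail at the very start, because the timestamp group expects text before the first space, just as in the "pattern not matched" warnings above.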

The multiline parser

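What we want is to treat a multiline entry as a single log record, which is what the multiline parser is for. Create a configuration file for multiline logs, etc/fluentd_multline.conf, with the following content: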

<source>

  @type tail

  path /fluentd/logs/*.log

  pos_file /fluentd/logs/test.log.pos

  tag test.logs

  read_from_head true

  <parse>

    @type multiline

    format_firstline /\d{4}-\d{1,2}-\d{1,2}/

    format1 /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/

  </parse>

</source>

<match **>

  @type stdout

</match>

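Here format_firstline /\d{4}-\d{1,2}-\d{1,2}/ detects the line that starts each log entry, and format1 parses the entry; its message group [\s\S]* also takes in the continuation lines. If you need to match more of the data, you can add format2, format3 and so on. Restart fluentd with this configuration: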
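To illustrate how a pattern can be split across format1 and format2, here is a Python sketch. It assumes, as the examples in the Fluentd parser documentation suggest, that the formatN pieces are concatenated in order into a single regexp that is matched, with Ruby's /m flag, against the whole joined entry; re.DOTALL stands in for /m. The split itself and the detail field are hypothetical.

```python
import re

# Hypothetical split: FORMAT1 covers the first physical line, FORMAT2 puts all
# continuation lines into a separate "detail" field. The trailing \n? lets
# single-line entries match as well (detail is then empty).
FORMAT1 = (r"^(?P<timestamp>[^ ]* [^ ]*) (?P<level>[^\s]+) (?P<pid>\S+) "
           r"\[TID:(?P<tid>[,a-z0-9A-Z./]+)\] --- \[(?P<thread>[^\]]*)\] "
           r"(?P<message>[^\n]*)\n?")
FORMAT2 = r"(?P<detail>.*)"

# Join the pieces in order into one pattern; re.DOTALL plays the role of
# Ruby's /m flag, letting "." cross newlines.
EVENT_RE = re.compile(FORMAT1 + FORMAT2, re.DOTALL)

event = "\n".join([
    "2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [           main] "
    "o.h.l.p.build.spi.LoadPlanTreePrinter    : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)",
    "    - Returns",
    "    - QuerySpaces",
])
record = EVENT_RE.match(event).groupdict()
print(record["message"])
print(repr(record["detail"]))
```

The thread group uses [^\]]* rather than .* so that, with "." crossing newlines, it cannot run on to a later "]" in the continuation lines.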

$ docker run --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/logs:/fluentd/logs fluent/fluentd:v1.14-1 -c /fluentd/etc/fluentd_multline.conf -v

fluentd -c /fluentd/etc/fluentd_multline.conf -v

2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: parsing config file is succeeded path="/fluentd/etc/fluentd_multline.conf"

2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: gem 'fluentd' version '1.14.3'

2022-06-20 12:41:58 +0000 [warn]: fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead

2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: using configuration file: <ROOT>

  <source>

    @type tail

    path "/fluentd/logs/*.log"

    pos_file "/fluentd/logs/test.log.pos"

    tag "test.logs"

    read_from_head true

    <parse>

      @type "multiline"

      format_firstline "/\\d{4}-\\d{1,2}-\\d{1,2}/"

      format1 /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/

      unmatched_lines

    </parse>

  </source>

  <match **>

    @type stdout

  </match>

</ROOT>

2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: starting fluentd-1.14.3 pid=9 ruby="2.7.5"

2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluentd_multline.conf", "-v", "--plugin", "/fluentd/plugins", "--under-supervisor"]

2022-06-20 12:41:59 +0000 [info]: fluent/log.rb:330:info: adding match pattern="**" type="stdout"

2022-06-20 12:41:59 +0000 [info]: fluent/log.rb:330:info: adding source type="tail"

2022-06-20 12:41:59 +0000 [warn]: #0 fluent/log.rb:351:warn: define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead

2022-06-20 12:41:59 +0000 [info]: #0 fluent/log.rb:330:info: starting fluentd worker pid=18 ppid=9 worker=0

2022-06-20 12:41:59 +0000 [debug]: #0 fluent/log.rb:309:debug: tailing paths: target = /fluentd/logs/test.log | existing =

2022-06-20 12:41:59 +0000 [info]: #0 fluent/log.rb:330:info: following tail of /fluentd/logs/test.log

2022-06-20 12:41:59.201105512 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":"   scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection."}

2022-06-20 12:41:59.201140475 +0000 test.logs: {"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":"   scheduling-4","message":"io.lettuce.core.RedisChannelHandler      : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]"}

2022-06-20 12:41:59.201213082 +0000 test.logs: {"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":"           main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter    : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)\n    - Returns\n       - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)\n    - QuerySpaces\n       - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)\n          - SQL table alias mapping - scheduledl0_\n          - alias suffix - 0_\n          - suffixed key columns - {id1_51_0_}"}

2022-06-20 12:41:59 +0000 [info]: #0 fluent/log.rb:330:info: fluentd worker is now running worker=0

2022-06-20 12:41:59.199950788 +0000 fluent.info: {"pid":18,"ppid":9,"worker":0,"message":"starting fluentd worker pid=18 ppid=9 worker=0"}

2022-06-20 12:41:59.200662918 +0000 fluent.debug: {"message":"tailing paths: target = /fluentd/logs/test.log | existing = "}

2022-06-20 12:41:59.200844577 +0000 fluent.info: {"message":"following tail of /fluentd/logs/test.log"}

2022-06-20 12:41:59.201480874 +0000 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}

As you can see, the multiline entry from earlier is now parsed into a single record, as expected:

{"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":"           main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter    : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)\n    - Returns\n       - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=<gen:0>, path=com.xxxx.entity.ScheduledLeadsInvalid)\n    - QuerySpaces\n       - EntityQuerySpaceImpl(uid=<gen:0>, entity=com.xxxx.entity.ScheduledLeadsInvalid)\n          - SQL table alias mapping - scheduledl0_\n          - alias suffix - 0_\n          - suffixed key columns - {id1_51_0_}"}
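Note that the last entry of test.log (19:32:47.062, "Closing connections idle ...") does not appear in the multiline output above. This is expected with a first-line-driven parser: the tailer cannot tell that an entry is complete until the next first line arrives, so the final entry stays buffered. in_tail offers the multiline_flush_interval parameter (for example multiline_flush_interval 5s in the <source> block) to flush such a buffer after a delay. The toy model below illustrates the effect; the class and method names are hypothetical, and its idle-based timing is a simplification rather than Fluentd's exact rule.

```python
import re

# Same first-line pattern as the article's format_firstline.
FIRSTLINE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")


class MultilineBuffer:
    """Toy model (hypothetical class) of a tailer that holds a multiline event.

    A buffered event is emitted when the next first line arrives or, if
    flush_interval is set, once no line has arrived for flush_interval
    seconds. This idle rule is a simplification of multiline_flush_interval.
    """

    def __init__(self, flush_interval=None):
        self.flush_interval = flush_interval
        self.lines = []
        self.last_seen = None

    def feed(self, line, now):
        """Consume one line at time `now`; return any events it completes."""
        emitted = []
        if FIRSTLINE.search(line):
            if self.lines:
                emitted.append("\n".join(self.lines))
            self.lines = [line]
        elif self.lines:
            self.lines.append(line)
        self.last_seen = now
        return emitted

    def tick(self, now):
        """Called periodically; flush the pending event once it has been idle."""
        if (self.flush_interval is not None and self.lines
                and now - self.last_seen >= self.flush_interval):
            event = "\n".join(self.lines)
            self.lines = []
            return [event]
        return []


buf = MultilineBuffer(flush_interval=5)
out = []
out += buf.feed("2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [           main] LoadPlan(...)", now=0)
out += buf.feed("    - Returns", now=0)
out += buf.feed("2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] Closing connections", now=1)
print(len(out))  # 1: the last entry is still waiting in the buffer
out += buf.tick(now=3)  # idle for 2 s: nothing flushed yet
out += buf.tick(now=6)  # idle for 5 s: the pending entry is flushed
print(len(out))  # 2
```

Without a flush interval (MultilineBuffer()), tick never releases the last entry, which matches what the run above shows.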

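The whole process is not complicated. The only tricky part is writing the regular expressions that match your logs, which is probably what trips most people up.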
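One cheap way to iterate on a regular expression is to try it against sample lines before restarting fluentd. The sketch below converts Ruby-style named groups to Python syntax and reports which samples match. The helper names are hypothetical, and only the named-group syntax is translated: Ruby and Python regex dialects differ in other ways, so treat a pass here as a quick sanity check, not a guarantee that fluentd will behave the same.

```python
import re

# The article's <parse> expression, copied verbatim (Ruby syntax).
EXPRESSION = (r"^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) "
              r"\[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)")


def ruby_to_python(pattern):
    """Rewrite Ruby named groups (?<name>...) as Python's (?P<name>...).
    Lookbehinds (?<= and (?<! are left alone because a letter or underscore
    must follow "(?<". Other Ruby-only syntax is NOT translated."""
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", pattern)


def check(pattern, samples):
    """Return (sample, fields-or-None) for each sample entry."""
    regex = re.compile(ruby_to_python(pattern))
    results = []
    for sample in samples:
        m = regex.match(sample)
        results.append((sample, m.groupdict() if m else None))
    return results


samples = [
    "2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] "
    "h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS",
    "    - Returns",
]
for sample, fields in check(EXPRESSION, samples):
    print("OK  " if fields else "MISS", sample[:40])
```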
