1. 使用 fluent-bit 采集文件

简介

Fluent Bit是一款快速、灵活的日志处理器，旨在收集、解析、过滤日志，并将日志发送到远程数据库，以便执行数据分析。

数据分析通常发生在数据存储和数据库索引之后，但对于实时和复杂的分析需求，在日志处理器中处理仍在运行的数据会带来很多好处，这种方法被称为边缘流处理(Stream Processing on the Edge)。

流式处理引擎架构

fluent-bit之所以这么说，其实是因为其架构设计了一个流式的处理引擎：

默认实现组件

fluent-bit实现了不少默认的组件：

INPUT

collectd, cpu-metrics, disk-io-metrics, docker-events, docker-metrics, dummy, exec, fluentbit-metrics, forward, head, health, http, kernel-logs, memory-metrics, mqtt, network-io-metrics, nginx, node-exporter-metrics, process, random, serial-interface, standard-input, statsd, syslog, systemd, tail, tcp, thermal, windows-event-log, windows-event-log-winevtlog, windows-exporter-metrics,
OUTPUT

new-relic, forward, prometheus-remote-write, treasure-data, opensearch, skywalking, prometheus-exporter, azure, azure_blob, postgresql, bigquery, loki, elasticsearch, counter, null, cloudwatch, standard-output, syslog, websocket, flowcounter, logdna, firehose, influxdb, tcp-and-tls, kinesis, stackdriver, kafka-rest-proxy, opentelemetry, stackdriver_special_fields, slack, gelf, s3, datadog, splunk, file, http, kafka, nats
FILTERS

grep, aws-metadata, nightfall, lua, parser, type-converter, nest, record-modifier, standard-output, throttle, multiline-stacktrace, expect, tensorflow, geoip2-filter, modify, checklist, kubernetes, rewrite-tag
PARSERS

ltsv, configuring-parser, regular-expression, decoders, json, logfmt

该系列文章将介绍如何使用Fluent Bit，进行数据采集、处理、分发的过程。

本节将以采集文本文件入手，并结合监控方法来逐步为您展开介绍；

编译部署

前置条件

fluent-bit 采用c语言编写，可以通过容器或者二进制进行部署安装；其占用较少的CPU和内存资源，目前能够兼容绝大部分基于x86、x86_64、arm32v7和arm64v8的系统平台；

为了能够进行在本地进行测试编译，你需要如下依赖环境：

GCC or Clang
CMake
Flex & Bison: 仅当你需要流处理和记录访问器特性(这两种功能默认也是开启的)

编译

# clone 代码

git clone https://github.com/fluent/fluent-bit.git

# 切换到当前最新的一个发布分支

git checkout -b v1.9.1 v.9.1

# 编译

cd build

cmake ..

make

# 安装

# sudo make install 先不着急安装，我们来测试一下

测试

1. 先来创建一个测试路径：

mkdir ~/iSoft/fluent-bit -p

mkdir ~/isoft/fluent-bit/bin

mkdir ~/isoft/fluent-bit/conf

mkdir ~/isoft/fluent-bit/db

mkdir ~/isoft/fluent-bit/data

mkdir ~/isoft/fluent-bit/tmp

# 先将我们编译路径build下bin子路径内的东西全部复制过去

cp build/bin/* ~/iSoft/fluent-bit/bin/

2. 在conf目录下，创建三个配置文件：

fluent-bit.conf

[SERVICE]

    # Flush

    # =====

    # set an interval of seconds before to flush records to a destination

    flush        1

    # Daemon

    # ======

    # instruct Fluent Bit to run in foreground or background mode.

    daemon       Off

    # Log_Level

    # =========

    # Set the verbosity level of the service, values can be:

    #

    # - error

    # - warning

    # - info

    # - debug

    # - trace

    #

    # by default 'info' is set, that means it includes 'error' and 'warning'.

    log_level    info

    # Parsers File

    # ============

    # specify an optional 'Parsers' configuration file

    parsers_file parsers.conf

    # Plugins File

    # ============

    # specify an optional 'Plugins' configuration file to load external plugins.

    # plugins_file plugins.conf

    # HTTP Server

    # ===========

    # Enable/Disable the built-in HTTP Server for metrics

    http_server  On

    http_listen  0.0.0.0

    http_port    2020

    # Storage

    # =======

    # Fluent Bit can use memory and filesystem buffering based mechanisms

    #

    # - https://docs.fluentbit.io/manual/administration/buffering-and-storage

    #

    # storage metrics

    # ---------------

    # publish storage pipeline metrics in '/api/v1/storage'. The metrics are

    # exported only if the 'http_server' option is enabled.

    #

    #storage.metrics on

    # storage.path

    # ------------

    # absolute file system path to store filesystem data buffers (chunks).

    #

    # storage.path /tmp/storage

    # storage.sync

    # ------------

    # configure the synchronization mode used to store the data into the

    # filesystem. It can take the values normal or full.

    #

    # storage.sync normal

    # storage.checksum

    # ----------------

    # enable the data integrity check when writing and reading data from the

    # filesystem. The storage layer uses the CRC32 algorithm.

    #

    # storage.checksum off

    # storage.backlog.mem_limit

    # -------------------------

    # if storage.path is set, Fluent Bit will look for data chunks that were

    # not delivered and are still in the storage layer, these are called

    # backlog data. This option configure a hint of maximum value of memory

    # to use when processing these records.

    #

    # storage.backlog.mem_limit 5M

[INPUT]

    Name   tail

    Path   /home/etl/iSoft/fluent-bit/data/*.txt

    DB     /home/etl/iSoft/fluent-bit/db/tail.db

[OUTPUT]

    Name  stdout

    Match *

parser.conf

[PARSER]

    Name   json

    Format json

    Time_Key time

    Time_Format %d/%b/%Y:%H:%M:%S %z%

plugins.conf

这个文件其实暂时还用不上，是为我们自定义扩展插件的配置文件

[PLUGINS]

    # Path /path/to/out_gstdout.so

3. 启动fluent-bit

先进入到我们自己创建的tmp目录, 创建一些测试数据

cd ~/iSoft/fluent-bit/tmp

cat <<EOF > test-data.txt

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

EOF

根据上面fluent-bit.conf配置文件当中的INPUT/OUTPUT可以看出，我们利用了tail组件来监听data目录中的txt文件，并将结果输出到标准输出上来；

所以，启动程序

./fluent-bit -c ../conf/fluent-bit.conf

Fluent Bit v1.9.1

* Git commit: 619277847c6343dea9e4215deacd36cf61caf0a3

* Copyright (C) 2015-2021 The Fluent Bit Authors

* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd

* https://fluentbit.io

[2022/03/20 15:23:43] [ info] [engine] started (pid=15543)

[2022/03/20 15:23:43] [ info] [storage] version=1.1.6, initializing...

[2022/03/20 15:23:43] [ info] [storage] in-memory

[2022/03/20 15:23:43] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128

[2022/03/20 15:23:43] [ info] [cmetrics] version=0.3.0

[2022/03/20 15:23:43] [ info] [output:stdout:stdout.0] worker #0 started

[2022/03/20 15:23:43] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020

[2022/03/20 15:23:43] [ info] [sp] stream processor started

ok, fluent-bit已经阻塞开始监听了

4. 开始测试

重新开启一个终端创口，进入到创建好的tmp目录，将测试数据手动批量的输出到data目录下：

cd ~/iSoft/fluent-bit/tmp

cat test-data.txt >> ../data/test.txt

cat test-data.txt >> ../data/test.txt

cat test-data.txt >> ../data/test.txt

cat test-data.txt >> ../data/test.txt

cat test-data.txt >> ../data/test.txt

cat test-data.txt >> ../data/test.txt

cat test-data.txt >> ../data/test.txt

这时可以看到fluent-bit阻塞的窗口已经在一直输出了

Fluent Bit v1.9.1

* Git commit: 619277847c6343dea9e4215deacd36cf61caf0a3

* Copyright (C) 2015-2021 The Fluent Bit Authors

* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd

* https://fluentbit.io

[2022/03/20 15:23:43] [ info] [engine] started (pid=15543)

[2022/03/20 15:23:43] [ info] [storage] version=1.1.6, initializing...

[2022/03/20 15:23:43] [ info] [storage] in-memory

[2022/03/20 15:23:43] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128

[2022/03/20 15:23:43] [ info] [cmetrics] version=0.3.0

[2022/03/20 15:23:43] [ info] [output:stdout:stdout.0] worker #0 started

[2022/03/20 15:23:43] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020

[2022/03/20 15:23:43] [ info] [sp] stream processor started

[2022/03/20 15:23:43] [ info] [input:tail:tail.0] inotify_fs_add(): inode=11409520 watch_fd=1 name=/home/etl/iSoft/fluent-bit/data/test.txt

[0] tail.0: [1647761052.113552720, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[0] tail.0: [1647763863.114999757, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[1] tail.0: [1647763863.403883305, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[0] tail.0: [1647763893.545292283, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[0] tail.0: [1647763954.515264556, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[1] tail.0: [1647763954.515268143, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[2] tail.0: [1647763954.515268725, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[3] tail.0: [1647763954.515269171, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[4] tail.0: [1647763954.515269621, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[5] tail.0: [1647763954.515270065, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[6] tail.0: [1647763954.515270512, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[7] tail.0: [1647763954.515270965, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[8] tail.0: [1647763954.515271452, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

[9] tail.0: [1647763954.515276040, {"log"=>"{"log":"Done","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}"}]

小结

通过上面的配置文件fluent-bit.conf看到，我们为INPUT tail组件，还配置了一个DB参数，那是因为tail组件采用了嵌入式的sqlite3的数据库来记录监听文件的偏移量，我们可以去查看一下：

cd ~/iSoft/fluent-bit/db

sqlite3 tail.db

sqlite> .schema

CREATE TABLE in_tail_files (  id      INTEGER PRIMARY KEY,  name    TEXT NOT NULL,  offset  INTEGER,  inode   INTEGER,  created INTEGER,  rotated INTEGER DEFAULT 0);

sqlite>

sqlite> select * from in_tail_files ;

1|/home/etl/iSoft/fluent-bit/data/test.txt|77520|11409520|1647761023|0

sqlite>

通过字段名称可以看出来，分别是文件全路径名称、读取到的文件偏移量(字节数)、文件inode、文件创建时间、文件滚动标记

监控

根据fluent-bit官网的介绍，我们在上面的配置文档中，将fluent-bit进程的http_server设置为On，并且暴露在2020端口上，那么我们可以使用如下两个接口来查看其输出情况：

curl -s http://127.0.0.1:2020/api/v1/uptime | jq

{

  "uptime_sec": 391,

  "uptime_hr": "Fluent Bit has been running:  0 day, 0 hour, 6 minutes and 31 seconds"

}

curl -s http://127.0.0.1:2020/api/v1/metrics | jq

{

  "input": {

    "tail.0": {

      "records": 1,

      "bytes": 88,

      "files_opened": 1,

      "files_closed": 0,

      "files_rotated": 0

    }

  },

  "filter": {},

  "output": {

    "stdout.0": {

      "proc_records": 1,

      "proc_bytes": 88,

      "errors": 0,

      "retries": 0,

      "retries_failed": 0,

      "dropped_records": 0,

      "retried_records": 0

    }

  }

}

拉取promethues grafana镜像

fluent-bit为promethues提供了监控接口，那么我们来尝试一下：

在本地拉取promethues和grafana的镜像（这里就简单的将这两位泡在容器里）

docker pull prom/prometheus

docker pull grafana/grafana

配置promethues

mkdir ~/isoft/prometheus

vim ~/isoft/prometheus/fluent-bit-prom.yaml

这里为prometheus增加了一个job，因为是运行在docker里面，所以选择了另外一个宿主机IP, 以能否访问运行在宿主机上的fluent-bit

# my global config

global:

  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.

  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration

alerting:

  alertmanagers:

    - static_configs:

        - targets:

          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.

rule_files:

  # - "first_rules.yml"

  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:

# Here it's Prometheus itself.

scrape_configs:

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'

    # scheme defaults to 'http'.

    static_configs:

      - targets: ["localhost:9090"]

  - job_name: "fluent-bit"

    metrics_path: '/api/v1/metrics/prometheus' # metrics_path defaults to '/metrics'

    # scheme defaults to 'http'.

    static_configs:

      # 宿主机IP

      - targets: ["192.168.241.1:2020"]

启动

docker run -d -p 9090:9090 --name prom -v /home/etl/iSoft/prometheus/fluent-bit-prom.yaml:/etc/prometheus/prometheus.yml prom/prometheus

通过浏览器访问 http://localhost:9090/

配置grafana

mkdir ~/isoft/grafana/storage -p

chmod 777 ~/isoft/grafana/storage

启动

docker run -d -p 3000:3000 --name grafana -v /home/etl/iSoft/grafana/storage:/var/lib/grafana grafana/grafana

添加promethues数据源

导入fluent-bit提供的默认dashboard

fluent-bit-dashboard.json

查看监控界面

输入项

输出项

发现资源都叫 tail.0，所以根据文档，将配置文件INPUT/OUTPUT处增加两个别名，以区分可能监控到的多个配置

[INPUT]

   Name   tail

   Path   /home/etl/iSoft/fluent-bit/data/*.txt

   DB     /home/etl/iSoft/fluent-bit/db/tail.db

   Alias  monitor_txt_file

[OUTPUT]

   Name  stdout

   Match *

   Alias output_txt_file

再次查看监控界面

输入项

输出项

总结

至此，先完成初步测试，感兴趣的同学可以持续关注，后续会逐个介绍异构数据处理和发送（尽量多介绍相关组件和场景的使用）、如何使用pipline streaming的窗口SQL统计、为fluent-bit开发扩展组件等功能；

至于windows环境编译较为复杂（需要微软编译环境，我已测试过），感兴趣的同学，可以直接从github下载对应的win版本来进行测试；（我测试是OK的）