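nginx + Flume

Role of nginx: load balancing.
Difference between nginx and LVS: nginx can also act as a reverse proxy.
1. Upload the nginx package (Tengine, an nginx distribution) and unpack it: tar -zxvf tengine-2.1.0
2. Install the build environment
Dependencies: gcc openssl-devel pcre-devel zlib-devel
Install them: yum install gcc openssl-devel pcre-devel zlib-devel -y
3. Build and install nginx from the unpacked source directory
./configure
make && make install
4. Create an init script named nginx in /etc/rc.d/init.d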

vi nginx

#!/bin/sh

#

# nginx - this script starts and stops the nginx daemon

#

# chkconfig:   - 85 15

# description:  Nginx is an HTTP(S) server, HTTP(S) reverse \

#               proxy and IMAP/POP3 proxy server

# processname: nginx

# config:      /etc/nginx/nginx.conf

# config:      /etc/sysconfig/nginx

# pidfile:     /var/run/nginx.pid

# Source function library.

. /etc/rc.d/init.d/functions

# Source networking configuration.

. /etc/sysconfig/network

# Check that networking is up.

[ "$NETWORKING" = "no" ] && exit 0

nginx="/usr/local/nginx/sbin/nginx"

prog=$(basename $nginx)

NGINX_CONF_FILE="/usr/local/nginx/conf/nginx.conf"

[ -f /etc/sysconfig/nginx ] && . /etc/sysconfig/nginx

lockfile=/var/lock/subsys/nginx

make_dirs() {

   # make required directories

   user=`$nginx -V 2>&1 | grep "configure arguments:" | sed 's/[^*]*--user=\([^ ]*\).*/\1/g' -`

   options=`$nginx -V 2>&1 | grep 'configure arguments:'`

   for opt in $options; do

       if [ `echo $opt | grep '.*-temp-path'` ]; then

           value=`echo $opt | cut -d "=" -f 2`

           if [ ! -d "$value" ]; then

               # echo "creating" $value

               mkdir -p $value && chown -R $user $value

           fi

       fi

   done

}

start() {

    [ -x $nginx ] || exit 5

    [ -f $NGINX_CONF_FILE ] || exit 6

    make_dirs

    echo -n $"Starting $prog: "

    daemon $nginx -c $NGINX_CONF_FILE

    retval=$?

    echo

    [ $retval -eq 0 ] && touch $lockfile

    return $retval

}

stop() {

    echo -n $"Stopping $prog: "

    killproc $prog -QUIT

    retval=$?

    echo

    [ $retval -eq 0 ] && rm -f $lockfile

    return $retval

}

restart() {

    configtest || return $?

    stop

    sleep 1

    start

}

reload() {

    configtest || return $?

    echo -n $"Reloading $prog: "

    killproc $nginx -HUP

    RETVAL=$?

    echo

}

force_reload() {

    restart

}

configtest() {

  $nginx -t -c $NGINX_CONF_FILE

}

rh_status() {

    status $prog

}

rh_status_q() {

    rh_status >/dev/null 2>&1

}

case "$1" in

    start)

        rh_status_q && exit 0

        $1

        ;;

    stop)

        rh_status_q || exit 0

        $1

        ;;

    restart|configtest)

        $1

        ;;

    reload)

        rh_status_q || exit 7

        $1

        ;;

    force-reload)

        force_reload

        ;;

    status)

        rh_status

        ;;

    condrestart|try-restart)

        rh_status_q || exit 0

            ;;

    *)

        echo $"Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload|configtest}"

        exit 2

esac

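5. Make the script executable: chmod +x nginx
6. Register the script as a system service
   chkconfig --add nginx
   Check that it was added
   chkconfig --list nginx
7. Start nginx: service nginx start
8. Once it is running, open it in a browser to confirm that the start succeeded: node2: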
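9. Edit /usr/local/nginx/conf/nginx.conf
a) Define a custom log format (log_format belongs in the http block, the location in a server block)
log_format my_format '$remote_addr^A$msec^A$http_host^A$request_uri';

location =/log.gif {
        default_type image/gif;
        access_log /opt/data/access.log my_format;
}
10. After editing, reload the configuration: service nginx reload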
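
To see what a line in the my_format layout looks like once it is split back into fields, here is a small sketch. It assumes that ^A in the log_format stands for the control character \001 (the usual meaning of ^A). The sample values are made up for illustration and are not taken from a real access log.

```shell
#!/bin/sh
# Hypothetical sample line in the my_format layout:
#   $remote_addr ^A $msec ^A $http_host ^A $request_uri
# Assumption: ^A is the single control character \001.
line=$(printf '192.168.1.10\001%s\001node2\001/log.gif?en=e_pv' '1700000000.123')

# awk splits on \001 and labels the four fields.
parsed=$(printf '%s\n' "$line" | awk -F '\001' \
    '{ printf "ip=%s ts=%s host=%s uri=%s", $1, $2, $3, $4 }')
echo "$parsed"
```

A control character is a convenient separator here because it is unlikely to appear inside a request URI, unlike a space or a comma.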

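Flume notes
I. Installing Flume
1. Upload the Flume package
      and unpack it: tar -zxvf apache-flume-1.6.0
2. Rename the unpacked directory: mv apache-flume-1.6.0-bin flume
3. In the conf directory, rename the environment template: mv flume-env.sh.template flume-env.sh
4. Set the Java environment variable for Flume in conf/flume-env.sh
(Note: in vi command mode, search for the Java setting with /JAVA; to find the value already configured on the system, run echo $JAVA_HOME)
5. Add the Flume environment variables to the system profile
vi /etc/profile   and add FLUME_HOME
export FLUME_HOME=/root/flume
export PATH=$PATH:$FLUME_HOME/bin
Reload the profile so the change takes effect:   . /etc/profile
6. Verify the setup
flume-ng version   If the Flume version is printed, the setup works
7. Add custom configuration files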
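
Before running flume-ng version it can help to confirm that the directory FLUME_HOME points at has the expected layout. The sketch below is one way to do that. The function name check_flume_home is made up for this example, and the demo runs against a throwaway directory that only mimics the layout, so it does not need a real Flume install.

```shell
#!/bin/sh
# check_flume_home DIR: succeed if DIR looks like a Flume install
# (an executable bin/flume-ng and a conf/flume-env.sh).
# The function name is hypothetical, chosen for this sketch.
check_flume_home() {
    dir=$1
    [ -x "$dir/bin/flume-ng" ] && [ -f "$dir/conf/flume-env.sh" ]
}

# Demo against a scratch directory that mimics the layout; on a real host
# you would call: check_flume_home "$FLUME_HOME"
fake=$(mktemp -d)
mkdir -p "$fake/bin" "$fake/conf"
printf '#!/bin/sh\n' > "$fake/bin/flume-ng" && chmod +x "$fake/bin/flume-ng"
: > "$fake/conf/flume-env.sh"

if check_flume_home "$fake"; then status=ok; else status=missing; fi
echo "layout check: $status"
```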

Example 1: A simple example
http://flume.apache.org/FlumeUserGuide.html#a-simple-example

Configuration file

    ############################################################

    # Name the components on this agent

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = netcat

    a1.sources.r1.bind = node2

    a1.sources.r1.port = 44444

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity = 1000

    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

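Start Flume
flume-ng agent -n a1 -c conf -f option -Dflume.root.logger=INFO,console
Note: run the command from the directory that contains the option file, because -f option is a relative path

Install telnet
yum install telnet
Connect to the netcat source with telnet node2 44444 and type a few lines; each line is logged as an event on the agent console
To quit telnet: press Ctrl+], then type quit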

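Memory Channel settings
capacity: the maximum number of events the channel can hold; the default is 100
transactionCapacity: the maximum number of events taken from a source or delivered to a sink in one transaction; the default is also 100
keep-alive: the timeout, in seconds, for adding an event to the channel or removing one from it
byteCapacity / byteCapacityBufferPercentage: limit on the total bytes of events held in the channel; only event bodies are counted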
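
The settings above can be put together as follows. This is a sketch, not a tuned configuration: the scratch file path and the byteCapacity value are chosen only for illustration. The check at the end reflects the rule that transactionCapacity may not be larger than capacity.

```shell
#!/bin/sh
# Write a memory-channel fragment to a scratch file (illustrative path).
conf=$(mktemp)
cat > "$conf" <<'EOF'
a1.channels = c1
a1.channels.c1.type = memory
# Up to 1000 events buffered in the channel
a1.channels.c1.capacity = 1000
# Up to 100 events per source/sink transaction
a1.channels.c1.transactionCapacity = 100
# Wait up to 3 seconds to add or remove an event
a1.channels.c1.keep-alive = 3
# Cap on total event-body bytes (value chosen for illustration)
a1.channels.c1.byteCapacity = 800000
EOF

# Sanity check: transactionCapacity must not exceed capacity.
cap=$(sed -n 's/^a1\.channels\.c1\.capacity *= *//p' "$conf")
txn=$(sed -n 's/^a1\.channels\.c1\.transactionCapacity *= *//p' "$conf")
if [ "$txn" -le "$cap" ]; then result=ok; else result=bad; fi
echo "capacity=$cap transactionCapacity=$txn -> $result"
```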

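----- Configuring Flume on multiple nodes
1. Copy the Flume directory configured on node2 to node3
scp -r flume/ root@node3:/root/
2. Set the environment variables on node3
vi /etc/profile

Example 2: chaining two Flume agents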

#node2

    ############################################################

    # Name the components on this agent

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = netcat

    a1.sources.r1.bind = node2

    a1.sources.r1.port = 44444

    # Describe the sink

    # a1.sinks.k1.type = logger

    a1.sinks.k1.type = avro

    a1.sinks.k1.hostname = node3

    a1.sinks.k1.port = 60000

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity = 1000

    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

On node3, install Flume (steps omitted)
Configuration file

############################################################

    # Name the components on this agent

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = avro

    a1.sources.r1.bind = node3

    a1.sources.r1.port = 60000

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity = 1000

    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

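Start Flume on node3 first (the Avro source that receives events)
flume-ng agent -n a1 -c conf -f avro.conf -Dflume.root.logger=INFO,console

Long-option form (here with a configuration file named option2):
flume-ng agent --conf-file option2 --name a1 -Dflume.root.logger=INFO,console

   Then start Flume on node2 (the Avro sink that sends events)
   flume-ng agent -n a1 -c conf -f simple.conf2 -Dflume.root.logger=INFO,console

   Open telnet to node2 and type some test lines; the output appears on the node3 console

When the events show up on node3, the connection is working

Notes: when configuring, check the host names used on each node, and mind the start order. Start the receiving agent on node3 first (its Avro source listens on port 60000), then the sending agent on node2 (its Avro sink connects to node3).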

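------ Collecting the logs from several Flume agents onto one server, to address the single-point-of-failure problem
Flume can resume a transfer from the point where it was interrupted

-- Example 3: exec source. An exec source runs a Unix command to monitor a data source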

Exec Source

        http://flume.apache.org/FlumeUserGuide.html#exec-source

    Configuration file

    ############################################################

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = exec

    a1.sources.r1.command = tail -F  /root/dirflume/log.txt

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity = 1000

    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

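    Start Flume

    flume-ng agent -n a1 -c conf -f exec.conf -Dflume.root.logger=INFO,console

    Create an empty file for the demo: touch flume.exec.log

    Append data in a loop (the file written here must be the one named in a1.sources.r1.command; the configuration above tails /root/dirflume/log.txt, so point one at the other)

    for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done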

-- Example 4: reading files dropped into a designated directory

Spooling Directory Source

        http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

    Configuration file

    ############################################################

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = spooldir

    a1.sources.r1.spoolDir = /root/flume/log/

    a1.sources.r1.fileHeader = false

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity = 1000

    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

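Start Flume
flume-ng agent -n a1 -c conf -f spool.conf -Dflume.root.logger=INFO,console

Copy a file in for the demo (the target must be the directory set in a1.sources.r1.spoolDir; the configuration above uses /root/flume/log/)
mkdir logs
cp flume.exec.log logs/

Files already in the spool directory when the agent starts are read, and files added later are read as well
Processed files are renamed with a suffix (.COMPLETED by default). To use a different suffix and check again: a1.sources.r1.fileSuffix=.wcg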
-- Note: resuming after an interruption has to be configured explicitly
--- Example 5: writing Flume data to HDFS
hdfs sink
       http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

       Configuration file

############################################################

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = spooldir

    a1.sources.r1.spoolDir = /home/logs

    a1.sources.r1.fileHeader = true

    # Describe the sink

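    # Only the sink section changes relative to the previous spooldir example; it replaces a1.sinks.k1.type = logger

    a1.sinks.k1.type=hdfs

    a1.sinks.k1.hdfs.path=hdfs://bjsxt/flume/%Y-%m-%d/%H%M

    ## Roll to a new file every 60 s or once the file exceeds 10240 bytes (10 KB; 10 MB would be 10485760)

    # Number of events written before rolling to a new file; 0 = do not roll on event count

    a1.sinks.k1.hdfs.rollCount=0

    # Seconds to wait before rolling to a new file; 0 = do not roll on time

    a1.sinks.k1.hdfs.rollInterval=60

    # File size in bytes that triggers a roll; 0 = do not roll on size

    a1.sinks.k1.hdfs.rollSize=10240

    # If the currently open temporary file receives no data for this many seconds, close it and rename it to its final name

    a1.sinks.k1.hdfs.idleTimeout=3

    a1.sinks.k1.hdfs.fileType=DataStream

    a1.sinks.k1.hdfs.useLocalTimeStamp=true

    ## Create a new directory every five minutes:

    # Whether to round the timestamp down (this is rounding down, not rounding to nearest). If enabled, it affects every time-based escape sequence except %t

    a1.sinks.k1.hdfs.round=true

    # The timestamp is rounded down to the highest multiple of this value

    a1.sinks.k1.hdfs.roundValue=5

    # Unit of the rounding value: second, minute, or hour

    a1.sinks.k1.hdfs.roundUnit=minute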

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity = 1000

    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

Note: by default Flume finds the HDFS configuration through the Hadoop environment variables
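
To make the effect of round=true, roundValue=5 and roundUnit=minute concrete, the sketch below rounds a timestamp down to a 5-minute boundary and formats it with the same pattern as hdfs.path. It only illustrates the arithmetic and is not Flume's own code. It assumes GNU date (for -d @N) and uses UTC so that the result does not depend on the machine's time zone, whereas Flume with useLocalTimeStamp=true uses the agent's local time.

```shell
#!/bin/sh
# Round an epoch timestamp down to a 5-minute boundary and render it with
# the same pattern as hdfs.path (%Y-%m-%d/%H%M). Requires GNU date.
bucket_path() {
    epoch=$1
    round_secs=$((5 * 60))                     # roundValue=5, roundUnit=minute
    rounded=$((epoch - epoch % round_secs))    # drop the remainder: round down
    date -u -d "@$rounded" '+%Y-%m-%d/%H%M'    # UTC only to keep the demo deterministic
}

# 1700000000 is 2023-11-14 22:13:20 UTC, so it lands in the 22:10 bucket.
result=$(bucket_path 1700000000)
echo "$result"
```

Every event stamped between 22:10:00 and 22:14:59 therefore goes to the same directory, which is why the configuration produces one directory per five minutes.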

---- Using Flume to collect the nginx log and upload it to HDFS

# project

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /opt/data/access.log

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = /log/%Y%m%d

a1.sinks.k1.hdfs.filePrefix = log-

a1.sinks.k1.hdfs.rollInterval=0

a1.sinks.k1.hdfs.rollSize=102400

a1.sinks.k1.hdfs.rollCount=0

a1.sinks.k1.hdfs.idleTimeout=10

a1.sinks.k1.hdfs.callTimeout=40000

a1.sinks.k1.hdfs.useLocalTimeStamp=true

a1.sinks.k1.hdfs.fileType=DataStream

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
