Preface

I have recently been in charge of our ELK stack, so I want to write up a summary here covering the many problems I ran into, partly as notes for myself.

Contents

Environment

Kafka and ZooKeeper installation

Logstash installation

Elasticsearch installation

Lucene syntax

Kafka usage

Elasticsearch plugin installation

Common Elasticsearch operations

Elasticsearch data processing

Environment

This deployment uses version 5.6.3 for the ELK components as well as Filebeat. The overall data flow is filebeat -> logstash -> kafka -> logstash -> elasticsearch -> grafana (kibana).

    rpm -qf /etc/issue
    centos-release-7-3.1611.el7.centos.x86_64
    zookeeper-3.4.10
    kafka_2.11-0.10.0.1

Kafka and ZooKeeper installation

Parts of the installation overlap with my Ansible notes, so I will not repeat them all here. I manage starting and stopping both Kafka and ZooKeeper with supervisor; the supervisor configuration files are below.

    [root@jumpserver common]# cat templates/kafka.ini
    [program:kafka]
    command = /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
    autorestart=true
    redirect_stderr = true
    stdout_logfile = /opt/kafka/supervisor_logs/kafka.log
    stopasgroup=true
    environment = JAVA_HOME=/opt/java

    [root@jumpserver common]# cat templates/zookeeper.ini
    [program:zookeeper]
    command = /opt/zookeeper/bin/zkServer.sh start-foreground
    autorestart=true
    redirect_stderr = true
    stdout_logfile = /opt/zookeeper/supervisor_logs/zookeeper.log
    stopasgroup=true
    environment = JAVA_HOME=/opt/java
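Once these .ini files are in place under supervisor's include directory, supervisorctl reread followed by supervisorctl update registers the two programs, and supervisorctl start/stop/restart kafka or zookeeper covers day-to-day operation.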

Logstash installation

Java needs to be reachable at /usr/bin/java (for example via ln -s /opt/java/bin/java /usr/bin/java); otherwise you have to adjust Logstash's startup options to point at your Java binary.

    ll /usr/bin/java
    lrwxrwxrwx 1 root root 18 Sep 6 17:05 /usr/bin/java -> /opt/java/bin/java

Otherwise you will see an error like this:

    Using provided startup.options file: /etc/logstash/startup.options
    /usr/share/logstash/vendor/jruby/bin/jruby: line 388: /usr/bin/java: No such file or directory
    Unable to install system startup script for Logstash.

Elasticsearch installation

It is best to follow the official documentation for a few settings, such as disabling swap and keeping the JVM heap (-Xms/-Xmx in jvm.options) at no more than half of the machine's memory and below 32 GB; the official docs describe how to configure all of this. Here are the relevant files I use:

    cat templates/elasticsearch.yml |egrep -v "^#|^$"
    cluster.name: moji
    node.name: {{ ansible_hostname }}
    path.data: /opt/elasticsearch/data
    path.logs: /opt/elasticsearch/logs
    bootstrap.memory_lock: true
    network.host: 0.0.0.0
    discovery.zen.ping.unicast.hosts: [{{ cluster_list|map('regex_replace', '^(.*)$', '"\\1"')|join(',') }}]
    http.cors.enabled: true
    http.cors.allow-origin: "*"

    cat templates/elasticsearch.service |egrep -v "^#|^$"
    [Unit]
    Description=Elasticsearch
    Documentation=http://www.elastic.co
    Wants=network-online.target
    After=network-online.target

    [Service]
    Environment=ES_HOME=/usr/share/elasticsearch
    Environment=CONF_DIR=/etc/elasticsearch
    Environment=DATA_DIR=/var/lib/elasticsearch
    Environment=LOG_DIR=/var/log/elasticsearch
    Environment=PID_DIR=/var/run/elasticsearch
    EnvironmentFile=-/etc/sysconfig/elasticsearch
    WorkingDirectory=/usr/share/elasticsearch
    User=elasticsearch
    Group=elasticsearch
    ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec
    ExecStart=/usr/share/elasticsearch/bin/elasticsearch \
        -p ${PID_DIR}/elasticsearch.pid \
        --quiet \
        -Edefault.path.logs=${LOG_DIR} \
        -Edefault.path.data=${DATA_DIR} \
        -Edefault.path.conf=${CONF_DIR}
    StandardOutput=journal
    StandardError=inherit
    LimitNOFILE=65536
    LimitNPROC=2048
    LimitMEMLOCK=infinity
    TimeoutStopSec=0
    KillSignal=SIGTERM
    KillMode=process
    SendSIGKILL=no
    SuccessExitStatus=143

    [Install]
    WantedBy=multi-user.target

Lucene syntax

Below are filters used in Grafana to match specific conditions ($domain is a Grafana template variable; the boolean operators AND/NOT must be upper-case):

    domain:$domain AND NOT http_code:499
    domain:$domain AND http_code:499

Kafka usage

Logstash now uses the Kafka new consumer (offsets are tracked by Kafka itself rather than ZooKeeper); a detailed introduction is here: https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/

Check consumption status (by default this covers all consumers in the group):

    cd /opt/kafka/bin
    ./kafka-consumer-groups.sh --new-consumer --group logstash --bootstrap-server 172.16.21.7:9096 --describe

    # old version check (offsets stored in ZooKeeper):
    cd /opt/kafka/bin
    ./kafka-topics.sh --list --zookeeper /ops | while read topic; do ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group logstash --topic $topic --zookeeper /ops; done
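If you would rather check lag from Python, here is a rough sketch using the kafka-python client; this is my own illustration rather than part of the original setup, and the broker address, group, and topic names are just the examples that appear in this post.

    # Hypothetical lag check with kafka-python (pip install kafka-python).
    from kafka import KafkaConsumer, TopicPartition

    BOOTSTRAP = "172.16.21.7:9096"   # broker address from the command above
    GROUP = "logstash"
    TOPIC = "nginx-log-wis"          # example topic name used later in this post

    consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP,
                             group_id=GROUP,
                             enable_auto_commit=False)

    partitions = [TopicPartition(TOPIC, p)
                  for p in sorted(consumer.partitions_for_topic(TOPIC))]
    end_offsets = consumer.end_offsets(partitions)       # latest offset per partition

    for tp in partitions:
        committed = consumer.committed(tp) or 0           # last offset the group committed
        print("partition=%d committed=%d end=%d lag=%d"
              % (tp.partition, committed, end_offsets[tp], end_offsets[tp] - committed))

    consumer.close()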

Delete a topic

    ./kafka-topics.sh --delete --zookeeper / --topic nginx-log-wis
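Note that on this Kafka version the delete only takes effect if the brokers run with delete.topic.enable=true; otherwise the topic is merely marked for deletion.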

Elasticsearch plugin installation

At the moment I use two plugins, head and HQ. Both can simply be cloned from git and served with nginx. Below is the config I use to serve AdminLTE, which works as a reference: just change the root and add a server block.

    cat /etc/nginx/conf.d/lte.conf
    server {
        listen 2000;
        root /application/AdminLTE;
        location / {
        }
    }

Common Elasticsearch operations

Delete data for a time range. The default size is 10, and the maximum is capped by an Elasticsearch setting; we usually run the query first to see how many documents match, and if the number is manageable we simply set a large enough size and delete. In this index the date field is named timestamp.

    POST http://192.168.3.3:9200/daily-summary-statistics-http-code/_delete_by_query
    {
      "size": 100000,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "timestamp": {
                  "gte": "2018-01-09T00:00:00.000+08:00",
                  "lt": "2018-01-10T00:00:00.000+08:00"
                }
              }
            }
          ]
        }
      }
    }
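As mentioned above, I usually count first and then delete; here is a minimal requests-based sketch of that routine (my own illustration, using the same host, index and time range as the example above):

    # Count the matching documents, then delete them with a size above the match count.
    import requests

    ES = "http://192.168.3.3:9200"
    INDEX = "daily-summary-statistics-http-code"
    query = {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {
                        "gte": "2018-01-09T00:00:00.000+08:00",
                        "lt": "2018-01-10T00:00:00.000+08:00",
                    }}}
                ]
            }
        }
    }

    # 1. see how many documents the filter matches
    count = requests.get(ES + "/" + INDEX + "/_count", json=query).json()["count"]
    print("documents matching:", count)

    # 2. delete them; size just has to be at least the match count
    body = dict(query, size=count + 1)
    res = requests.post(ES + "/" + INDEX + "/_delete_by_query", json=body).json()
    print("deleted:", res.get("deleted"))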

Modify a template value

    GET http://192.168.3.3:9200/_template/logstash

Then find the logstash key in the response, copy the whole body out, change whatever you want, and PUT it back; I changed refresh_interval. A small requests sketch of this get/modify/put cycle follows the full template body below.

    PUT http://192.168.3.3:9200/_template/logstash
    {
      "order": 0,
      "version": 50001,
      "template": "logstash-*",
      "settings": {
        "index": {
          "refresh_interval": "30s"
        }
      },
      "mappings": {
        "_default_": {
          "_all": {
            "enabled": true,
            "norms": false
          },
          "dynamic_templates": [
            {
              "message_field": {
                "path_match": "message",
                "match_mapping_type": "string",
                "mapping": {
                  "type": "text",
                  "norms": false
                }
              }
            },
            {
              "string_fields": {
                "match": "*",
                "match_mapping_type": "string",
                "mapping": {
                  "type": "text",
                  "norms": false,
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          ],
          "properties": {
            "@timestamp": {
              "type": "date",
              "include_in_all": false
            },
            "@version": {
              "type": "keyword",
              "include_in_all": false
            },
            "geoip": {
              "dynamic": true,
              "properties": {
                "ip": {
                  "type": "ip"
                },
                "location": {
                  "type": "geo_point"
                },
                "latitude": {
                  "type": "half_float"
                },
                "longitude": {
                  "type": "half_float"
                }
              }
            }
          }
        }
      },
      "aliases": {}
    }
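If this copy/edit/put cycle comes up often it is easy to script; below is a minimal sketch with requests (my own sketch, same host and refresh_interval value as above; everything else is left exactly as Elasticsearch returns it):

    # GET the template, tweak refresh_interval, PUT the whole body back.
    import requests

    ES = "http://192.168.3.3:9200"

    # GET /_template/logstash returns {"logstash": {...}}; keep only the template body
    template = requests.get(ES + "/_template/logstash").json()["logstash"]

    # change the one setting we care about
    template.setdefault("settings", {}).setdefault("index", {})["refresh_interval"] = "30s"

    # write it back under the same template name
    res = requests.put(ES + "/_template/logstash", json=template)
    print(res.status_code, res.text)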

Elasticsearch data processing

After the data has been collected into Elasticsearch, we need to extract the pieces we care about for permanent storage.

At the moment we only collect nginx logs, in the format below (not carefully tidied up yet):

    log_format access_log_json '{"remote_addr":"$remote_addr","host":"$host","time_iso8601":"$time_iso8601","request":"$request","status":"$status","body_bytes_sent":"$body_bytes_sent","http_referer":"$http_referer","http_user_agent":"$http_user_agent","http_x_forwarded_for":"$http_x_forwarded_for","upstream_response_time":"$upstream_response_time","uri":"$uri","request_time":"$request_time"}';
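Each line produced by this log_format is a JSON object, which is presumably what lets the Logstash side parse it with a plain JSON codec instead of grok. A quick sanity check in Python, with a made-up sample line:

    # Parse one hypothetical access-log line produced by the log_format above.
    import json

    sample = ('{"remote_addr":"1.2.3.4","host":"example.com",'
              '"time_iso8601":"2018-01-09T12:00:00+08:00","request":"GET / HTTP/1.1",'
              '"status":"200","body_bytes_sent":"612","http_referer":"-",'
              '"http_user_agent":"curl/7.29.0","http_x_forwarded_for":"-",'
              '"upstream_response_time":"0.003","uri":"/","request_time":"0.003"}')

    event = json.loads(sample)
    print(event["host"], event["status"], event["request_time"])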

Below is a Python module that many of the scripts further down depend on; since they pull it in with from settings import *, it presumably lives in a settings.py alongside them. Our index naming convention is logstash-nginx-log-<appname> followed by a date suffix.

    from datetime import timedelta
    import datetime
    import requests

    G_URL = "http://192.168.3.3:9200"
    headers = {"content-type": "application/json"}
    ES_DATA_KEEP_DAYS = 7

    time_map = {
        "mappings": {
            "doc": {
                "properties": {
                    "timestamp": {"type": "date"}
                }
            }
        }
    }

    def get_apps():
        res = requests.get(
            G_URL + "/_cat/indices?v",
            json={}
        ).text
        apps = set()
        for line in res.strip().split("\n"):
            lines = line.split()
            if lines[2].startswith("logstash-nginx-log"):
                index_name = lines[2]
                app_name = index_name.split("-")[-2] if index_name.split("-")[-2] != "log" else "whv3"
                apps.add(app_name)
        return list(apps)

    def get_iso_day(days_before_now=14):
        now = datetime.datetime.now()
        return (now - timedelta(days=days_before_now)).strftime('%Y-%m-%dT00:00:00.000+08:00')

    def get_one_day_qps_json_no_domain(days_before_now=14):
        return {
            "size": 0,
            "query": {
                "bool": {
                    "filter": [
                        {
                            "range": {
                                "time_iso8601": {
                                    "gte": get_iso_day(days_before_now),
                                    "lte": get_iso_day(days_before_now - 1)
                                }
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "count_per_interval": {
                    "date_histogram": {
                        "interval": "1s",
                        "field": "time_iso8601",
                        "min_doc_count": 0,
                        "time_zone": "Asia/Shanghai"
                    },
                    "aggs": {
                        "count_max": {
                            "sum": {
                                "script": {
                                    "source": "1"
                                }
                            }
                        }
                    }
                },
                "max_count": {
                    "max_bucket": {
                        "buckets_path": "count_per_interval>count_max"
                    }
                }
            }
        }

    def get_one_day_http_code(days_before_now=14):
        return {
            "size": 0,
            "query": {
                "bool": {
                    "filter": [
                        {
                            "range": {
                                "time_iso8601": {
                                    "gte": get_iso_day(days_before_now),
                                    "lt": get_iso_day(days_before_now - 1)
                                }
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "4": {
                    "terms": {
                        "field": "host.keyword",
                        "size": 10,
                        "order": {
                            "_count": "desc"
                        },
                        "min_doc_count": 1
                    },
                    "aggs": {
                        "5": {
                            "terms": {
                                "field": "status",
                                "size": 10,
                                "order": {
                                    "_count": "desc"
                                },
                                "min_doc_count": 1
                            },
                            "aggs": {
                                "2": {
                                    "date_histogram": {
                                        "interval": "1d",
                                        "field": "time_iso8601",
                                        "min_doc_count": 0,
                                        "time_zone": "Asia/Shanghai"
                                    },
                                    "aggs": {}
                                }
                            }
                        }
                    }
                }
            }
        }

Compute each day's maximum QPS, without breaking it down by domain:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Copyright (c) 2017 - hongzhi.wang <hongzhi.wang@moji.com>
    '''
    Author: hongzhi.wang
    Create Date: 2017/12/14
    Modify Date: 2017/12/14
    '''
    import sys
    import os
    file_root = os.path.dirname(os.path.abspath("__file__"))
    sys.path.append(file_root)

    import requests
    import json
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk
    import elasticsearch
    from datetime import timedelta
    import time
    import datetime

    from settings import *

    daily_index_name = "daily-summary-statistics-qps-no-domain"

    if len(sys.argv) == 2:
        compute_days = int(sys.argv[1])
    else:
        compute_days = 1

    es = Elasticsearch(G_URL)

    def get_apps():
        res = requests.get(
            G_URL + "/_cat/indices?v",
            json={}
        ).text
        apps = set()
        for line in res.strip().split("\n"):
            lines = line.split()
            if lines[2].startswith("logstash") and 'soa' not in lines[2]:
                index_name = lines[2]
                app_name = index_name.split("-")[-2] if index_name.split("-")[-2] != "log" else "whv3"
                apps.add(app_name)
        return list(apps)

    apps = get_apps()

    import aiohttp
    import asyncio
    import async_timeout

    bodys = []

    async def fetch(session, app_name, days):
        index_pattern = "logstash-nginx-log-%s*" % app_name if app_name != "whv3" else "logstash-nginx-log-20*"
        if not es.indices.exists(daily_index_name):
            es.indices.create(index=daily_index_name, body=time_map)
        async with session.post(G_URL + "/%s/_search" % index_pattern, json=get_one_day_qps_json_no_domain(days), headers=headers) as response:
            es_result = json.loads(await response.text())
            try:
                item1 = es_result["aggregations"]
                if item1["max_count"]["value"] > 20:
                    max_qps = item1["max_count"]["value"]
                    max_qps_time = item1["max_count"]["keys"][0]
                    bodys.append({
                        "_index": daily_index_name,
                        "_type": app_name,
                        "_id": "%s-%s" % (app_name, max_qps_time),
                        "_source": {
                            "timestamp": max_qps_time,
                            "max_qps": max_qps,
                        }
                    })
            except Exception as e:
                print(G_URL + "/%s/_search" % index_pattern)
                print(get_one_day_qps_json_no_domain(days))
                print(app_name)
                print(e)

    async def main(app_name, days=compute_days):
        async with aiohttp.ClientSession() as session:
            await fetch(session, app_name, days=days)

    loop = asyncio.get_event_loop()
    tasks = [main(app_name) for app_name in apps]
    loop.run_until_complete(asyncio.wait(tasks))

    res = bulk(es, bodys)
    print(datetime.datetime.now())
    print(res)
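Both this script and the HTTP-code one below take the number of days back as an optional command-line argument (defaulting to 1), so a missed day can be backfilled simply by re-running them with that day's offset.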

Calculate the share of each HTTP code under every domain:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Copyright (c) 2017 - hongzhi.wang <hongzhi.wang@moji.com>
    '''
    Author: hongzhi.wang
    Create Date: 2017/12/5
    Modify Date: 2017/12/5
    '''
    import sys
    import os
    file_root = os.path.dirname(os.path.abspath("__file__"))
    sys.path.append(file_root)

    import requests
    import json
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk
    import elasticsearch
    from datetime import timedelta
    import time
    import datetime

    from settings import *

    if len(sys.argv) == 2:
        day_to_process = int(sys.argv[1])
    else:
        day_to_process = 1

    headers = {"content-type": "application/json"}
    daily_index_name = "daily-summary-statistics-http-code"
    daily_percentage_index_name = "daily-summary-statistics-percentage-http-code"

    es = Elasticsearch(G_URL)
    apps = get_apps()

    def get_detail(app_name):
        index_pattern = "logstash-nginx-log-%s*" % app_name if app_name != "whv3" else "logstash-nginx-log-20*"
        print(index_pattern)
        if not es.indices.exists(daily_index_name):
            es.indices.create(index=daily_index_name, body=time_map)
        if not es.indices.exists(daily_percentage_index_name):
            es.indices.create(index=daily_percentage_index_name, body=time_map)
        res = requests.get(G_URL + "/%s/_search" % index_pattern, json=get_one_day_http_code(day_to_process), headers=headers)
        es_result = json.loads(res.text)
        for item in es_result["aggregations"]["4"]["buckets"]:
            domain = item["key"]
            all_sum = item["doc_count"]
            domain_dict = {}
            for detail in item["5"]["buckets"]:
                http_code = detail["key"]
                for final_detail in detail["2"]["buckets"]:
                    domain_dict[http_code] = final_detail["doc_count"]
                    yield {
                        "_index": daily_index_name,
                        "_type": app_name,
                        "_id": "%s-%s-%d-%s" % (app_name, domain, http_code, get_iso_day(day_to_process)),
                        "_source": {
                            "timestamp": get_iso_day(day_to_process),
                            "domain": domain,
                            "http_code": http_code,
                            "count": final_detail["doc_count"],
                        }
                    }
            count200 = domain_dict.get(200, 0)
            yield {
                "_index": daily_percentage_index_name,
                "_type": app_name,
                "_id": "%s-%s-%s" % (app_name, domain, get_iso_day(day_to_process)),
                "_source": {
                    "timestamp": get_iso_day(day_to_process),
                    "domain": domain,
                    "percent200": count200 / all_sum,
                }
            }

    for i in range(len(apps)):
        print(datetime.datetime.now())
        print("current process is %f%%" % (i / len(apps) * 100))
        app_name = apps[i]
        res = bulk(es, get_detail(app_name=app_name))
        print(res)
