Monitoring with Prometheus

I. prometheus-webhook-dingtalk

GitHub: [Releases · timonwong/prometheus-webhook-dingtalk · GitHub](https://github.com/timonwong/prometheus-webhook-dingtalk/releases)
Download: [prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz](https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz)

Download the version you need from GitHub, then unpack it:

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz -C /data; cd /data

mv prometheus-webhook-dingtalk-0.3.0.linux-amd64 prometheus-webhook-dingtalk

Edit the configuration file (the message template):
# cat default.tmpl

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}

{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

{{ define "__text_alert_list" }}{{ range . }}

**Labels**

{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}

{{ end }}

**Annotations**

{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}

{{ end }}

**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})

{{ end }}{{ end }}

{{ define "ding.link.title" }}{{ template "__subject" . }}{{ end }}

{{ define "ding.link.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**

{{ template "__text_alert_list" .Alerts.Firing }}

{{ end }}

Start the service:
# cat prometheus-webhook-dingtalk.sh

#!/bin/bash

nohup prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="test=https://oapi.dingtalk.com/robot/send?access_token=89f3cedfb3c3cdb031bdf10f8fc52bf1add575e9b5fb6f462a8cca6859af4" >>/data/prometheus-webhook-dingtalk/nohub.out 2>&1 &

The value of --ding.profile comes from a DingTalk robot: create your own robot in DingTalk and use the webhook URL it generates.
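A minimal sketch of how the flag value could be assembled without hard-coding the token in the start script. It assumes the token is exported in an environment variable named DINGTALK_TOKEN, which is a hypothetical name chosen for this example and not something the tool reads by itself. The profile name `test` and the robot URL are the ones used in the script above.

```shell
#!/bin/bash
# Build the value for --ding.profile as "<profile>=<robot webhook URL>".
# DINGTALK_TOKEN (used in the example below) is a hypothetical variable name.
ding_profile() {
  local name=$1 token=$2
  if [ -z "$name" ] || [ -z "$token" ]; then
    echo "usage: ding_profile <profile-name> <access-token>" >&2
    return 1
  fi
  printf '%s=https://oapi.dingtalk.com/robot/send?access_token=%s\n' "$name" "$token"
}

# Example (not run here):
#   nohup prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" \
#     --ding.profile="$(ding_profile test "$DINGTALK_TOKEN")" \
#     >>/data/prometheus-webhook-dingtalk/nohub.out 2>&1 &
```

The profile name matters later: it becomes part of the webhook path, /dingtalk/<profile>/send.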

II. Alertmanager
GitHub: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)

Download: [Releases · prometheus/alertmanager · GitHub](https://github.com/prometheus/alertmanager/releases)

Download the version you need from GitHub, then unpack it:

wget https://github.com/prometheus/alertmanager/releases/download/v0.15.1/alertmanager-0.15.1.linux-amd64.tar.gz

tar xf alertmanager-0.15.1.linux-amd64.tar.gz -C /data; cd /data

mv alertmanager-0.15.1.linux-amd64 alertmanager

Edit the configuration file. I use DingTalk for alerting, so this post configures a DingTalk receiver:
# cat alertmanager.yml

global:

  resolve_timeout: 5m

route:

  group_by: ['alertname']

  group_wait: 10s

  group_interval: 10s

  repeat_interval: 1h

  receiver: 'test'

receivers:

- name: 'test'

  webhook_configs:

   - url: "http://127.0.0.1:8060/dingtalk/test/send"

     send_resolved: true

The url here is the address of prometheus-webhook-dingtalk, which converts alerts into a message format that DingTalk accepts.

Start Alertmanager:
# cat alertmanager.sh

#!/bin/bash

nohup alertmanager --config.file="/data/alertmanager/alertmanager.yml" --storage.path="/data/alertmanager/data" --web.listen-address="0.0.0.0:9093" >>/data/alertmanager/nohub.out 2>&1 &

Alertmanager web UI:
http://ip:9093
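Once Alertmanager is up, the whole chain (Alertmanager, the webhook, DingTalk) can be exercised by posting a hand-made alert. The sketch below assumes the v0.15.1 release used in this post, which accepts alerts on POST /api/v1/alerts; later releases moved to /api/v2/alerts, so the path may need adjusting there. The alert name and summary are made-up test values.

```shell
#!/bin/bash
# Print a one-element JSON array in the shape accepted by POST /api/v1/alerts.
# Values are inserted verbatim, so they must not contain double quotes.
test_alert_json() {
  local name="${1:-PipelineSmokeTest}" summary="${2:-manual test alert}"
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"%s"}}]\n' \
    "$name" "$summary"
}

# Send the test alert to Alertmanager (defaults to the local instance on port 9093).
fire_test_alert() {
  local base="${1:-http://127.0.0.1:9093}"
  test_alert_json | curl -fsS -X POST -H 'Content-Type: application/json' \
    --data-binary @- "$base/api/v1/alerts"
}

# Example (not run here):
#   fire_test_alert http://127.0.0.1:9093
```

With group_wait set to 10s as in the configuration above, the DingTalk message should show up roughly ten seconds after the alert is posted.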

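III. Prometheus

GitHub: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)

1. Components
1) prometheus: the main server. It scrapes and stores metrics, and exposes PromQL for querying and aggregating the monitoring data.
2) *_exporter: exposes a metrics endpoint to the Prometheus server. Prometheus polls these exporters, then collects and stores the data.
3) alertmanager: handles alerting, with notifications sent by email or DingTalk.
4) pushgateway: intended for short-lived processes such as batch jobs. These clients push their metrics to the Pushgateway, which then exposes them through a pull interface for the Prometheus server to scrape.

5) Prometheus collects data mainly by pull, which keeps the load and resource usage on the monitored hosts low.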
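To make the pull model concrete: an exporter only has to serve plain text on an HTTP endpoint, and Prometheus fetches it on every scrape. The sketch below prints one gauge in the text exposition format; the metric name, help text, and value are invented for illustration.

```shell
#!/bin/bash
# Print one gauge in the Prometheus text exposition format:
# a HELP line, a TYPE line, then "<metric name> <value>".
render_gauge() {
  local name=$1 help=$2 value=$3
  printf '# HELP %s %s\n# TYPE %s gauge\n%s %s\n' "$name" "$help" "$name" "$name" "$value"
}

render_gauge demo_queue_length "Jobs waiting in the demo queue." 7
```

A real exporter returns many such blocks from its /metrics path, which is the default metrics_path Prometheus scrapes.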

2. Installation
Download: [Releases · prometheus/prometheus · GitHub](https://github.com/prometheus/prometheus/releases)
Download the version you need from GitHub, then unpack it:

wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz

tar xf prometheus-2.3.2.linux-amd64.tar.gz -C /data; cd /data

mv prometheus-2.3.2.linux-amd64 prometheus

Then edit the configuration file and define a job for each monitoring target:
# cat prometheus.yml

# my global config

global:

  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.

  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

  # scrape_timeout is set to the global default (10s).

#remote_write:

#  - url: "http://10.2.79.208:9201/write"

#remote_read:

#  - url: "http://10.2.79.208:9201/read"

# Alertmanager configuration

alerting:

  alertmanagers:

  - static_configs:

    - targets:

      - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.

rule_files:

  # - "first_rules.yml"

  # - "second_rules.yml"

  - "/data/prometheus/mongodb-rules.yml"

  - "/data/prometheus/consul-rules.yml"

  - "/data/prometheus/redis-rules.yml"

  - "/data/prometheus/nginx-rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:

# Here it's Prometheus itself.

scrape_configs:

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'

    # scheme defaults to 'http'.

    static_configs:

    - targets: ['localhost:9090']

  - job_name: 'mongodb1'

    static_configs:

    - targets: ['10.10.8.70:9218']

  - job_name: 'mongodb1-system'

    static_configs:

    - targets: ['10.10.8.70:9100']

  - job_name: 'mongodb2'

    static_configs:

    - targets: ['10.10.5.108:9218']

rule_files: the paths of the alerting rule files. You can define your own alerting rules in them.

# cat consul-rules.yml

---

groups:

- name: consul

  rules:

  - alert: consul_catalog_service_node_healthy

    expr: consul_catalog_service_node_healthy <

    for: 60s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.node }}  {{ $labels.service_id }} is unhealthy'

      summary: 'some service is unhealthy, you must check it out in consul'

  - alert: consul_node_health

    expr: consul_exporter_build_info <

    for: 60s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.instance }} consul server is down'

      summary: 'consul server is down'

  - alert: consul_health_service_status

    expr: consul_health_service_status <

    for: 60s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.node }}  {{ $labels.service_id }} is unhealthy'

      summary: 'some service is unhealthy, you must check it out in consul'

# cat mongodb-rules.yml

---

groups:

- name: mongodb

  rules:

  - alert: mongodb_mongod_connections

    expr: mongodb_mongod_connections{state='current'} and  mongodb_mongod_connections <

    for: 10s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.instance }}   of      {{ $labels.job }}   connections is low  11'

      summary: 'connections are too low, MongoDB may be down!'

  - alert: mongodb_mongod_connections

    expr: mongodb_mongod_connections{state='current'} and  mongodb_mongod_connections >

    for: 10s

    labels:

      severity: warning

    annotations:

      description: '{{ $labels.instance }}   of      {{ $labels.job }}   connections is high  570'

      summary: 'too many connections'

  - alert: mongodb_mongod_memory

    expr:  mongodb_mongod_memory{type='virtual'} and mongodb_mongod_memory <

    for: 5s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.instance }} of  {{ $labels.job }} {{ $labels.type }}   is too low'

      summary: 'mongodb may be down'

  - alert: mongodb_mongod_replset_member_health

    expr: mongodb_mongod_replset_member_health !=

    for: 5s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.name }} {{ $labels.state }} is down'

      summary: 'one of the replica set nodes is down'

  - alert: mongodb_mongod_replset_my_state

    expr: mongodb_mongod_replset_my_state{job='mongodb3'} and mongodb_mongod_replset_my_state !=

    for: 5s

    labels:

      severity: critical

    annotations:

      description: 'replica set master has changed, {{ $labels.job }} is not master'

      summary: 'mongodb3 master is down, check the status'

# cat redis-rules.yml

---

groups:

- name: redis

  rules:

  - alert: redis_instantaneous_ops_per_sec

    expr: redis_instantaneous_ops_per_sec <

    for: 120s

    labels:

      severity: critical

    annotations:

      description: '{{ $labels.job }} is unhealthy'

      summary: 'redis-prod ops/sec is too low, redis may be congested; check it with "redis-cli slowlog get"'

# cat nginx-rules.yml

---

groups:

- name: nginx-exporter

  rules:

  - alert: status_code_499

    expr: status_code_499 >

    for: 60s

    labels:

      severity: critical

    annotations:

      description: 'status_code_499: {{ $value }}'

      summary: 'too many nginx 499 status codes, check the load balancer log /var/log/nginx/share.log'

  - alert: status_code_400

    expr: status_code_400 >

    for: 60s

    labels:

      severity: critical

    annotations:

      description: 'status_code_400: {{ $value }}'

      summary: 'too many nginx 400 status codes, check the load balancer log /var/log/nginx/share.log'

The nginx metrics come from an exporter I wrote myself: https://github.com/cuishuaigit/nginx_exporter
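Before starting Prometheus, the rule files above can be checked with promtool, which ships in the same tarball as the prometheus binary. The wrapper below is a sketch that assumes promtool is on the PATH (for example /data/prometheus added to PATH); the function name check_rules is made up for this example.

```shell
#!/bin/bash
# Validate each rule file with promtool.
# Returns non-zero if any file is unreadable or fails validation.
check_rules() {
  local f status=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "rule file not readable: $f" >&2
      status=1
      continue
    fi
    promtool check rules "$f" || status=1
  done
  return "$status"
}

# Example (not run here):
#   check_rules /data/prometheus/{mongodb,consul,redis,nginx}-rules.yml
```

The main configuration can be checked the same way with `promtool check config /data/prometheus/prometheus.yml`, which also validates the rule files it references.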

Start Prometheus:
# cat prometheus.sh

#!/bin/bash

nohup prometheus --config.file="/data/prometheus/prometheus.yml" --web.listen-address="0.0.0.0:9090"  --storage.tsdb.path="/data/prometheus/data"  --web.console.libraries="/data/prometheus/console_libraries"  --web.console.templates="/data/prometheus/consoles"  --web.enable-admin-api --log.level=info >>/data/prometheus/nohub.out 2>&1 &

Prometheus web UI:
http://ip:9090
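Because the start script sends Prometheus to the background, it helps to confirm that the server actually came up. Prometheus 2.x serves a health endpoint at /-/healthy; the sketch below polls it with curl. The function name wait_healthy and the retry count are arbitrary choices for this example.

```shell
#!/bin/bash
# Poll <base>/-/healthy until it answers successfully or the attempts run out.
# Returns 0 once the endpoint responds, 1 otherwise.
wait_healthy() {
  local base=$1 tries="${2:-10}" i
  for ((i = 0; i < tries; i++)); do
    if curl -fsS --max-time 2 "$base/-/healthy" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (not run here):
#   wait_healthy http://127.0.0.1:9090 30 && echo "prometheus is up"
```

After that, the Status > Targets page in the web UI shows whether each job defined in prometheus.yml is being scraped.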

IV. Exporters

1. https://github.com/prometheus/node_exporter

2. https://github.com/prometheus/influxdb_exporter

3. https://github.com/prometheus/mysqld_exporter

4. https://github.com/prometheus/jmx_exporter

5. https://github.com/prometheus/consul_exporter

6. https://github.com/prometheus/haproxy_exporter
