Related downloads:

  1. https://prometheus.io/download/
     https://github.com/prometheus/

Related documentation:

  1. https://songjiayang.gitbooks.io/prometheus/content/alertmanager/wechat.html #the most complete reference of these
  2. https://github.com/yunlzheng/prometheus-book
     https://github.com/prometheus/

Installing and configuring prometheus

tar -xf prometheus-2.2.0.linux-amd64.tar.gz
mv prometheus-2.2.0.linux-amd64 /opt/prometheus
mkdir -p /opt/prometheus/{bin,data,log}
mv /opt/prometheus/prometheus /opt/prometheus/bin/
useradd -M -s /sbin/nologin prometheus
chown -R prometheus:prometheus /opt/prometheus

Basic prometheus configuration
vim /opt/prometheus/prometheus.yml
==================================
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager connection (disabled for now)
#alerting:
#  alertmanagers:
#    - static_configs:
#        - targets:
#            - 172.16.0.10:9093

# Alerting rule files
rule_files:
  # - "/opt/prometheus/test.yml"
  # - "second_rules.yml"

# Scrape targets: 9090 is the prometheus server itself, 9100 is the node_exporter client
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['0.0.0.0:9090']
        labels:
          instance: 'prometheus'
==================================
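
The config can be sanity-checked before starting with promtool, which ships in the release tarball (this assumes it was kept next to the prometheus binary in bin/):
==================================
/opt/prometheus/bin/promtool check config /opt/prometheus/prometheus.yml
==================================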

prometheus systemd unit
/etc/systemd/system/prometheus.service
==================================
[Unit]
Description=prometheus service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
User=prometheus
ExecStart=/opt/prometheus/run_prometheus.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target
==================================

prometheus run script
/opt/prometheus/run_prometheus.sh
==================================
#!/bin/bash
set -e
ulimit -n 1000000

DEPLOY_DIR=/opt/prometheus
cd "${DEPLOY_DIR}" || exit 1

# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
exec 1>> /opt/prometheus/log/prometheus.log 2>> /opt/prometheus/log/prometheus.log

exec bin/prometheus \
--config.file="/opt/prometheus/prometheus.yml" \
--web.listen-address=":9090" \
--web.external-url="http://192.168.55.33:9090/" \
--web.enable-admin-api \
--log.level="info" \
--storage.tsdb.path="/opt/prometheus/data" \
--storage.tsdb.retention="15d"
==================================

chmod a+x /opt/prometheus/run_prometheus.sh

# Start prometheus
systemctl daemon-reload   # required whenever unit files are added or changed
systemctl start prometheus
systemctl enable prometheus
netstat -lnp | grep prometheus   # check the listening port

# Reload the configuration (SIGHUP)
kill -1 <prometheus_pid>

# Web UI
x.x.x.x:9090

Once started, Prometheus can be reached through its built-in web UI at http://ip:9090; if the page loads, the configuration is working.

Installing node_exporter

tar -xf node_exporter-0.15.0.linux-amd64.tar.gz
mv node_exporter-0.15.0.linux-amd64 /opt/node_exporter
mkdir -p /opt/node_exporter/{bin,log}
mv /opt/node_exporter/node_exporter /opt/node_exporter/bin/
chown -R prometheus:prometheus /opt/node_exporter
# systemd unit
vim /etc/systemd/system/node_exporter.service
==============================================
[Unit]
Description=node_exporter service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
User=prometheus
ExecStart=/opt/node_exporter/run_node_exporter.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target
===============================================

# run script
/opt/node_exporter/run_node_exporter.sh
================================================
#!/bin/bash
set -e
ulimit -n 1000000

# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/opt/node_exporter
cd "${DEPLOY_DIR}" || exit 1

exec 1>> /opt/node_exporter/log/node_exporter.log 2>> /opt/node_exporter/log/node_exporter.log

exec bin/node_exporter --web.listen-address=":9100" \
--log.level="info"
================================================
chmod a+x /opt/node_exporter/run_node_exporter.sh

# Start node_exporter
systemctl daemon-reload   # required whenever unit files are added or changed
systemctl start node_exporter
systemctl enable node_exporter
netstat -lnp | grep 9100   # check the listening port
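
A quick check that the exporter is serving metrics (run on the exporter host; adjust the address if checking remotely):
==============================================
curl -s http://localhost:9100/metrics | head
==============================================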

Installing alertmanager

tar -xf alertmanager-0.14.0.linux-amd64.tar.gz
mv alertmanager-0.14.0.linux-amd64 /opt/alertmanager
mkdir -p /opt/alertmanager/{bin,log,data}
mv /opt/alertmanager/alertmanager /opt/alertmanager/bin/
chown -R prometheus:prometheus /opt/alertmanager

systemd unit:
cat /etc/systemd/system/alertmanager.service
=============================================
[Unit]
Description=alertmanager service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
User=prometheus
ExecStart=/opt/alertmanager/run_alertmanager.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target
=============================================

run script
cat /opt/alertmanager/run_alertmanager.sh
=============================================
#!/bin/bash
set -e
ulimit -n 1000000

DEPLOY_DIR=/opt/alertmanager
cd "${DEPLOY_DIR}" || exit 1

# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
exec 1>> /opt/alertmanager/log/alertmanager.log 2>> /opt/alertmanager/log/alertmanager.log

exec bin/alertmanager \
--config.file="/opt/alertmanager/alertmanager.yml" \
--storage.path="/opt/alertmanager/data" \
--data.retention=120h \
--log.level="info" \
--web.listen-address=":9093"
=============================================

chmod a+x /opt/alertmanager/run_alertmanager.sh

cat /opt/alertmanager/alertmanager.yml   # alertmanager configuration
=============================================
global:
  smtp_smarthost: 'smtp.sina.com:25'
  smtp_from: 'xgmxgmxm@sina.com'
  smtp_auth_username: 'xgmxgmxm@sina.com'
  smtp_auth_password: 'xxxxxxxx'

templates:
  - '/opt/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 2m
  receiver: default-receiver

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'hanxiaohui@prosysoft.com'
=============================================

# Start alertmanager
systemctl daemon-reload   # required whenever unit files are added or changed
systemctl start alertmanager
systemctl enable alertmanager
ps -ef | grep alertmanager   # check the running process
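
Alertmanager's v1 API offers a quick liveness check (localhost assumed):
=============================================
curl -s http://localhost:9093/api/v1/status
=============================================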

Email alerting

alertmanager configuration:
cat /opt/alertmanager/alert.yml
=============================================
global:
  smtp_smarthost: 'smtp.sina.com:25'
  smtp_from: 'xgmxgmxm@sina.com'
  smtp_auth_username: 'xgmxgmxm@sina.com'
  smtp_auth_password: 'xxxxxxxx'

templates:
  - '/opt/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 2m
  receiver: default-receiver

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'hanxiaohui@prosysoft.com'
=============================================

prometheus configuration:
vim /opt/prometheus/prometheus.yml
===========================================
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager connection
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 172.16.0.10:9093

# Alerting rules
rule_files:
  - "/opt/prometheus/test.yml"
  # - "second_rules.yml"

# 9090 is the prometheus server itself, 9100 is the node_exporter client
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['0.0.0.0:9090']
        labels:
          instance: 'prometheus'
  - job_name: 'node_prometheus'
    static_configs:
      - targets: ['0.0.0.0:9100']
        labels:
          instance: 'node_prometheus'
===========================================

## Rule configuration
cat /opt/prometheus/test.yml
=============================================
groups:
- name: test-rule
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: High Memory usage detected"
      description: "{{ $labels.instance }}: Memory usage is above 80% (current value is: {{ $value }})"
=============================================
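
To exercise the mail pipeline without waiting for the rule to fire, an alert can be posted by hand to Alertmanager's v1 API (the alert name and labels below are made up for the test):
=============================================
curl -X POST http://localhost:9093/api/v1/alerts -d '[
  {
    "labels": {"alertname": "TestAlert", "severity": "warning"},
    "annotations": {"summary": "manual test alert"}
  }
]'
=============================================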

WeChat alerting

cat /opt/alertmanager/alert.yml   # contents below
=============================================
route:
  group_by: ['alertname']
  receiver: 'wechat'

receivers:
  - name: 'wechat'
    wechat_configs:
      - corp_id: 'xxxxxx'      # the WeChat Work account's unique ID, shown under "My Company"
        to_party: 'x'          # the department (group) to send to
        agent_id: 'xxxxxx'     # the self-built app's ID, shown on the app's detail page
        api_secret: 'xxxxxxx'  # the self-built app's secret, shown on the app's detail page
=============================================
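
The message body can be customized through a template file placed under the templates path configured earlier; a minimal sketch that overrides the default WeChat template (the exact fields printed here are illustrative):
=============================================
{{ define "wechat.default.message" }}
{{ range .Alerts }}
Alert: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
{{ end }}
{{ end }}
=============================================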

Alert grouping

[root@prometheus rules]# cat /opt/prometheus/prometheus.yml
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 172.16.0.22:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/opt/prometheus/rules/node.yml"   # rule file (globs such as rules/*.yml also work)
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets:
          - '172.16.0.20:9100'
          - '172.16.0.21:9100'
          - '172.16.0.22:9100'

=====================================================

[root@prometheus rules]# cat /opt/prometheus/rules/node.yml
groups:
- name: node
  rules:
  - alert: server_status
    expr: up{job="node"} == 0   # job="node" selects the node group
    for: 15s
    labels:
      severity: warning
      team: node                # grouping label used for routing
    annotations:
      summary: "{{ $labels.instance }} is down"
- name: memUsage
  rules:
  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal - (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100 > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: High Memory usage detected"
      description: "{{ $labels.instance }}: Memory usage is above 80% (current value is: {{ $value }})"

# Once the config files are in place, prometheus must re-read them. There are two ways:
# 1. POST to the /-/reload HTTP API, e.g.: curl -X POST http://localhost:9090/-/reload
# 2. Send SIGHUP to the prometheus process, e.g.: kill -1 <prometheus_pid>

=========================================
[root@alert alertmanager]# cat /opt/alertmanager/alert.yml
# The route block sets the alert distribution policy. It is a tree, matched depth-first from left to right.
global:
  smtp_smarthost: 'smtp.sina.com:25'
  smtp_from: 'xxxxxxx@sina.com'
  smtp_auth_username: 'xxxxxxx@sina.com'
  smtp_auth_password: 'xxxxxxxxx'
route:
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 1m
  group_by: [alertname]
  receiver: 'wechat'
  routes:
    - receiver: mail
      match:
        team: node   # alerts carrying the same team label in rules.yml are sent from here

receivers:
  - name: 'wechat'
    wechat_configs:
      - corp_id: 'xxxxx'
        to_party: 'xxx'   # '5|1' sends to multiple departments
        agent_id: 'xxxxx'
        api_secret: 'xxxxxx'
  - name: 'mail'
    email_configs:
      - to: 'xxxxxx@prosysoft.com'
      - to: 'xxxxxx@163.com'

Installing Grafana and dashboards

## Dependencies
yum install fontconfig freetype* urw-fonts -y
## grafana needs a Go environment
yum install go -y
## Install grafana
yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.6.1-1.x86_64.rpm -y

Package layout:
binary: /usr/sbin/grafana-server
init.d script: /etc/init.d/grafana-server
environment file: /etc/sysconfig/grafana-server
config file: /etc/grafana/grafana.ini
systemd unit: grafana-server.service
log file: /var/log/grafana/grafana.log
default sqlite3 database: /var/lib/grafana/grafana.db

Edit /etc/grafana/grafana.ini and change two values in the [dashboards.json] section:
[dashboards.json]
enabled = true
path = /var/lib/grafana/dashboards

Install the dashboard JSON templates:
git clone https://github.com/percona/grafana-dashboards.git
cp -r grafana-dashboards/dashboards /var/lib/grafana/

Start grafana and enable it on boot:
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server
systemctl enable grafana-server.service

Web access:

x.x.x.x:3000   # default credentials: admin/admin

Add a data source
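
Adding the Prometheus data source can also be scripted against Grafana's HTTP API instead of the UI (the default admin/admin credentials and the server addresses from this article are assumed):
==============================================
curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"prometheus","type":"prometheus","url":"http://192.168.55.33:9090","access":"proxy","isDefault":true}'
==============================================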

View the data

Installing and configuring blackbox_exporter

tar -xf blackbox_exporter-0.12.0.linux-amd64.tar.gz
mv blackbox_exporter-0.12.0.linux-amd64 /opt/blackbox_exporter
mkdir -p /opt/blackbox_exporter/{bin,log}
mv /opt/blackbox_exporter/blackbox_exporter /opt/blackbox_exporter/bin/

# systemd unit
vim /etc/systemd/system/blackbox_exporter.service
=================================================
[Unit]
Description=blackbox_exporter service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
User=root
ExecStart=/opt/blackbox_exporter/run_blackbox_exporter.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target
=================================================

run script
vim /opt/blackbox_exporter/run_blackbox_exporter.sh
=================================================
#!/bin/bash
set -e
ulimit -n 1000000

# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/opt/blackbox_exporter
cd "${DEPLOY_DIR}" || exit 1

exec 1>> /opt/blackbox_exporter/log/blackbox_exporter.log 2>> /opt/blackbox_exporter/log/blackbox_exporter.log

exec bin/blackbox_exporter --web.listen-address=":9115" \
  --log.level="info" \
  --config.file="/opt/blackbox_exporter/blackbox.yml"
=================================================

chown -R prometheus:prometheus /opt/blackbox_exporter
chmod a+x /opt/blackbox_exporter/run_blackbox_exporter.sh

blackbox.yml configuration
vim /opt/blackbox_exporter/blackbox.yml
=================================================
modules:
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
=================================================

Start it (note: run it as root, otherwise icmp probes fail with "ping socket permission denied"):
systemctl daemon-reload
systemctl start blackbox_exporter
systemctl enable blackbox_exporter
ps -ef | grep blackbox_exporter

Integrating with prometheus
/opt/prometheus/prometheus.yml additions
======================================================
- job_name: "blackbox_exporter_http"
  scrape_interval: 30s
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - 'http://192.168.55.33:9090/metrics'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.55.33:9115

- job_name: "app_port_tcp"
  scrape_interval: 30s
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
    - targets:
        - '192.168.55.33:9100'
        - '192.168.55.34:9100'
      labels:
        group: 'node'
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.55.33:9115

- job_name: "blackbox_exporter_192.168.55.33_icmp"
  scrape_interval: 6s
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets:
        - '192.168.55.33'
        - '192.168.55.34'
  relabel_configs:
    - source_labels: [__address__]
      regex: (.*)(:)?
      target_label: __param_target
      replacement: ${1}
    - source_labels: [__param_target]
      regex: (.*)
      target_label: ping
      replacement: ${1}
    - source_labels: []
      regex: .*
      target_label: __address__
      replacement: 192.168.55.33:9115
- job_name: "blackbox_exporter_192.168.55.34_icmp"
  scrape_interval: 6s
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets:
        - '192.168.55.33'
        - '192.168.55.34'
  relabel_configs:
    - source_labels: [__address__]
      regex: (.*)(:)?
      target_label: __param_target
      replacement: ${1}
    - source_labels: [__param_target]
      regex: (.*)
      target_label: ping
      replacement: ${1}
    - source_labels: []
      regex: .*
      target_label: __address__
      replacement: 192.168.55.34:9115
======================================================

rules alerting configuration
vim /opt/prometheus/rules/blackbox_rules.yml
======================================================
groups:
- name: alert.rules
  rules:
  - alert: node_exporter_is_down
    expr: probe_success{group="node"} == 0
    for: 1m
    labels:
      env: test-cluster
      level: emergency
    annotations:
      description: 'alert: instance: {{ $labels.instance }} values: {{ $value }}'
      value: '{{ $value }}'
      summary: node_exporter is down

  - alert: prometheus_metrics_interface
    expr: probe_success{job="blackbox_exporter_http"} == 0
    for: 1m
    labels:
      env: puyi-cluster
      level: emergency
    annotations:
      description: 'alert: instance: {{ $labels.instance }} values: {{ $value }}'
      value: '{{ $value }}'
      summary: prometheus metrics interface is down

  - alert: BLACKER_ping_latency_more_than_1s
    expr: max_over_time(probe_duration_seconds{job=~"blackbox_exporter.*_icmp"}[1m]) > 1
    for: 1m
    labels:
      env: puyi-cluster
      level: warning
    annotations:
      description: 'alert: instance: {{ $labels.instance }} values: {{ $value }}'
      value: '{{ $value }}'
      summary: blackbox_exporter ping latency more than 1s
======================================================

blackbox_exporter PromQL statements
=====================================================
probe_success{job="blackbox_exporter_http"}   # http probe status; 1 means healthy
max_over_time(probe_duration_seconds{job=~"blackbox_exporter.*_icmp"}[1m])   # worst icmp probe duration over the last minute
probe_success{group="node"}   # node tcp_connect probe status; 1 means healthy
=====================================================
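
A module can also be probed by hand for debugging, since /probe is a plain HTTP endpoint (the target below is one of the hosts configured above):
=====================================================
curl 'http://192.168.55.33:9115/probe?module=icmp&target=192.168.55.34'
=====================================================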

Installing and configuring pushgateway

Installation and configuration
tar -xf pushgateway-0.5.1.linux-amd64.tar.gz
mv pushgateway-0.5.1.linux-amd64 /opt/pushgateway
mkdir -p /opt/pushgateway/{bin,log}
mv /opt/pushgateway/pushgateway /opt/pushgateway/bin/

systemd unit
vim /etc/systemd/system/pushgateway.service
===========================================
[Unit]
Description=pushgateway service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
User=prometheus
ExecStart=/opt/pushgateway/run_pushgateway.sh
Restart=always
RestartSec=15s

[Install]
WantedBy=multi-user.target
===========================================

run script
vim /opt/pushgateway/run_pushgateway.sh
===========================================
#!/bin/bash
set -e
ulimit -n 1000000

DEPLOY_DIR=/opt/pushgateway
cd "${DEPLOY_DIR}" || exit 1

# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
exec 1>> /opt/pushgateway/log/pushgateway.log 2>> /opt/pushgateway/log/pushgateway.log

exec bin/pushgateway \
  --log.level="info" \
  --web.listen-address=":9091"
===========================================

chown -R prometheus:prometheus /opt/pushgateway

chmod a+x /opt/pushgateway/run_pushgateway.sh

Start it:
systemctl daemon-reload
systemctl start pushgateway
systemctl enable pushgateway
systemctl status pushgateway

Writing data to pushgateway:
echo "some_metric 3.14" | curl --data-binary @- http://192.168.55.33:9091/metrics/job/test
Then visit 192.168.55.33:9091/metrics
and you should see: some_metric{instance="",job="test"} 3.14

Note: collecting TiDB metrics with the 0.5.x pushgateway releases fails with "has label dimensions inconsistent with previously collected metrics in the same metric family"; use an earlier release (0.4.0 / 0.3.1) instead.

Integrating pushgateway with prometheus
vim /etc/prometheus/prometheus.yml   # add a pushgateway job
====================================
- job_name: 'pushgateway'
  scrape_interval: 3s
  honor_labels: true   # keeps the pushed metrics' own job and instance labels from being overwritten
  static_configs:
    - targets:
        - '192.168.55.33:9091'
====================================

Integrating with tidb
Add to tidb.toml:
=================
[status]
metrics-addr = "192.168.55.33:9091"   # the Push Gateway address
metrics-interval = 15                 # push frequency in seconds; 15 is the default
report-status = true
=================

Integrating with pd
Add to pd.toml:
==================
[metric]
address = "192.168.55.33:9091"   # the Push Gateway address
interval = "15s"
==================

Integrating with tikv
Add to tikv.toml:
===================
[metric]
interval = "15s"                 # push frequency; 15s is the default
address = "192.168.55.33:9091"   # the Push Gateway address
job = "tikv"
===================
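
Besides raw curl, metrics can be pushed from code; a minimal sketch with the same python client used later in this article (the job name is illustrative, the gateway address matches the one above):
===========================================
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g = Gauge('job_last_success_unixtime', 'Last time the batch job finished', registry=registry)
g.set_to_current_time()
push_to_gateway('192.168.55.33:9091', job='batch_test', registry=registry)
===========================================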

PromQL

Node Exporter common queries
=========================================

# Example: based on 2 hours of samples, predict whether the filesystem will fill up within 4 hours:
predict_linear(node_filesystem_free{fstype="btrfs",instance="192.168.55.33:9100"}[2h], 4 * 3600) < 0

CPU usage:
100 - (avg by (instance) (irate(node_cpu{instance="xxx", mode="idle"}[5m])) * 100)

CPU share per mode:
avg by (instance, mode) (irate(node_cpu{instance="xxx"}[5m])) * 100

Load average:
node_load1{instance="xxx"}   // 1-minute load
node_load5{instance="xxx"}   // 5-minute load
node_load15{instance="xxx"}  // 15-minute load

Memory usage:
100 - ((node_memory_MemFree{instance="xxx"} + node_memory_Cached{instance="xxx"} + node_memory_Buffers{instance="xxx"}) / node_memory_MemTotal) * 100

Disk usage:
100 - node_filesystem_free{instance="xxx",fstype!~"rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} / node_filesystem_size{instance="xxx",fstype!~"rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} * 100

Network IO:
// inbound bandwidth (Kbit/s)
sum by (instance) (irate(node_network_receive_bytes{instance="xxx",device!~"bond.*?|lo"}[5m])/128)
// outbound bandwidth (Kbit/s)
sum by (instance) (irate(node_network_transmit_bytes{instance="xxx",device!~"bond.*?|lo"}[5m])/128)

Packets in/out:
// packets in
sum by (instance) (rate(node_network_receive_packets{instance="xxx",device!="lo"}[5m]))
// packets out
sum by (instance) (rate(node_network_transmit_packets{instance="xxx",device!="lo"}[5m]))

# disk iowait
avg by (instance) (irate(node_cpu{instance="192.168.55.201:9100",mode="iowait"}[5m])) * 100

=========================================

Built-in functions:
=========================================
rate: (last value - first value) / window in seconds   # per-second rate of increase
e.g. rate(http_requests_total[5m]) == (last - first) / 300
(if the counter went from 100 to 400 over those 5 minutes, the result is 300/300 = 1 request/s)

irate: (last value - the value before last) / the timestamp difference between them
irate(v range-vector) takes a range vector and returns, per series, (last value - previous value) / their timestamp delta. It is based on the last two data points only and automatically adjusts for counter resets (monotonicity).

increase: last value - first value (the total increase)
increase(v range-vector) takes a range vector and returns last value - first value, adjusting for counter resets (e.g. a restarted instance resetting its counters). It differs from delta(), which is a plain difference and can be positive or negative.
==========================================

Matchers:
========================================
=
e.g. http_requests_total{instance="localhost:9090"}
!=
e.g. http_requests_total{instance!="localhost:9090"}
=~
e.g. http_requests_total{environment=~"staging|testing|development",method!="GET"}
!~
e.g. http_requests_total{method!~"get|post"}
========================================

Range queries:
================================
http_requests_total{}[5m]   # returns all samples from the last 5 minutes for the matching series
================================

Offsets:
================================
What if we want the instantaneous sample from 5 minutes ago, or the samples covering all of yesterday? Use the offset modifier:
http_requests_total{} offset 5m       # the instantaneous value 5 minutes ago
http_requests_total{}[2m] offset 2m   # the 2-minute range ending 2 minutes ago
================================

Time units:
================================
Besides m for minutes, PromQL range selectors support these units:
s - seconds
m - minutes
h - hours
d - days
w - weeks
y - years
=================================
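
All of these expressions can also be evaluated outside the web UI through Prometheus's HTTP query API (localhost assumed):
=========================================
curl -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=node_load1'
=========================================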

Custom exporters

---------------------------- python_client --------------------------------
# Counter example (increase-only)
==============================================================
from prometheus_client import start_http_server, Counter
import time

c = Counter('my_failures_total', 'Description of counter', ['job', 'status'])

def my_failures_total(t):
    c.labels(job='test', status='ok').inc(t)   # increment the counter (it can only go up)
    time.sleep(1)

if __name__ == '__main__':
    start_http_server(8000)   # serve metrics on :8000, matching the URL below
    for num in range(10):     # illustrative loop bound
        my_failures_total(num)

# http://xxxxxxx:8000/metrics
================================================================

# Gauge example (can go up, go down, or be set directly)
==================================================================
# g = Gauge('my_g_value', 'g value')  -> 'my_g_value' is the metric name, 'g value' its help text
from prometheus_client import start_http_server, Gauge
import time

g = Gauge('my_g_value', 'g value', ['job', 'status'])

def my_g_value(t):
    g.labels(job='test', status='ok').set(t)   # set the gauge value
    time.sleep(1)

if __name__ == '__main__':
    start_http_server(8000)   # serve metrics on :8000, matching the URL below
    for num in range(10):     # illustrative loop bound
        my_g_value(num)

# http://xxxxxxx:8000/metrics
===================================================================
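
To scrape either of these exporters, a job like the following can be appended to scrape_configs in prometheus.yml (the host placeholder matches the URLs above; port 8000 comes from start_http_server):
==================================================================
- job_name: 'custom_exporter'
  scrape_interval: 15s
  static_configs:
    - targets: ['xxxxxxx:8000']
==================================================================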
