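Alerting rules for the Prometheus monitoring system are configured in the Prometheus server itself. Prometheus supports two types of rules: recording rules and alerting rules. Recording rules precompute expressions under new series names, which keeps alerting rules short and makes expressions reusable. Alerting rules are the ones that actually decide whether an alert should fire, and they can reference recording rules.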
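
As a minimal sketch of how the two rule types fit together, the fragment below defines one recording rule and one alerting rule that reuses it. The group name, rule names, `for` duration, severity, and the 90% threshold are illustrative assumptions, not part of the rule files in this post.

```yaml
groups:
  - name: example-rules            # hypothetical group name
    rules:
      # Recording rule: stores the result of expr as a new series.
      - record: example:memory:used:percent
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
      # Alerting rule: reuses the recorded series instead of repeating the expression.
      - alert: ExampleMemoryHigh
        expr: example:memory:used:percent > 90   # assumed threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "instance {{ $labels.instance }} memory usage is {{ $value }}%"
```

Rules inside one group are evaluated in order, so the alerting rule sees the value that the recording rule just wrote in the same evaluation cycle.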

Below are the recording rules and alerting rules I put together for node-exporter.

node-exporter-record-rules.yml

groups:
  - name: node-exporter-record
    rules:
      - expr: up{job=~"node-exporter"}
        record: node_exporter:up
        labels:
          desc: "Whether the node is online: 1 = online, 0 = offline"
          unit: " "
          job: "node-exporter"
      - expr: time() - node_boot_time_seconds{}
        record: node_exporter:node_uptime
        labels:
          desc: "Node uptime"
          unit: "s"
          job: "node-exporter"
      ##############################################################################################
      # cpu #
      - expr: (1 - avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m]))) * 100
        record: node_exporter:cpu:total:percent
        labels:
          desc: "Total CPU usage percentage of the node"
          unit: "%"
          job: "node-exporter"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="idle"}[5m]))) * 100
        record: node_exporter:cpu:idle:percent
        labels:
          desc: "CPU idle percentage of the node"
          unit: "%"
          job: "node-exporter"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="iowait"}[5m]))) * 100
        record: node_exporter:cpu:iowait:percent
        labels:
          desc: "CPU iowait percentage of the node"
          unit: "%"
          job: "node-exporter"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="system"}[5m]))) * 100
        record: node_exporter:cpu:system:percent
        labels:
          desc: "CPU system percentage of the node"
          unit: "%"
          job: "node-exporter"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job="node-exporter",mode="user"}[5m]))) * 100
        record: node_exporter:cpu:user:percent
        labels:
          desc: "CPU user percentage of the node"
          unit: "%"
          job: "node-exporter"
      # Average each mode over the CPUs first, then sum the modes; a single avg
      # over all four modes would report only a quarter of their combined share.
      - expr: (sum by (environment,instance) (avg by (environment,instance,mode) (irate(node_cpu_seconds_total{job="node-exporter",mode=~"softirq|nice|irq|steal"}[5m])))) * 100
        record: node_exporter:cpu:other:percent
        labels:
          desc: "CPU percentage of the node spent in other modes (softirq, nice, irq, steal)"
          unit: "%"
          job: "node-exporter"
      ##############################################################################################
      # memory #
      - expr: node_memory_MemTotal_bytes{job="node-exporter"}
        record: node_exporter:memory:total
        labels:
          desc: "Total memory of the node"
          unit: byte
          job: "node-exporter"
      - expr: node_memory_MemFree_bytes{job="node-exporter"}
        record: node_exporter:memory:free
        labels:
          desc: "Free memory of the node"
          unit: byte
          job: "node-exporter"
      - expr: node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemFree_bytes{job="node-exporter"}
        record: node_exporter:memory:used
        labels:
          desc: "Used memory of the node"
          unit: byte
          job: "node-exporter"
      - expr: node_memory_MemTotal_bytes{job="node-exporter"} - node_memory_MemAvailable_bytes{job="node-exporter"}
        record: node_exporter:memory:actualused
        labels:
          desc: "Memory actually in use by user processes on the node"
          unit: byte
          job: "node-exporter"
      - expr: (1-(node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100
        record: node_exporter:memory:used:percent
        labels:
          desc: "Memory usage percentage of the node"
          unit: "%"
          job: "node-exporter"
      - expr: ((node_memory_MemAvailable_bytes{job="node-exporter"} / (node_memory_MemTotal_bytes{job="node-exporter"})))* 100
        record: node_exporter:memory:free:percent
        labels:
          desc: "Available memory percentage of the node"
          unit: "%"
          job: "node-exporter"
      ##############################################################################################
      # load #
      - expr: sum by (instance) (node_load1{job="node-exporter"})
        record: node_exporter:load:load1
        labels:
          desc: "1-minute system load"
          unit: " "
          job: "node-exporter"
      - expr: sum by (instance) (node_load5{job="node-exporter"})
        record: node_exporter:load:load5
        labels:
          desc: "5-minute system load"
          unit: " "
          job: "node-exporter"
      - expr: sum by (instance) (node_load15{job="node-exporter"})
        record: node_exporter:load:load15
        labels:
          desc: "15-minute system load"
          unit: " "
          job: "node-exporter"
      ##############################################################################################
      # disk #
      - expr: node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"}
        record: node_exporter:disk:usage:total
        labels:
          desc: "Total disk space of the node"
          unit: byte
          job: "node-exporter"
      - expr: node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"}
        record: node_exporter:disk:usage:free
        labels:
          desc: "Available disk space of the node"
          unit: byte
          job: "node-exporter"
      - expr: node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"} - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"}
        record: node_exporter:disk:usage:used
        labels:
          desc: "Used disk space of the node"
          unit: byte
          job: "node-exporter"
      - expr: (1 - node_filesystem_avail_bytes{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job="node-exporter",fstype=~"ext4|xfs"}) * 100
        record: node_exporter:disk:used:percent
        labels:
          desc: "Disk usage percentage of the node"
          unit: "%"
          job: "node-exporter"
      - expr: irate(node_disk_reads_completed_total{job="node-exporter"}[1m])
        record: node_exporter:disk:read:count:rate
        labels:
          desc: "Disk read operations per second"
          unit: "ops/s"
          job: "node-exporter"
      - expr: irate(node_disk_writes_completed_total{job="node-exporter"}[1m])
        record: node_exporter:disk:write:count:rate
        labels:
          desc: "Disk write operations per second"
          unit: "ops/s"
          job: "node-exporter"
      # The read rule must use the read counter (the original had the two counters swapped).
      - expr: (irate(node_disk_read_bytes_total{job="node-exporter"}[1m]))/1024/1024
        record: node_exporter:disk:read:mb:rate
        labels:
          desc: "Device read throughput in MB per second"
          unit: "MB/s"
          job: "node-exporter"
      - expr: (irate(node_disk_written_bytes_total{job="node-exporter"}[1m]))/1024/1024
        record: node_exporter:disk:write:mb:rate
        labels:
          desc: "Device write throughput in MB per second"
          unit: "MB/s"
          job: "node-exporter"
      ##############################################################################################
      # filesystem #
      - expr: (1 -node_filesystem_files_free{job="node-exporter",fstype=~"ext4|xfs"} / node_filesystem_files{job="node-exporter",fstype=~"ext4|xfs"}) * 100
        record: node_exporter:filesystem:used:percent
        labels:
          desc: "Inode usage percentage of the node"
          unit: "%"
          job: "node-exporter"
      #############################################################################################
      # filefd #
      - expr: node_filefd_allocated{job="node-exporter"}
        record: node_exporter:filefd_allocated:count
        labels:
          desc: "Number of open file descriptors on the node"
          unit: "count"
          job: "node-exporter"
      - expr: node_filefd_allocated{job="node-exporter"}/node_filefd_maximum{job="node-exporter"} * 100
        record: node_exporter:filefd_allocated:percent
        labels:
          desc: "Percentage of file descriptors in use on the node"
          unit: "%"
          job: "node-exporter"
      #############################################################################################
      # network #
      # The byte counters are multiplied by 8 so the recorded values match the bit/s unit.
      - expr: avg by (environment,instance,device) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
        record: node_exporter:network:netin:bit:rate
        labels:
          desc: "Bits received per second, per network interface"
          unit: "bit/s"
          job: "node-exporter"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
        record: node_exporter:network:netout:bit:rate
        labels:
          desc: "Bits sent per second, per network interface"
          unit: "bit/s"
          job: "node-exporter"
      - expr: avg by (environment,instance,device) (irate(node_network_receive_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: node_exporter:network:netin:packet:rate
        labels:
          desc: "Packets received per second, per network interface"
          unit: "packets/s"
          job: "node-exporter"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: node_exporter:network:netout:packet:rate
        labels:
          desc: "Packets sent per second, per network interface"
          unit: "packets/s"
          job: "node-exporter"
      - expr: avg by (environment,instance,device) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: node_exporter:network:netin:error:rate
        labels:
          desc: "Receive errors per second detected by the device driver"
          unit: "packets/s"
          job: "node-exporter"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: node_exporter:network:netout:error:rate
        labels:
          desc: "Transmit errors per second detected by the device driver"
          unit: "packets/s"
          job: "node-exporter"
      - expr: node_tcp_connection_states{job="node-exporter", state="established"}
        record: node_exporter:network:tcp:established:count
        labels:
          desc: "Number of TCP connections currently in the established state"
          unit: "count"
          job: "node-exporter"
      - expr: node_tcp_connection_states{job="node-exporter", state="time_wait"}
        record: node_exporter:network:tcp:timewait:count
        labels:
          desc: "Number of TCP connections in the time_wait state"
          unit: "count"
          job: "node-exporter"
      - expr: sum by (environment,instance) (node_tcp_connection_states{job="node-exporter"})
        record: node_exporter:network:tcp:total:count
        labels:
          desc: "Total number of TCP connections on the node"
          unit: "count"
          job: "node-exporter"
      #############################################################################################
      # process #
      - expr: node_processes_state{state="Z"}
        record: node_exporter:process:zoom:total:count
        labels:
          desc: "Number of processes currently in the zombie (Z) state"
          unit: "count"
          job: "node-exporter"
      #############################################################################################
      # other #
      - expr: abs(node_timex_offset_seconds{job="node-exporter"})
        record: node_exporter:time:offset
        labels:
          desc: "Clock offset of the node"
          unit: "s"
          job: "node-exporter"
      #############################################################################################
      - expr: count by (instance) ( count by (instance,cpu) (node_cpu_seconds_total{ mode='system'}) )
        record: node_exporter:cpu:count

node-exporter-alert-rules.yml

groups:
  - name: node-exporter-alert
    rules:
      - alert: node-exporter-down
        expr: node_exporter:up == 0
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} is down"
          description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} has been down for 1 minute."
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-cpu-high
        expr: node_exporter:cpu:total:percent > 80
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} CPU usage is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-cpu-iowait-high
        expr: node_exporter:cpu:iowait:percent >= 12
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} CPU iowait is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-load-load1-high
        # node_exporter:cpu:count carries only the instance label, while load1 also has
        # desc/unit/job, so the comparison must match on instance explicitly.
        expr: node_exporter:load:load1 > on (instance) (node_exporter:cpu:count * 1.2)
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} load1 is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-memory-high
        expr: node_exporter:memory:used:percent > 85
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} memory usage is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-disk-high
        expr: node_exporter:disk:used:percent > 88
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} disk usage is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-disk-read-count-high
        expr: node_exporter:disk:read:count:rate > 3000
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} read IOPS is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-disk-write-count-high
        expr: node_exporter:disk:write:count:rate > 3000
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} write IOPS is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-disk-read-mb-high
        expr: node_exporter:disk:read:mb:rate > 60
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} disk read throughput (MB/s) is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-disk-write-mb-high
        expr: node_exporter:disk:write:mb:rate > 60
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} disk write throughput (MB/s) is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-filefd-allocated-percent-high
        expr: node_exporter:filefd_allocated:percent > 80
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} open file descriptor percentage is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-network-netin-error-rate-high
        expr: node_exporter:network:netin:error:rate > 4
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} inbound packet error rate is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-network-netin-packet-rate-high
        expr: node_exporter:network:netin:packet:rate > 35000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} inbound packet rate is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-network-netout-packet-rate-high
        expr: node_exporter:network:netout:packet:rate > 35000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} outbound packet rate is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-network-tcp-total-count-high
        expr: node_exporter:network:tcp:total:count > 40000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} TCP connection count is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-process-zoom-total-count-high
        expr: node_exporter:process:zoom:total:count > 10
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} zombie process count is high, current value {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"
      - alert: node-exporter-time-offset-high
        expr: node_exporter:time:offset > 0.03
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} {{ $labels.desc }} {{ $value }} {{ $labels.unit }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
          grafana: "http://xxxxxxxx.com/d/node-exporter/node-exporter?orgId=1&var-instance={{ $labels.instance }}"
          console: "https://ecs.console.aliyun.com/#/server/{{ $labels.instanceid }}/detail?regionId=cn-beijing"
          cloudmonitor: "https://cloudmonitor.console.aliyun.com/#/hostDetail/chart/instanceId={{ $labels.instanceid }}&system=&region=cn-beijing&aliyunhost=true"
          id: "{{ $labels.instanceid }}"
          type: "aliyun_meta_ecs_info"

Place these two files in the /usr/local/prometheus/prometheus/rules directory, and make sure the main Prometheus configuration file contains the following section:

rule_files:
  - "rules/*rules.yml"
  # - "second_rules.yml"
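
For context, the fragment above might sit in a main configuration file like the following sketch. Only the `rule_files` entry comes from this post; the intervals, the target address, and the `environment` and `instanceid` label values are assumptions for illustration. The labels matter because the rules above filter on `job="node-exporter"`, aggregate by `environment`, and use `instanceid` in the alert annotations.

```yaml
global:
  scrape_interval: 15s        # assumed value
  evaluation_interval: 15s    # how often rule groups are evaluated; assumed value

rule_files:
  - "rules/*rules.yml"        # relative paths are resolved from this file's directory

scrape_configs:
  - job_name: "node-exporter"             # matches job="node-exporter" in the rules
    static_configs:
      - targets: ["192.168.1.10:9100"]    # hypothetical node-exporter address
        labels:
          environment: "prod"             # hypothetical value
          instanceid: "i-xxxxxxxx"        # hypothetical ECS instance ID
```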

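Restart the Prometheus service; the rules then show up in the web UI.

The recording rules defined above can now be queried by name directly in the expression browser.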
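
Besides checking in the browser, a recording rule can also be exercised offline with promtool's rule unit tests (`promtool test rules <test-file>`). The following is a minimal sketch, not part of the original setup: the test file name, the host name, and the input values are made up, and it assumes the test file sits in the same directory as node-exporter-record-rules.yml.

```yaml
# node-exporter-record-test.yml (hypothetical file name)
rule_files:
  - node-exporter-record-rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Synthetic input: 512 of 1024 file descriptors allocated on a made-up host.
    input_series:
      - series: 'node_filefd_allocated{job="node-exporter", instance="host1:9100"}'
        values: '512 512 512'
      - series: 'node_filefd_maximum{job="node-exporter", instance="host1:9100"}'
        values: '1024 1024 1024'
    promql_expr_test:
      # Aggregating by instance drops the desc/unit/job labels, so the
      # expected sample does not depend on their exact text.
      - expr: sum by (instance) (node_exporter:filefd_allocated:percent)
        eval_time: 1m
        exp_samples:
          - labels: '{instance="host1:9100"}'
            value: 50
```

The inputs are chosen so that 512 / 1024 * 100 is exactly 50, which avoids floating-point rounding in the expected value.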

Other

Ready-made Prometheus rule collections are relatively hard to find online; here is one that is worth consulting: https://awesome-prometheus-alerts.grep.to/rules
