cAdvisor container monitoring rules

For other notes, see the host monitoring rules: https://www.cnblogs.com/sanduzxcvbnm/p/13589848.html

In the rules directory under the Prometheus main program directory, create a docker.yml file with the following content, then restart Prometheus.

groups:
- name: Docker containers monitoring
  rules:
  - alert: ContainerKilled
    expr: time() - container_last_seen > 60
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container killed (instance {{ $labels.instance }})"
      description: "A container has disappeared\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerCpuUsage
    expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container CPU usage (instance {{ $labels.instance }})"
      description: "Container CPU usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerMemoryUsage
    expr: (sum(container_memory_usage_bytes) BY (instance, name) / sum(container_spec_memory_limit_bytes) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Memory usage (instance {{ $labels.instance }})"
      description: "Container Memory usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerVolumeUsage
    expr: (1 - (sum(container_fs_inodes_free) BY (instance) / sum(container_fs_inodes_total) BY (instance))) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Volume usage (instance {{ $labels.instance }})"
      description: "Container Volume inode usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerVolumeIoUsage
    expr: (sum(container_fs_io_current) BY (instance, name) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container Volume IO usage (instance {{ $labels.instance }})"
      description: "Container Volume IO usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ContainerHighThrottleRate
    expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container high throttle rate (instance {{ $labels.instance }})"
      description: "Container is being throttled\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: PgbouncerActiveConnections
    expr: pgbouncer_pools_server_active_connections > 200
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PGBouncer active connections (instance {{ $labels.instance }})"
      description: "PGBouncer pools are filling up\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: PgbouncerErrors
    expr: increase(pgbouncer_errors_count{errmsg!="server conn crashed?"}[5m]) > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "PGBouncer errors (instance {{ $labels.instance }})"
      description: "PGBouncer is logging errors. This may be due to a server restart or an admin typing commands at the pgbouncer console.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: PgbouncerMaxConnections
    expr: rate(pgbouncer_errors_count{errmsg="no more connections allowed (max_client_conn)"}[1m]) > 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "PGBouncer max connections (instance {{ $labels.instance }})"
      description: "The number of PGBouncer client connections has reached max_client_conn.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: SidekiqQueueSize
    expr: sidekiq_queue_size{} > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Sidekiq queue size (instance {{ $labels.instance }})"
      description: "Sidekiq queue {{ $labels.name }} is growing\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: SidekiqSchedulingLatencyTooHigh
    expr: max(sidekiq_queue_latency) > 120
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Sidekiq scheduling latency too high (instance {{ $labels.instance }})"
      description: "Sidekiq jobs are taking more than 2 minutes to be picked up. Users may be seeing delays in background processing.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ConsulServiceHealthcheckFailed
    expr: consul_catalog_service_node_healthy == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Consul service healthcheck failed (instance {{ $labels.instance }})"
      description: "Service: `{{ $labels.service_name }}` Healthcheck: `{{ $labels.service_id }}`\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ConsulMissingMasterNode
    expr: consul_raft_peers < 3
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Consul missing master node (instance {{ $labels.instance }})"
      description: "The number of Consul raft peers should be 3, in order to preserve quorum.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
  - alert: ConsulAgentUnhealthy
    expr: consul_health_node_status{status="critical"} == 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Consul agent unhealthy (instance {{ $labels.instance }})"
      description: "A Consul agent is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
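
Prometheus only evaluates docker.yml if the main configuration file lists it under rule_files. The fragment below is a minimal sketch of such an entry. It assumes that prometheus.yml sits in the Prometheus main program directory next to the rules directory, and that the rules directory is not already listed; neither detail is stated above, so adjust the path to your layout.

```yaml
# prometheus.yml (fragment; the path below is an assumption about the layout)
rule_files:
  # Relative paths are resolved against the directory that contains prometheus.yml.
  # The glob picks up docker.yml along with any other rule files in rules/.
  - "rules/*.yml"
```

Using a glob rather than naming docker.yml directly means later rule files dropped into the same directory are loaded on the next restart without touching prometheus.yml again.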
