Prometheus K8S中部署Alertmanager
Prometheus K8S中部署Alertmanager

设置告警和通知的主要步骤如下:
一、部署Alertmanager
二、配置Prometheus与Alertmanager通信
三、配置告警
1. prometheus指定rules目录
2. configmap存储告警规则
3. configmap挂载到容器rules目录
一、部署Alertmanager
配置文件
已经修改好的配置文件
- # 存储主配置文件
- alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
# 配置文件名称
name: alertmanager-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
alertmanager.yml: |
global:
resolve_timeout: 5m
# 告警自定义邮件
smtp_smarthost: 'smtp.163.com:25'
smtp_from: 'baojingtongzhi@163.com'
smtp_auth_username: 'baojingtongzhi@163.com'
smtp_auth_password: 'liang123' receivers:
- name: default-receiver
email_configs:
- to: "zhenliang369@163.com" route:
group_interval: 1m
group_wait: 10s
receiver: default-receiver
repeat_interval: 1m配置文件
- # 部署核心组件
- alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager
namespace: kube-system
labels:
k8s-app: alertmanager
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v0.14.0
spec:
replicas: 1
selector:
matchLabels:
k8s-app: alertmanager
version: v0.14.0
template:
metadata:
labels:
k8s-app: alertmanager
version: v0.14.0
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
containers:
- name: prometheus-alertmanager
image: "prom/alertmanager:v0.14.0"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/alertmanager.yml
- --storage.path=/data
- --web.external-url=/
ports:
- containerPort: 9093
readinessProbe:
httpGet:
path: /#/status
port: 9093
initialDelaySeconds: 30
timeoutSeconds: 30
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: storage-volume
mountPath: "/data"
subPath: ""
resources:
limits:
cpu: 10m
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
- name: prometheus-alertmanager-configmap-reload
image: "jimmidyson/configmap-reload:v0.1"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9093/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
volumes:
- name: config-volume
configMap:
name: alertmanager-config
- name: storage-volume
persistentVolumeClaim:
claimName: alertmanager配置文件
- # 使用的自动PV存储
- alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
# 使用自己的动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"配置文件
- # 暴露Prot端口
- alertmanager-service.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
# 使用自己的动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
[root@Master1 ~]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "Alertmanager"
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 9093
selector:
k8s-app: alertmanager
type: "ClusterIP"配置文件
部署
1、创建pvc、configmap、deployment、service
kubectl apply -f alertmanager-pvc.yaml
kubectl create -f alertmanager-configmap.yaml
kubectl apply -f alertmanager-deployment.yaml
kubectl apply -f alertmanager-service.yaml
2、查看Pod状态
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
alertmanager-5bb796cb48-fwztv 2/2 Running 0 2m29s
3、查看service状态
kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager ClusterIP 10.0.0.126 <none> 80/TCP
二、配置Prometheus与Alertmanager通信
1、编辑 prometheus-configmap.yaml 配置文件添加绑定信息
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
# 存放prometheus配置文件
prometheus.yml: |
# 配置采集目标
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
# 采集自身
- localhost:9090 prometheus.yml: |
# 配置采集目标
scrape_configs:
- job_name: kubernetes-nodes
static_configs:
- targets:
# 采集自身
- 192.168.1.110:9100
- 192.168.1.111:9100 # 采集:Apiserver 生存指标
# 创建的job name 名称为 kubernetes-apiservers
- job_name: kubernetes-apiservers
# 基于k8s的服务发现
kubernetes_sd_configs:
- role: endpoints
# 使用通信标记标签
relabel_configs:
# 保留正则匹配标签
- action: keep
# 已经包含
regex: default;kubernetes;https
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_service_name
- __meta_kubernetes_endpoint_port_name
# 使用方法为https、默认http
scheme: https
tls_config:
# promethus访问Apiserver使用认证
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# 跳过https认证
insecure_skip_verify: true
# promethus访问Apiserver使用认证
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # 采集:Kubelet 生存指标
- job_name: kubernetes-nodes-kubelet
kubernetes_sd_configs:
# 发现集群中所有的Node
- role: node
relabel_configs:
# 通过regex获取关键信息
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # 采集:nodes-cadvisor 信息
- job_name: kubernetes-nodes-cadvisor
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
# 重命名标签
- target_label: __metrics_path__
replacement: /metrics/cadvisor
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token # 采集:service-endpoints 信息
- job_name: kubernetes-service-endpoints
# 选定指标
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: true
# 指定源标签
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scheme
# 重命名标签采集
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_service_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name # 采集:kubernetes-services 服务指标
- job_name: kubernetes-services
kubernetes_sd_configs:
- role: service
# 黑盒探测,探测IP与端口是否可用
metrics_path: /probe
params:
module:
- http_2xx
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_probe
- source_labels:
- __address__
target_label: __param_target
# 使用 blackbox进行黑盒探测
- replacement: blackbox
target_label: __address__
- source_labels:
- __param_target
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name # 采集: kubernetes-pods 信息
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
# 只保留采集的信息
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
# 采集地址
- __address__
# 采集端口
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
alerting:
# 告警配置文件
alertmanagers:
# 修改:使用静态绑定
- static_configs:
# 修改:targets、指定地址与端口
- targets: ["alertmanager:80"]
配置文件
2、应用加载配置文件
kubectl apply -f prometheus-configmap.yaml
3、web控制台查看配置是否生效
http://192.168.1.110:42575/config
三、配置告警
1. prometheus指定rules目录
1、编辑 prometheus-configmap.yaml 添加报警信息
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
prometheus.yml: |
# 添加:指定读取rules配置
rules_files:
- /etc/config/rules/*.rules
......
配置文件
2、生效配置文件
kubectl apply -f prometheus-configmap.yaml
2. configmap存储告警规则
1、创建yaml文件同过configmap存储告警规则
vim prometheus-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: kube-system
data:
# 通用角色
general.rules: |
groups:
- name: general.rules
rules:
- alert: InstanceDown
expr: up == 0
for: 1m
labels:
severity: error
annotations:
summary: "Instance {{ $labels.instance }} 停止工作"
description: "{{ $labels.instance }} job {{ $labels.job }} 已经停止5分钟以上."
# Node对所有资源的监控
node.rules: |
groups:
- name: node.rules
rules:
- alert: NodeFilesystemUsage
expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})" - alert: NodeMemoryUsage
expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} 内存使用率过高"
description: "{{ $labels.instance }}内存使用大于80% (当前值: {{ $value }})" - alert: NodeCPUUsage
expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} CPU使用率过高"
description: "{{ $labels.instance }}CPU使用大于60% (当前值: {{ $value }})"
配置文件
3. configmap挂载到容器rules目录
1、修改挂载点位置,使用之前部署的prometheus动态PV
vim prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
# 部署命名空间
namespace: kube-system
labels:
k8s-app: prometheus
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v2.2.1
spec:
serviceName: "prometheus"
replicas: 1
podManagementPolicy: "Parallel"
updateStrategy:
type: "RollingUpdate"
selector:
matchLabels:
k8s-app: prometheus
template:
metadata:
labels:
k8s-app: prometheus
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
priorityClassName: system-cluster-critical
serviceAccountName: prometheus
# 初始化容器
initContainers:
- name: "init-chown-data"
image: "busybox:latest"
imagePullPolicy: "IfNotPresent"
command: ["chown", "-R", "65534:65534", "/data"]
volumeMounts:
- name: prometheus-data
mountPath: /data
subPath: ""
containers:
- name: prometheus-server-configmap-reload
image: "jimmidyson/configmap-reload:v0.1"
imagePullPolicy: "IfNotPresent"
args:
- --volume-dir=/etc/config
- --webhook-url=http://localhost:9090/-/reload
volumeMounts:
- name: config-volume
mountPath: /etc/config
readOnly: true
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi - name: prometheus-server
# 主要使用镜像
image: "prom/prometheus:v2.2.1"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
ports:
- containerPort: 9090
readinessProbe:
# 健康检查
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
# based on 10 running nodes with 30 pods each
resources:
limits:
cpu: 200m
memory: 1000Mi
requests:
cpu: 200m
memory: 1000Mi
# 数据卷
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: prometheus-data
mountPath: /data
# 添加:指定rules的configmap配置文件名称
- name: prometheus-rules
mountPath: /etc/config/rules
subPath: ""
terminationGracePeriodSeconds: 300
volumes:
- name: config-volume
configMap:
name: prometheus-config
# 添加:name rules
- name: prometheus-rules
# 添加:配置文件
configMap:
# 添加:定义文件名称
name: prometheus-rules volumeClaimTemplates:
- metadata:
name: prometheus-data
spec:
# 使用动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "16Gi"
配置文件
2、创建configmap并更新PV
kubectl apply -f prometheus-rules.yaml
kubectl apply -f prometheus-statefulset.yaml
3、查看Pod
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
prometheus-0 1/2 Running 0 42s
Prometheus K8S中部署Alertmanager的更多相关文章
- 不使用pvc的方式在K8S中部署apisix-gateway
不使用pvc的方式在K8S中部署apisix-gateway 简介 我的apisix使用etcd作为数据存储服务器,官方的使用pvc方式或者docker-compose的方式,对于新手不太友好,本篇是 ...
- K8S中部署apisix(非ingress)
不使用pvc的方式在K8S中部署apisix-gateway 简介 因为公司项目准备重构,现在做技术储备,之前公司项目使用的ocelot做网关,ocelot是.net平台下的一个网关,也是很不错,但是 ...
- Kubernetes之在k8s中部署Java应用
部署好了k8s以后 部署参考https://www.cnblogs.com/minseo/p/12055731.html 怎么在k8s部署应用 项目迁移到k8s平台是怎样的流程 1,制作镜像 2,控制 ...
- Docker & k8s 系列三:在k8s中部署单个服务实例
本章将会讲解: pod的概念,以及如何向k8s中部署一个单体应用实例. 在上面的篇幅中,我们了解了docker,并制作.运行了docker镜像,然后将镜像发布至中央仓库了.然后又搭建了本机的k8s环境 ...
- k8s中部署springcloud
安装和配置数据存储仓库MySQL 1.MySQL简介 2.MySQL特点 3.安装和配置MySQL 4.在MySQL数据库导入数据 5.对MySQL数据库进行授权 1.MySQL简介 MySQL 是一 ...
- 在k8s中部署前后端分离项目进行访问的两种配置方式
第一种方式 (1) nginx配置中只写前端项目的/根路径配置 前端项目使用的Dockerfile文件内容 把前端项目编译后生成的dist文件夹放在nginx的html默认目录下,浏览器访问前端项目时 ...
- 【转】K8S中部署Helm
K8S中的包管理工具 1. 客户端Helm(即Helm) 通过脚本安装:curl https://raw.githubusercontent.com/helm/helm/master/scripts ...
- k8s集群中部署prometheus server
1.概述 本文档主要介绍如何在k8s集群中部署prometheus server用来作为监控的数据采集服务器,这样做可以很方便的对k8s集群中的指标.pod的.节点的指标进行采集和监控. 2.下载镜像 ...
- Prometheus K8S部署
Prometheus K8S部署 部署方式:https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/prometheus ...
随机推荐
- powershell 提取 spotlight 图片
powershell脚本来源于网络,有一些调整. # 将复制出来的缓存图片保存在下面的文件夹 $dir = Split-Path -Parent $MyInvocation.MyCommand.Def ...
- U8 BOM数据结构
U8 BOM涉及的数据表有四张 bom_bom BOM资料 该表主要记录BOM表的一些基本信息,版本.创建.审核等信息,不包括任何子件.母件信息:bomid是BOM主键 bom_parent ...
- Vue中的组件及路由使用
1.组件是什么 组件系统是 Vue 的一个重要概念,因为它是一种抽象,允许我们使用小型.独立和通常可复用的组件构建大型应用.通常一个应用会以一棵嵌套的组件树的形式来组织: 1.1组件的声 ...
- C# virtual 和 abstract 区别
Virtual方法(虚方法) virtual 关键字用于在基类中修饰方法.virtual的使用会有两种情况: 情况1:在基类中定义了virtual方法,但在派生类中没有重写该虚方法.那么在对派生类实例 ...
- JS基础语法---函数练习part2---10个综合练习(运用:循环/数组/函数)
练习1:求2个数中的最大值 function getMax(num1, num2) { return num1 > num2 ? num1 : num2; } console.log(getMa ...
- E203数据冲突处理OITF
流水线的数据冲突分为三类:WAR,RAW,WAW https://wenku.baidu.com/view/e066926d48d7c1c708a14508.html WAR: write after ...
- 查找字段的筛选-使用addCustomView
关注本人微信和易信公众号: 微软动态CRM专家罗勇 ,回复231或者20161031可方便获取本文,同时可以在第一间得到我发布的最新的博文信息,follow me!我的网站是 www.luoyong. ...
- Mysql Join-连接查询(中)
Mysql Join-连接查询(中) 认识 就我平时的数据接触来看, 连接查询也没有很复杂,不够是非常需要耐心和逻辑的, 一点点将数据查出来, 拼接等. 没有什么技巧, 多练习就会了. 无非就是表之间 ...
- java实现序列化的两种方式
1.Serializable接口 2.Externalizable接口 public class Demo2 implements Externalizable{ transient private ...
- layui获取checkbox复选框值
获取layui表单复选框已选中的数据 HTML <!DOCTYPE html> <html> <head> <meta charset="utf-8 ...