容器编排系统K8s之Prometheus监控系统+Grafana部署
前文我们聊到了k8s的apiservice资源结合自定义apiserver扩展原生apiserver功能的相关话题,回顾请参考:https://www.cnblogs.com/qiuhom-1874/p/14279850.html;今天我们来聊一聊监控k8s集群相关话题;
前文我们使用自定义apiserver metrics server扩展了原生apiserver的功能,让其原生apiserver能够通过kubectl top node/pod 命令来获取对应节点或名称空间下pod的cpu和内存指标数据;这些指标数据在一定程度上能够让我们清楚的知道对应pod或节点资源使用情况,本质上这也是一种监控方式;但是metrics server 采集的数据只有内存和cpu指标数据,在一定程度上不能满足我们了解节点或pod的其他数据;这样一来我们就需要有一款专业的监控系统来帮助我们监控k8s集群节点或pod;Prometheus是一款高性能的监控程序,其内部主要有3个组件,Retrieval组件主要负责数据收集工作,它可以结合外部其他程序收集数据;TSDB组件主要是用来存储指标数据,该组件是一个时间序列存储系统;HttpServer组件主要用来对外提供restful api接口,为客户端提供查询接口;默认监听在9090端口;
prometheus监控系统整体top
提示:上图是Prometheus监控系统的top图;Pushgateway组件类似Prometheus retrieval代理,它主要负责收集主动推送指标数据的pod的指标数据,在Prometheus 监控系统中也有主动监控和被动监控的概念,主动监控是指被监控端主动推送数据到server,被动监控是指被监控端被动等待server来拉去数据,默认情况Prometheus是工作为被动监控模式,即server主动到被监控端采集数据;节点级别metrics 数据可以使用node-exporter来收集,当然node-exporter也可以收集pod容器里的指标数据;alertmanager主要用来为Prometheus监控系统提供告警功能;Prometheus web ui主要作用是为其提供一个web查询页面;
Prometheus 监控系统组件
kube-state-metrics:该组件主要用来为监控k8s集群中的指标数据提供计数能力;比如k8s节点有几个,pod的数量等等;
node-exporter:该组件主要作用是用来收集对应节点上的指标数据;
alertmanager:该组件主要用来为Prometheus监控系统提供告警功能;
prometheus-server:该组件主要用来存储指标数据,处理指标数据,以及为用户提供一个restful api查询接口;
控制pod能够被Prometheus抓取数据的注解信息
prometheus.io/scrape:该注解信息主要用来描述对应pod是否允许抓取指标数据,true表示允许,false表示不允许;
prometheus.io/path:用于描述抓取指标数据使用的url路径,一般为/metrics
prometheus.io/port:用于描述对应抓取指标数据使用的端口信息;
部署Prometheus监控系统
1、部署kube-state-metrics
创建kube-state-metrics rbac授权相关清单
[root@master01 kube-state-metrics]# cat kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-state-metrics
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
resources:
- configmaps
- secrets
- nodes
- pods
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch"]
- apiGroups: ["extensions","apps"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: kube-state-metrics-resizer
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
resources:
- pods
verbs: ["get"]
- apiGroups: ["extensions","apps"]
resources:
- deployments
resourceNames: ["kube-state-metrics"]
verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-state-metrics
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
[root@master01 kube-state-metrics]#
提示:上述清单主要创建了一个sa用户,和两个角色,并将sa用户绑定之对应的角色上;让其对应sa用户拥有对应角色的相关权限;
创建kube-state-metrics service配置清单
[root@master01 kube-state-metrics]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "kube-state-metrics"
annotations:
prometheus.io/scrape: 'true'
spec:
ports:
- name: http-metrics
port: 8080
targetPort: http-metrics
protocol: TCP
- name: telemetry
port: 8081
targetPort: telemetry
protocol: TCP
selector:
k8s-app: kube-state-metrics
[root@master01 kube-state-metrics]#
创建kube-state-metrics 部署清单
[root@master01 kube-state-metrics]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-state-metrics
namespace: kube-system
labels:
k8s-app: kube-state-metrics
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v2.0.0-beta
spec:
selector:
matchLabels:
k8s-app: kube-state-metrics
version: v2.0.0-beta
replicas: 1
template:
metadata:
labels:
k8s-app: kube-state-metrics
version: v2.0.0-beta
spec:
priorityClassName: system-cluster-critical
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: quay.io/coreos/kube-state-metrics:v2.0.0-beta
ports:
- name: http-metrics
containerPort: 8080
- name: telemetry
containerPort: 8081
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
timeoutSeconds: 5
- name: addon-resizer
image: k8s.gcr.io/addon-resizer:1.8.7
resources:
limits:
cpu: 100m
memory: 30Mi
requests:
cpu: 100m
memory: 30Mi
env:
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: MY_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: config-volume
mountPath: /etc/config
command:
- /pod_nanny
- --config-dir=/etc/config
- --container=kube-state-metrics
- --cpu=100m
- --extra-cpu=1m
- --memory=100Mi
- --extra-memory=2Mi
- --threshold=5
- --deployment=kube-state-metrics
volumes:
- name: config-volume
configMap:
name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-state-metrics-config
namespace: kube-system
labels:
k8s-app: kube-state-metrics
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
data:
NannyConfiguration: |-
apiVersion: nannyconfig/v1alpha1
kind: NannyConfiguration [root@master01 kube-state-metrics]#
应用上述三个清单,部署kube-state-metrics组件
[root@master01 kube-state-metrics]# ls
kube-state-metrics-deployment.yaml kube-state-metrics-rbac.yaml kube-state-metrics-service.yaml
[root@master01 kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
configmap/kube-state-metrics-config created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@master01 kube-state-metrics]#
验证:查看对应的pod和service是否都成功创建?
提示:可以看到对应pod和svc都已经正常创建;
验证:访问对应service的8080端口,url为/metrics,看看是否能够访问到数据?
提示:可以看到访问对应service的8080端口,url为/metrics能够访问到对应数据,说明kube-state-metrics组件安装部署完成;
2、部署node-exporter
创建node-export service配置清单
[root@master01 node_exporter]# cat node-exporter-service.yaml
apiVersion: v1
kind: Service
metadata:
name: node-exporter
namespace: kube-system
annotations:
prometheus.io/scrape: "true"
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "NodeExporter"
spec:
clusterIP: None
ports:
- name: metrics
port: 9100
protocol: TCP
targetPort: 9100
selector:
k8s-app: node-exporter
[root@master01 node_exporter]#
创建node-export 部署清单
[root@master01 node_exporter]# cat node-exporter-ds.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: kube-system
labels:
k8s-app: node-exporter
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v1.0.1
spec:
selector:
matchLabels:
k8s-app: node-exporter
version: v1.0.1
updateStrategy:
type: OnDelete
template:
metadata:
labels:
k8s-app: node-exporter
version: v1.0.1
spec:
priorityClassName: system-node-critical
containers:
- name: prometheus-node-exporter
image: "prom/node-exporter:v1.0.1"
imagePullPolicy: "IfNotPresent"
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
ports:
- name: metrics
containerPort: 9100
hostPort: 9100
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
resources:
limits:
memory: 50Mi
requests:
cpu: 100m
memory: 50Mi
hostNetwork: true
hostPID: true
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule [root@master01 node_exporter]#
提示:上述清单主要用daemonSet控制器来运行node-exporter pod,并在对应pod上做了共享宿主机网络名称空间和pid,以及对主节点污点的容忍度;这样node-exporter就可以在k8s的所有节点上运行一个pod,通过对应pod来采集对应节点上的指标数据;
应用上述两个配置清单部署 node-exporter
[root@master01 node_exporter]# ls
node-exporter-ds.yml node-exporter-service.yaml
[root@master01 node_exporter]# kubectl apply -f .
daemonset.apps/node-exporter created
service/node-exporter created
[root@master01 node_exporter]#
验证:查看对应pod和svc是否正常创建?
[root@master01 node_exporter]# kubectl get pods -l "k8s-app=node-exporter" -n kube-system
NAME READY STATUS RESTARTS AGE
node-exporter-6zgkz 1/1 Running 0 107s
node-exporter-9mvxr 1/1 Running 0 107s
node-exporter-jbll7 1/1 Running 0 107s
node-exporter-s7vvt 1/1 Running 0 107s
node-exporter-xmrjh 1/1 Running 0 107s
[root@master01 node_exporter]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 39d
kube-state-metrics ClusterIP 10.110.110.216 <none> 8080/TCP,8081/TCP 20m
metrics-server ClusterIP 10.98.59.116 <none> 443/TCP 46h
node-exporter ClusterIP None <none> 9100/TCP 116s
[root@master01 node_exporter]#
验证:访问任意节点上的9100端口,url为/metrics,看看是否能够访问到指标数据?
提示:可以看到对应端口下/metrics url能够访问到对应的数据,说明node-exporter组件部署成功;
3、部署alertmanager
创建alertmanager pvc配置清单
[root@master01 alertmanager]# cat alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
# storageClassName: standard
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
[root@master01 alertmanager]#
创建pv
[root@master01 ~]# cat pv-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv-v1
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
persistentVolumeReclaimPolicy: Retain
mountOptions:
- hard
- nfsvers=4.1
nfs:
path: /data/v1
server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv-v2
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
persistentVolumeReclaimPolicy: Retain
mountOptions:
- hard
- nfsvers=4.1
nfs:
path: /data/v2
server: 192.168.0.99
--- apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv-v3
spec:
capacity:
storage: 5Gi
volumeMode: Filesystem
accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
persistentVolumeReclaimPolicy: Retain
mountOptions:
- hard
- nfsvers=4.1
nfs:
path: /data/v3
server: 192.168.0.99
[root@master01 ~]#
应用清单创建pv
[root@master01 ~]# kubectl apply -f pv-demo.yaml
persistentvolume/nfs-pv-v1 created
persistentvolume/nfs-pv-v2 created
persistentvolume/nfs-pv-v3 created
[root@master01 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
nfs-pv-v1 5Gi RWO,ROX,RWX Retain Available 4s
nfs-pv-v2 5Gi RWO,ROX,RWX Retain Available 4s
nfs-pv-v3 5Gi RWO,ROX,RWX Retain Available 4s
[root@master01 ~]#
创建alertmanager service配置清单
[root@master01 alertmanager]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "Alertmanager"
spec:
ports:
- name: http
port: 80
protocol: TCP
targetPort: 9093
nodePort: 30093
selector:
k8s-app: alertmanager
type: "NodePort"
[root@master01 alertmanager]#
创建alertmanager cm配置清单
[root@master01 alertmanager]# cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
alertmanager.yml: |
global: null
receivers:
- name: default-receiver
route:
group_interval: 5m
group_wait: 10s
receiver: default-receiver
repeat_interval: 3h
[root@master01 alertmanager]#
创建alertmanager 部署清单
[root@master01 alertmanager]# cat alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: alertmanager
namespace: kube-system
labels:
k8s-app: alertmanager
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v0.14.0
spec:
replicas: 1
selector:
matchLabels:
k8s-app: alertmanager
version: v0.14.0
template:
metadata:
labels:
k8s-app: alertmanager
version: v0.14.0
spec:
priorityClassName: system-cluster-critical
containers:
- name: prometheus-alertmanager
image: "prom/alertmanager:v0.14.0"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/alertmanager.yml
- --storage.path=/data
- --web.external-url=/
ports:
- containerPort: 9093
readinessProbe:
httpGet:
path: /#/status
port: 9093
initialDelaySeconds: 30
timeoutSeconds: 30
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: storage-volume
mountPath: "/data"
subPath: ""
resources:
limits:
cpu: 10m
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
# - name: prometheus-alertmanager-configmap-reload
# image: "jimmidyson/configmap-reload:v0.1"
# imagePullPolicy: "IfNotPresent"
# args:
# - --volume-dir=/etc/config
# - --webhook-url=http://localhost:9093/-/reload
# volumeMounts:
# - name: config-volume
# mountPath: /etc/config
# readOnly: true
# resources:
# limits:
# cpu: 10m
# memory: 10Mi
# requests:
# cpu: 10m
# memory: 10Mi
volumes:
- name: config-volume
configMap:
name: alertmanager-config
- name: storage-volume
persistentVolumeClaim:
claimName: alertmanager
[root@master01 alertmanager]#
应用上述4个清单,部署alertmanager
[root@master01 alertmanager]# ls
alertmanager-configmap.yaml alertmanager-deployment.yaml alertmanager-pvc.yaml alertmanager-service.yaml
[root@master01 alertmanager]# kubectl apply -f .
configmap/alertmanager-config created
deployment.apps/alertmanager created
persistentvolumeclaim/alertmanager created
service/alertmanager created
[root@master01 alertmanager]#
验证:查看对应pod和svc是否正常创建?
[root@master01 alertmanager]# kubectl get pods -l "k8s-app=alertmanager" -n kube-system
NAME READY STATUS RESTARTS AGE
alertmanager-6546bf7676-lt9jq 1/1 Running 0 85s
[root@master01 alertmanager]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager NodePort 10.99.246.148 <none> 80:30093/TCP 92s
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 39d
kube-state-metrics ClusterIP 10.110.110.216 <none> 8080/TCP,8081/TCP 31m
metrics-server ClusterIP 10.98.59.116 <none> 443/TCP 47h
node-exporter ClusterIP None <none> 9100/TCP 13m
[root@master01 alertmanager]#
验证:访问任意节点的30093端口,看看是否能够访问到alertmanager?
提示:访问对应的端口能够访问到上述界面,说明alertmanager 部署成功;
4、部署prometheus-server
创建Prometheus rabc相关授权配置清单
[root@master01 prometheus-server]# cat prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
- ""
resources:
- nodes
- nodes/metrics
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- nonResourceURLs:
- "/metrics"
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: kube-system
[root@master01 prometheus-server]#
创建Prometheus service配置清单
[root@master01 prometheus-server]# cat prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
name: prometheus
namespace: kube-system
labels:
kubernetes.io/name: "Prometheus"
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
ports:
- name: http
port: 9090
protocol: TCP
targetPort: 9090
nodePort: 30090
selector:
k8s-app: prometheus
type: NodePort
[root@master01 prometheus-server]#
创建Prometheus cm配置清单
[root@master01 prometheus-server]# cat prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
prometheus.yml: |
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
- localhost:9090 - job_name: kubernetes-apiservers
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: default;kubernetes;https
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_service_name
- __meta_kubernetes_endpoint_port_name
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token - job_name: kubernetes-nodes-kubelet
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token - job_name: kubernetes-nodes-cadvisor
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __metrics_path__
replacement: /metrics/cadvisor
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token - job_name: kubernetes-service-endpoints
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scrape
- action: replace
regex: (https?)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_scheme
target_label: __scheme__
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_service_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name - job_name: kubernetes-services
kubernetes_sd_configs:
- role: service
metrics_path: /probe
params:
module:
- http_2xx
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_service_annotation_prometheus_io_probe
- source_labels:
- __address__
target_label: __param_target
- replacement: blackbox
target_label: __address__
- source_labels:
- __param_target
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: kubernetes_name - job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
alerting:
alertmanagers:
- kubernetes_sd_configs:
- role: pod
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
regex: kube-system
action: keep
- source_labels: [__meta_kubernetes_pod_label_k8s_app]
regex: alertmanager
action: keep
- source_labels: [__meta_kubernetes_pod_container_port_number]
regex:
action: drop
[root@master01 prometheus-server]#
创建Prometheus 部署清单
[root@master01 prometheus-server]# cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prometheus
namespace: kube-system
labels:
k8s-app: prometheus
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
version: v2.24.0
spec:
serviceName: "prometheus"
replicas: 1
podManagementPolicy: "Parallel"
updateStrategy:
type: "RollingUpdate"
selector:
matchLabels:
k8s-app: prometheus
template:
metadata:
labels:
k8s-app: prometheus
spec:
priorityClassName: system-cluster-critical
serviceAccountName: prometheus
initContainers:
- name: "init-chown-data"
image: "busybox:latest"
imagePullPolicy: "IfNotPresent"
command: ["chown", "-R", "65534:65534", "/data"]
volumeMounts:
- name: prometheus-data
mountPath: /data
subPath: ""
containers:
# - name: prometheus-server-configmap-reload
# image: "jimmidyson/configmap-reload:v0.1"
# imagePullPolicy: "IfNotPresent"
# args:
# - --volume-dir=/etc/config
# - --webhook-url=http://localhost:9090/-/reload
# volumeMounts:
# - name: config-volume
# mountPath: /etc/config
# readOnly: true
# resources:
# limits:
# cpu: 10m
# memory: 10Mi
# requests:
# cpu: 10m
# memory: 10Mi - name: prometheus-server
image: "prom/prometheus:v2.24.0"
imagePullPolicy: "IfNotPresent"
args:
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
ports:
- containerPort: 9090
readinessProbe:
httpGet:
path: /-/ready
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
# based on 10 running nodes with 30 pods each
resources:
limits:
cpu: 200m
memory: 1000Mi
requests:
cpu: 200m
memory: 1000Mi volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: prometheus-data
mountPath: /data
subPath: ""
terminationGracePeriodSeconds: 300
volumes:
- name: config-volume
configMap:
name: prometheus-config
volumeClaimTemplates:
- metadata:
name: prometheus-data
spec:
# storageClassName: standard
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "5Gi"
[root@master01 prometheus-server]#
提示:应用上述清单前,请确保对应pv容量是否够用;
应用上述4个清单部署Prometheus server
[root@master01 prometheus-server]# ls
prometheus-configmap.yaml prometheus-rbac.yaml prometheus-service.yaml prometheus-statefulset.yaml
[root@master01 prometheus-server]# kubectl apply -f .
configmap/prometheus-config created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
statefulset.apps/prometheus created
[root@master01 prometheus-server]#
验证:查看对应pod和svc是否成功创建?
[root@master01 prometheus-server]# kubectl get pods -l "k8s-app=prometheus" -n kube-system
NAME READY STATUS RESTARTS AGE
prometheus-0 1/1 Running 0 2m20s
[root@master01 prometheus-server]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager NodePort 10.99.246.148 <none> 80:30093/TCP 10m
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 39d
kube-state-metrics ClusterIP 10.110.110.216 <none> 8080/TCP,8081/TCP 40m
metrics-server ClusterIP 10.98.59.116 <none> 443/TCP 47h
node-exporter ClusterIP None <none> 9100/TCP 22m
prometheus NodePort 10.111.155.1 <none> 9090:30090/TCP 2m27s
[root@master01 prometheus-server]#
验证:访问任意节点的30090端口,看看对应Prometheus 是否能够被访问?
提示:能够访问到上述页面,表示Prometheus server部署没有问题;
通过上述界面查看监控指标数据
提示:选择对应要查看的指标数据项,点击execute,对应图像就会呈现出来;到此Prometheus监控系统就部署完成了,接下来部署grafana,并配置grafana使用Prometheus数据源展示监控数据;
部署grafana
创建grafana 部署清单
[root@master01 grafana]# cat grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: monitoring-grafana
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
task: monitoring
k8s-app: grafana
template:
metadata:
labels:
task: monitoring
k8s-app: grafana
spec:
containers:
- name: grafana
image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
ports:
- containerPort: 3000
protocol: TCP
volumeMounts:
- mountPath: /etc/ssl/certs
name: ca-certificates
readOnly: true
- mountPath: /var
name: grafana-storage
env:
# - name: INFLUXDB_HOST
# value: monitoring-influxdb
- name: GF_SERVER_HTTP_PORT
value: "3000"
- name: GF_AUTH_BASIC_ENABLED
value: "false"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ORG_ROLE
value: Admin
- name: GF_SERVER_ROOT_URL
value: /
volumes:
- name: ca-certificates
hostPath:
path: /etc/ssl/certs
- name: grafana-storage
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
labels:
kubernetes.io/cluster-service: 'true'
kubernetes.io/name: monitoring-grafana
name: monitoring-grafana
namespace: kube-system
spec:
ports:
- port: 80
targetPort: 3000
selector:
k8s-app: grafana
type: "NodePort"
[root@master01 grafana]#
应用资源清单 部署grafana
[root@master01 grafana]# ls
grafana.yaml
[root@master01 grafana]# kubectl apply -f .
deployment.apps/monitoring-grafana created
service/monitoring-grafana created
[root@master01 grafana]#
验证:查看对应pod和svc是否都创建?
[root@master01 grafana]# kubectl get pods -l "k8s-app=grafana" -n kube-system
NAME READY STATUS RESTARTS AGE
monitoring-grafana-6c74ccc5dd-grjzf 1/1 Running 0 87s
[root@master01 grafana]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager NodePort 10.99.246.148 <none> 80:30093/TCP 82m
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 39d
kube-state-metrics ClusterIP 10.110.110.216 <none> 8080/TCP,8081/TCP 112m
metrics-server ClusterIP 10.98.59.116 <none> 443/TCP 2d
monitoring-grafana NodePort 10.100.230.71 <none> 80:30196/TCP 92s
node-exporter ClusterIP None <none> 9100/TCP 94m
prometheus NodePort 10.111.155.1 <none> 9090:30090/TCP 74m
[root@master01 grafana]#
提示:可以看到grafana svc暴露了30196端口;
验证:访问grafana service 暴露的端口,看看对应pod是否能够被访问?
提示:能够访问到上述页面,表示grafana部署成功;
配置grafana
1、配置grafana的数据源为Prometheus
2、新建监控面板
提示:进入grafana.com网站上,下载监控面板模板;
下载好模板文件以后,导入模板文件到grafana
提示:选择下载的模板文件,然后再选择对应的数据源,点击import即可;上面没有数据的原因是对应指标名称和Prometheus中指标名称不同导致的;我们可以根据自己环境Prometheus中指标数据名称来修改模板文件;
容器编排系统K8s之Prometheus监控系统+Grafana部署的更多相关文章
- Prometheus监控系统之入门篇(一)续
在上篇Prometheus监控系统之入门篇(一)中我们讲解了Prometheus的基本架构和工作流程, 并从0到1搭建了Prometheus服务,pushgateway以及告警系统. 本篇我们主要介绍 ...
- 容器云平台No.7~kubernetes监控系统prometheus-operator
简介 prometheus-operator Prometheus:一个非常优秀的监控工具或者说是监控方案.它提供了数据搜集.存储.处理.可视化和告警一套完整的解决方案.作为kubernetes官方推 ...
- prometheus监控系统
关于Prometheus Prometheus是一套开源的监控系统,它将所有信息都存储为时间序列数据:因此实现一种Profiling监控方式,实时分析系统运行的状态.执行时间.调用次数等,以找到系统的 ...
- Prometheus监控系统之入门篇(一)
1. 简介 Prometheus: (简称Prom)是由SoundCloud开发的开源监控报警系统.是大名鼎鼎的CNCF云原生基金会下的第二大开源项目.具有如下特点: 使用Go语言开发 内置时序数据库 ...
- Kubernetes容器集群管理环境 - Prometheus监控篇
一.Prometheus介绍之前已经详细介绍了Kubernetes集群部署篇,今天这里重点说下Kubernetes监控方案-Prometheus+Grafana.Prometheus(普罗米修斯)是一 ...
- kubernetes生态--交付prometheus监控及grafana炫酷dashboard到k8s集群
由于docker容器的特殊性,传统的zabbix无法对k8s集群内的docker状态进行监控,所以需要使用prometheus来进行监控: 什么是Prometheus? Prometheus是由Sou ...
- K8s之Prometheus监控
目录 容器监控与报警 Prometheus prometheus简介 prometheus系统架构 prometheus 安装方式 容器方式安装prometheus operator部署 克隆项目 创 ...
- Grafana+Zabbix+Prometheus 监控系统
环境说明 软件 版本 操作系统 IP地址 Grafana 5.4.3-1 Centos7.5 192.168.18.231 Prometheus 2.6.1 Centos7.5 192.168.18. ...
- k8s中prometheus监控k8s外mysql
k8s外安装mysql https://www.cnblogs.com/uncleyong/p/10739530.html 配置MySQL Exporter采集MySQL监控数据 创建yaml文件:v ...
随机推荐
- 题解-Little C Loves 3 III
Little C Loves 3 III 给定 \(n\) 和序列 \(a_0,a_1,\dots,a_{2^n-1}\) 和 \(b_0,b_1,\dots,b_{2^n-1}\),求序列 \(c_ ...
- centos7 mysql 自动补全
1 yum -y install epel-release #配置erel源 2 yum -y install python-pip 3 pip install mycli #用pip安装 可能会出现 ...
- 编译opencv4.5.0
1. 环境vs2017或其它版本cmake-3.18设置环境变量OPENCV_TEST_DATA_PATH 值设置为 D:\sdk\vs2017\opencv-4.5.0\opencv_extra-4 ...
- 《Stereo R-CNN based 3D Object Detection for Autonomous Driving》论文解读
论文链接:https://arxiv.org/pdf/1902.09738v2.pdf 这两个月忙着做实验 博客都有些荒废了,写篇用于3D检测的论文解读吧,有理解错误的地方,烦请有心人指正). 博客原 ...
- C#面向对象(初级)
一.面向对象:创建一个对象,这个对象最终会帮你实现你的需求,尽管其中的过程非常曲折艰难.这也就是所谓的"你办事我放心". 例如: 面向对象:折纸 爸爸开心地用纸折成了一个纸鹤: 妈 ...
- 工具-Redis-使用(99.6.2)
@ 目录 1.启动 2.数据结构 3.String命令 4.其他常用命令 5.Hash命令 6.List命令 7.Set命令 8.Zset命令 关于作者 1.启动 redis-server 交互 re ...
- CSS鼠标指针cursor样式
参考来源:W3SCHOOL 有时我们需要在CSS布局时设定特定的鼠标指针样式,这时可以通过设定cursor来实现: url: 需使用的自定义光标的 URL. 注释:请在此列表的末端始终定义一种普通的光 ...
- Java基础进阶:内部类lambda重点摘要,详细讲解成员内部类,局部内部类,匿名内部类,Lambda表达式,Lambda表达式和匿名内部类的区别,附重难点,代码实现源码,课堂笔记,课后扩展及答案
内部类lambda重点摘要 内部类特点: 内部类可以直接访问外部类,包括私有 外部类访问内部类必须创建对象 创建内部对象格式: 外部类.内部类 对象名=new外部类().new内部类(); 静态内部类 ...
- 简单谈谈contextlib的使用
简单谈谈contextlib的使用 写在前面 做这件事的原因: 在看书的时候,我发现了有大佬们用contextlib管理上下文,真的很牛皮,但是百度了以下,每个大佬都写了很多很全很深刻,讲道理五花八门 ...
- Android插件换肤 一.实现原理
学习缺的不是时间,而是耐心 目的 1.搞懂系统获取资源文件到在加载布局的整个流程是自己实现换肤功能的理论基础 2.提高分析源码.追踪源码的能力 要点 1.XmlResourceParser (通过这个 ...