Prometheus Operator: notes on a customized deployment

Environment:

Kubernetes 1.11 cluster, deployed with kubeadm

Docker 17.3.2

CentOS 7

Alibaba Cloud servers

Downloading the operator source

Clone the Prometheus Operator manifests from the repository:

    $ git clone https://github.com/coreos/kube-prometheus.git
    $ cd kube-prometheus/manifests

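Change into the manifests directory, which contains all of the resource manifests. One file, prometheus-serviceMonitorKubelet.yaml, needs a small change. By default this ServiceMonitor scrapes node data from the kubelet on port 10250, but in this setup the metrics have been moved, for security reasons, to the read-only port 10255. The only change needed is to replace https-metrics with http-metrics in that file. These port names are defined in the Prometheus Operator code that syncs the node endpoints; the relevant excerpt is: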

    Subsets: []v1.EndpointSubset{
        {
            Ports: []v1.EndpointPort{
                {
                    Name: "https-metrics", // secure kubelet port
                    Port: 10250,
                },
                {
                    Name: "http-metrics", // read-only kubelet port
                    Port: 10255,
                },
                {
                    Name: "cadvisor",
                    Port: 4194,
                },
            },
        },
    },

Note that in this setup insecureSkipVerify also had to be set to false before the http scheme took effect:

    endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      honorLabels: true
      interval: 30s
      port: http-metrics
      scheme: http
      tlsConfig:
        insecureSkipVerify: false
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      honorLabels: true
      interval: 30s
      metricRelabelings:
      - action: drop
        regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
        sourceLabels:
        - __name__
      path: /metrics/cadvisor
      port: http-metrics
      scheme: http
      tlsConfig:
        insecureSkipVerify: false

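Configuring Alertmanager alert routing

Write the DingTalk routing configuration and create it as a Secret object so that it is mounted into the deployment. Prometheus data also needs to be persisted, so a StorageClass or PVC has to be defined and referenced in prometheus-prometheus.yaml.

alertmanager-main.yaml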

    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 2h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://prometheus-webhook-dingtalk.monitors.svc.cluster.local:8060/dingtalk/ops_dingding/send'

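Create the Secret object from the Alertmanager configuration file

    # The Operator reads the key alertmanager.yaml from the Secret alertmanager-<name>,
    # so the file is stored under that key.
    kubectl -n monitoring create secret generic alertmanager-main \
      --from-file=alertmanager.yaml=alertmanager-main.yaml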
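
To make it clearer what `kubectl create secret generic --from-file` stores, the following Python sketch builds a comparable Secret manifest by hand. It relies only on the fact that values under `data` in a Secret are base64-encoded. The helper name `build_secret_manifest` and the shortened sample configuration are assumptions made for illustration; they are not part of the original setup.

```python
import base64
import json


def build_secret_manifest(name, namespace, files):
    """Build an Opaque Secret manifest; `files` maps key name -> text content."""
    data = {
        key: base64.b64encode(content.encode("utf-8")).decode("ascii")
        for key, content in files.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name, "namespace": namespace},
        "data": data,
    }


# Hypothetical, shortened Alertmanager config used only for illustration.
sample_config = "route:\n  receiver: 'web.hook'\n"

secret = build_secret_manifest(
    "alertmanager-main", "monitoring", {"alertmanager.yaml": sample_config}
)

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(secret, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, which is one way to keep the Secret under version control instead of creating it imperatively.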

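Create a StorageClass object to provide persistent storage for Prometheus. Here we use the cloud disk or NAS service offered by Alibaba Cloud and create a custom StorageClass.

In this deployment we use alicloud-disk-ssd, a StorageClass backed by cloud disks.

Configuring kubernetes_sd_configs for Prometheus

To give Prometheus automatic service discovery, create a Secret object from the file prometheus-additional.yaml.

prometheus-additional.yaml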

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
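
The `replace` rules in this file are easier to follow with concrete values. The Python sketch below imitates one relabel step: Prometheus joins the source label values with `;`, requires the regex to match the whole joined string, and expands `$1` or `${1}` references in the replacement. The function `relabel_replace` and the sample addresses are illustrative assumptions, and Python's `re` module only approximates the RE2 engine that Prometheus uses, so treat this as a way to reason about the rules rather than as Prometheus' own implementation.

```python
import re


def relabel_replace(source_values, regex, replacement):
    """Imitate a Prometheus `replace` relabel step for a single target.

    Returns the new label value, or None when the regex does not match
    (in which case Prometheus leaves the target label unchanged).
    """
    joined = ";".join(source_values)      # default separator is ';'
    match = re.fullmatch(regex, joined)   # relabel regexes are fully anchored
    if match is None:
        return None
    # Translate $1 / ${1} references into Python's \g<1> syntax.
    py_replacement = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(py_replacement)


ADDRESS_REGEX = r"([^:]+)(?::\d+)?;(\d+)"

# Pod address with a port, overridden by the prometheus.io/port annotation.
with_port = relabel_replace(["10.0.0.5:8080", "9100"], ADDRESS_REGEX, "$1:$2")

# No annotation: the regex does not match, so __address__ stays as it was.
no_annotation = relabel_replace(["10.0.0.5:8080", ""], ADDRESS_REGEX, "$1:$2")

# Probe target built by the kubernetes-ingresses job.
ingress_target = relabel_replace(
    ["https", "example.com", "/health"], r"(.+);(.+);(.+)", "${1}://${2}${3}"
)

# Metrics path built by the kubernetes-cadvisor job.
cadvisor_path = relabel_replace(
    ["node-1"], r"(.+)", "/api/v1/nodes/${1}/proxy/metrics/cadvisor"
)

print(with_port, no_annotation, ingress_target, cadvisor_path)
```

The second case shows why the port rule is safe to apply to every target: when the annotation is missing, the rule simply does not fire.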

Create the Secret object additional-configs (the name must match the one referenced in prometheus-prometheus.yaml)

    kubectl -n monitoring create secret generic additional-configs --from-file=prometheus-additional.yaml

Modifying prometheus-prometheus.yaml

Add the custom configuration files and the StorageClass defined above to the Prometheus object to customize the monitoring setup.

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      labels:
        prometheus: k8s
      name: k8s
      namespace: monitoring
    spec:
      alerting:
        alertmanagers:
        - name: alertmanager-main
          namespace: monitoring
          port: web
      storage:                                  # persistent storage
        volumeClaimTemplate:
          spec:
            storageClassName: alicloud-disk-ssd # use the alicloud-disk-ssd StorageClass
            resources:
              requests:
                storage: 50Gi
      baseImage: quay.io/prometheus/prometheus
      nodeSelector:
        kubernetes.io/os: linux
      podMonitorSelector: {}
      replicas: 2
      secrets:                                  # Secret holding the etcd certificates
      - etcd-certs
      resources:
        requests:
          memory: 400Mi
      ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      additionalScrapeConfigs:                  # service discovery configuration
        name: additional-configs                # name of the Secret object
        key: prometheus-additional.yaml         # key inside the Secret
      serviceAccountName: prometheus-k8s
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector: {}
      version: v2.11.0

The certificates used by etcd are all located under /etc/kubernetes/pki/etcd on the node, so first store the ones we need in the cluster as a Secret object (run this on the node where etcd is running):

    kubectl -n monitoring create secret generic etcd-certs \
      --from-file=/etc/kubernetes/pki/etcd/ca.pem \
      --from-file=/etc/kubernetes/pki/etcd/etcd-client.pem \
      --from-file=/etc/kubernetes/pki/etcd/etcd-client-key.pem

Creating the ServiceMonitor

The certificates Prometheus needs to access the etcd cluster are now in place, so the next step is to create the ServiceMonitor object (prometheus-serviceMonitorEtcd.yaml):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: etcd-k8s
      namespace: monitoring
      labels:
        k8s-app: etcd-k8s
    spec:
      jobLabel: k8s-app
      endpoints:
      - port: port
        interval: 30s
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
          certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.pem
          keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.pem
          insecureSkipVerify: true
      selector:
        matchLabels:
          k8s-app: etcd
      namespaceSelector:
        matchNames:
        - kube-system
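
The certificate paths in tlsConfig follow from how the Operator mounts Secrets: every entry in the `secrets` list of the Prometheus object is mounted at /etc/prometheus/secrets/<secret-name>/, with one file per key. The short Python sketch below derives the three paths from the Secret name and the file names. The helper `etcd_tls_config` is hypothetical and exists only to make the naming rule explicit.

```python
import posixpath

# Mount root used by the Operator for entries of spec.secrets.
SECRETS_ROOT = "/etc/prometheus/secrets"


def etcd_tls_config(secret_name, ca_file, cert_file, key_file):
    """Return the tlsConfig paths for certificates stored in one Secret."""
    base = posixpath.join(SECRETS_ROOT, secret_name)
    return {
        "caFile": posixpath.join(base, ca_file),
        "certFile": posixpath.join(base, cert_file),
        "keyFile": posixpath.join(base, key_file),
    }


tls = etcd_tls_config(
    "etcd-certs", "ca.pem", "etcd-client.pem", "etcd-client-key.pem"
)
for field, path in tls.items():
    print(f"{field}: {path}")
```

If the Secret name or the certificate file names differ in your cluster, the paths in the ServiceMonitor have to change accordingly, otherwise the etcd target fails with a file-not-found error.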

Creating the Service

The ServiceMonitor now exists, but there is no matching Service object yet, so we have to create one manually (prometheus-etcdService.yaml):

    apiVersion: v1
    kind: Service
    metadata:
      name: etcd-k8s
      namespace: kube-system
      labels:
        k8s-app: etcd
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: port
        port: 2379
        protocol: TCP
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: etcd-k8s
      namespace: kube-system
      labels:
        k8s-app: etcd
    subsets:
    - addresses:
      - ip: 172.16.23.231
        nodeName: etcd-master
      ports:
      - name: port
        port: 2379
        protocol: TCP

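The Service created here does not select Pods by label, as Services usually do. An etcd cluster is often run independently, outside the Kubernetes cluster, and in that case we have to define a custom Endpoints object ourselves. Note that its metadata must match that of the Service, and that the Service's clusterIP is set to None. If this is unfamiliar, refer to the earlier discussion of Services.

In the subsets of the Endpoints object, fill in the addresses of the etcd cluster. Ours has a single node, so one address is enough. If etcd is configured to listen on 127.0.0.1, scraping may fail; in that case change the listen address to 0.0.0.0.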
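
For an etcd cluster with several members, the Endpoints object simply lists every member under `addresses`. As a rough sketch, the Python snippet below generates such a manifest from a list of (IP, node name) pairs. The function `build_etcd_endpoints` is an illustrative helper, and the second and third members are made-up values; only the first address comes from the manifest above.

```python
import json


def build_etcd_endpoints(members, port=2379):
    """Build an Endpoints manifest matching the etcd-k8s Service.

    `members` is a list of (ip, node_name) tuples, one per etcd member.
    """
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {
            # Name, namespace, and labels must match the Service.
            "name": "etcd-k8s",
            "namespace": "kube-system",
            "labels": {"k8s-app": "etcd"},
        },
        "subsets": [
            {
                "addresses": [
                    {"ip": ip, "nodeName": node_name} for ip, node_name in members
                ],
                "ports": [{"name": "port", "port": port, "protocol": "TCP"}],
            }
        ],
    }


members = [
    ("172.16.23.231", "etcd-master"),  # from the manifest above
    ("172.16.23.232", "etcd-node2"),   # hypothetical member
    ("172.16.23.233", "etcd-node3"),   # hypothetical member
]

endpoints = build_etcd_endpoints(members)
print(json.dumps(endpoints, indent=2))
```

The port name is kept as `port` because the ServiceMonitor refers to the endpoint port by that name.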

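Creating Service objects for the scheduler and controller-manager

kube-scheduler and kube-controller-manager both bind to 127.0.0.1 by default. Change the bind address to 0.0.0.0 in their configuration files so that the metrics ports are reachable and the components can be monitored.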

    apiVersion: v1
    kind: Service
    metadata:
      name: kube-scheduler
      namespace: kube-system
      labels:
        k8s-app: kube-scheduler
    spec:
      selector:
        component: kube-scheduler
      clusterIP: None
      ports:
      - name: http-metrics
        targetPort: 10251
        port: 10251
        protocol: TCP
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-controller-manager
      namespace: kube-system
      labels:
        k8s-app: kube-controller-manager
    spec:
      selector:
        component: kube-controller-manager
      clusterIP: None
      ports:
      - name: http-metrics
        targetPort: 10252
        port: 10252
    # kubelet-service.yaml is omitted here

Adjusting the permissions of the Prometheus ServiceAccount

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-k8s
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - configmaps
      - nodes/metrics
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
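
To see which requests this ClusterRole covers, the following Python sketch evaluates the rules against a few of the accesses the scrape jobs need. The function `is_allowed` is a deliberately simplified, hypothetical matcher (exact matches only, no wildcards or resourceNames); it is not how the Kubernetes authorizer is implemented, but it is enough to reason about this particular role.

```python
# The rules of the prometheus-k8s ClusterRole shown above.
RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "services", "endpoints", "pods", "nodes/proxy"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": [""],
        "resources": ["configmaps", "nodes/metrics"],
        "verbs": ["get"],
    },
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]


def is_allowed(rules, verb, resource=None, api_group="", non_resource_url=None):
    """Return True if any rule grants `verb` on the resource or URL."""
    for rule in rules:
        if verb not in rule.get("verbs", []):
            continue
        if non_resource_url is not None:
            if non_resource_url in rule.get("nonResourceURLs", []):
                return True
            continue
        if (api_group in rule.get("apiGroups", [])
                and resource in rule.get("resources", [])):
            return True
    return False


# The kubernetes-cadvisor job scrapes through the API server's node proxy.
print(is_allowed(RULES, "get", "nodes/proxy"))                      # True
# Pod discovery needs list and watch on pods.
print(is_allowed(RULES, "watch", "pods"))                           # True
# Ingress discovery would need ingresses, which this role does not list.
print(is_allowed(RULES, "list", "ingresses", api_group="extensions"))  # False
```

The last check suggests that the kubernetes-ingresses job would probably need an additional rule for ingresses before its discovery works; whether that applies depends on the cluster version and which API group serves Ingress objects there.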

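Deploying the operator

    kubectl create -f .

At this point prometheus-operator is deployed in the production environment. The Grafana dashboard configuration and the Alertmanager notification templates still need to be completed.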

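  1. Create a StorageClass object to provide persistent storage for Prometheus, and reference it in prometheus-prometheus.yaml.
  2. Edit the configuration data in the alertmanager-secret.yaml Secret object, replacing it with your own DingTalk alert route or email account settings.
  3. Create the Service objects for etcd, scheduler, and controller-manager.
  4. Configure the alerting rules in prometheus-etcdRules.yaml, or add them to the existing rule files.
  5. Create the Secret holding the Prometheus service discovery configuration, and reference it in prometheus-prometheus.yaml.
  6. Create the etcd certificate Secret and the serviceMonitorEtcd object file.
  7. Adjust the permissions in prometheus-clusterRole.yaml.
  8. Run the deployment.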
