Fixing Prometheus Not Collecting Metrics from Windows Nodes in a Kubernetes Cluster

Background

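This follows the previous post on quickly setting up Kubernetes on Windows. We found that running Kubernetes on Windows differs from Linux in a few ways that are easy to trip over, hostAliases being one example. If we really want to put Windows into production, the basics (Pod, Volume, Service, Log) are not enough; we also need monitoring. The usual setup is Prometheus for collection and Grafana for display, but the Prometheus Node Exporter is designed for *nix, so on Windows we have to find our own way. The Node Exporter project recommends the WMI exporter for this, and interested readers can give it a try. This post instead works through the problem from scratch, to understand how a Prometheus collector is written.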

Prerequisites

A Kubernetes cluster with Windows nodes
A Prometheus environment

Steps

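First we need to find the format of the data that the kubelet exposes on Windows. cAdvisor does not support Windows, so a community contributor wrote a relatively simple implementation to fill the gap. It behaves the same way as on Linux: metrics are exposed at <Node_IP>:10255/stats/summary, which is also where metrics-server and kubectl top get their data. The response looks roughly like this: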

{

  "node": {

   "nodeName": "35598k8s9001",

   "startTime": "2018-08-26T07:25:08Z",

   "cpu": {

    "time": "2018-09-10T01:44:52Z",

    "usageCoreNanoSeconds": 8532520000000

   },

   "memory": {

    "time": "2018-09-10T01:44:52Z",

    "availableBytes": 14297423872,

    "usageBytes": 1978798080,

    "workingSetBytes": 734490624,

    "rssBytes": 0,

    "pageFaults": 0,

    "majorPageFaults": 0

   },

   "fs": {

    "time": "2018-09-10T01:44:52Z",

    "availableBytes": 15829303296,

    "capacityBytes": 32212250624,

    "usedBytes": 16382947328

   },

   "runtime": {

    "imageFs": {

     "time": "2018-09-10T01:44:53Z",

     "availableBytes": 15829303296,

     "capacityBytes": 32212250624,

     "usedBytes": 16382947328,

     "inodesUsed": 0

    }

   }

  },

  "pods": [

   {

    "podRef": {

     "name": "stdlogserverwin-5fbcc5648d-ztqsq",

     "namespace": "default",

     "uid": "f461a0b4-ab36-11e8-93c4-0017fa0362de"

    },

    "startTime": "2018-08-29T02:55:15Z",

    "containers": [

     {

      "name": "stdlogserverwin",

      "startTime": "2018-08-29T02:56:24Z",

      "cpu": {

       "time": "2018-09-10T01:44:54Z",

       "usageCoreNanoSeconds": 749578125000

      },

      "memory": {

       "time": "2018-09-10T01:44:54Z",

       "workingSetBytes": 83255296

      },

      "rootfs": {

       "time": "2018-09-10T01:44:54Z",

       "availableBytes": 15829303296,

       "capacityBytes": 32212250624,

       "usedBytes": 0

      },

      "logs": {

       "time": "2018-09-10T01:44:53Z",

       "availableBytes": 15829303296,

       "capacityBytes": 32212250624,

       "usedBytes": 16382947328,

       "inodesUsed": 0

      },

      "userDefinedMetrics": null

     }

    ],

    "cpu": {

     "time": "2018-08-29T02:56:24Z",

     "usageNanoCores": 0,

     "usageCoreNanoSeconds": 749578125000

    },

    "memory": {

     "time": "2018-09-10T01:44:54Z",

     "availableBytes": 0,

     "usageBytes": 0,

     "workingSetBytes": 83255296,

     "rssBytes": 0,

     "pageFaults": 0,

     "majorPageFaults": 0

    },

    "volume": [

     {

      "time": "2018-08-29T02:55:16Z",

      "availableBytes": 17378648064,

      "capacityBytes": 32212250624,

      "usedBytes": 14833602560,

      "inodesFree": 0,

      "inodes": 0,

      "inodesUsed": 0,

      "name": "default-token-wv5fc"

     }

    ],

    "ephemeral-storage": {

     "time": "2018-09-10T01:44:54Z",

     "availableBytes": 15829303296,

     "capacityBytes": 32212250624,

     "usedBytes": 16382947328

    }

   }

  ]

}
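The payload nests the node stats first, then a list of pods, each with its own list of containers. As a rough sketch (not part of the original program), the per-container values can be walked as follows. `SAMPLE` is a trimmed copy of the response above, and `container_working_sets` is a hypothetical helper name introduced only for illustration.

```python
import json

# Trimmed copy of the response shown above; only the fields used here are kept.
SAMPLE = json.loads("""
{
  "node": {"nodeName": "35598k8s9001"},
  "pods": [
    {
      "podRef": {"name": "stdlogserverwin-5fbcc5648d-ztqsq", "namespace": "default"},
      "containers": [
        {"name": "stdlogserverwin", "memory": {"workingSetBytes": 83255296}}
      ]
    }
  ]
}
""")


def container_working_sets(summary):
    """Yield (namespace, pod, container, working-set bytes) for every container."""
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        # "containers" may be missing or null, so fall back to an empty list.
        for container in pod.get("containers") or []:
            memory = container.get("memory", {})
            yield (ref["namespace"], ref["name"], container["name"],
                   memory.get("workingSetBytes"))


if __name__ == "__main__":
    for row in container_working_sets(SAMPLE):
        print(row)
```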

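As shown above, the response contains metrics for the node itself and for its pods. It offers less than cAdvisor does, but it is enough for basic monitoring. Next we need a small program that converts this data into something Prometheus can parse. Here is a small example in Python. First, declare the stats objects we want to expose (the polling script below imports them from a module named stats):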

class Node:
    def __init__(self, name, cpu, memory):
        self.name = name
        self.cpu = cpu
        self.memory = memory

class Pod:
    def __init__(self, name, namespace, cpu, memory):
        self.name = name
        self.namespace = namespace
        self.cpu = cpu
        self.memory = memory

class Stats:
    def __init__(self, node, pods):
        self.node = node
        self.pods = pods

Then use the Prometheus Python client to write a polling program that converts the kubelet stats data.

from urllib.request import urlopen
from stats import Node
from stats import Pod
from stats import Stats
import json
import asyncio
import prometheus_client as prom
import logging

def getMetrics(url):
    # Fetch the stats summary from the kubelet
    response = urlopen(url)
    string = response.read().decode('utf-8')
    json_obj = json.loads(string)
    # Map the response onto the stats objects defined earlier
    node = Node('','','')
    node.name = json_obj['node']['nodeName']
    node.cpu = json_obj['node']['cpu']['usageCoreNanoSeconds']
    node.memory = json_obj['node']['memory']['usageBytes']
    pods_array = json_obj['pods']
    pods_list = []
    for item in pods_array:
        pod = Pod('','','','')
        pod.name = item['podRef']['name']
        pod.namespace = item['podRef']['namespace']
        pod.cpu = item['cpu']['usageCoreNanoSeconds']
        # Pods report usageBytes as 0 in the sample above, so use the working set
        pod.memory = item['memory']['workingSetBytes']
        pods_list.append(pod)
    stats = Stats('','')
    stats.node = node
    stats.pods = pods_list
    return stats

# A simple log output format
log_format = "%(asctime)s - %(levelname)s [%(name)s] %(threadName)s %(message)s"
logging.basicConfig(level=logging.INFO, format=log_format)

# Declare the metrics to export, with labels for use in later queries
g1 = prom.Gauge('node_cpu_usageCoreNanoSeconds', 'CPU usage of the node', labelnames=['node_name'])
g2 = prom.Gauge('node_mem_usageBytes', 'Memory usage of the node', labelnames=['node_name'])
g3 = prom.Gauge('pod_cpu_usageCoreNanoSeconds', 'CPU usage of the pod', labelnames=['pod_name','pod_namespace'])
g4 = prom.Gauge('pod_mem_usageBytes', 'Memory usage of the pod (working set)', labelnames=['pod_name','pod_namespace'])

async def expose_stats(url):
    while True:
        stats = getMetrics(url)
        # Log the node's own stats as an example
        logging.info("nodename: {} value {}".format(stats.node.name, stats.node.cpu))
        # Set the current values of the metrics to be scraped
        g1.labels(node_name=stats.node.name).set(stats.node.cpu)
        g2.labels(node_name=stats.node.name).set(stats.node.memory)
        pods_array = stats.pods
        for item in pods_array:
            # g3 is the pod CPU gauge and g4 is the pod memory gauge
            g3.labels(pod_name=item.name,pod_namespace=item.namespace).set(item.cpu)
            g4.labels(pod_name=item.name,pod_namespace=item.namespace).set(item.memory)
        await asyncio.sleep(1)

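if __name__ == '__main__':
    # Create the event loop explicitly; this works on both older and newer Python 3 versions
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    # Start an HTTP server for Prometheus to scrape
    prom.start_http_server(8000)
    # Run one copy of this program on every Windows machine, or deploy the script remotely to do the exposing
    url = 'http://localhost:10255/stats/summary'
    tasks = [loop.create_task(expose_stats(url))]
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        pass
    finally:
        loop.close()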
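One thing the polling loop above does not handle: if the kubelet is briefly unreachable, `urlopen` raises, the task ends, and the HTTP server keeps serving the last values it was given. Below is a minimal sketch of a more forgiving fetch step, assuming that skipping a failed poll is acceptable. `fetch_summary` and its `opener` parameter are hypothetical names introduced here, and the 5-second timeout is an arbitrary choice.

```python
import json
import logging
from urllib.request import urlopen


def fetch_summary(url, opener=urlopen, timeout=5):
    """Return the parsed stats summary, or None if the poll failed.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a running kubelet.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers URLError, refused connections and timeouts.
        # ValueError covers a body that is not valid UTF-8 or not valid JSON.
        logging.warning("poll of %s failed: %s", url, exc)
        return None
```

In `expose_stats`, a `None` result would mean skipping the gauge updates for that round and sleeping as usual, so one bad poll does not stop the exporter.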

Once that is written, start the program and open port 8000 to see the exported data:

![](https://www.cnblogs.com/images/cnblogs_com/bigdaddyblog/1310139/o_WeChat Image_20180928165327.png)
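What the page on port 8000 shows is the Prometheus text exposition format, which the client library generates for us. To make that format less of a black box, here is a hand-rolled sketch that renders a single gauge in the same general shape. `render_gauge` and `escape_label_value` are hypothetical helpers for illustration only; in practice the client library should produce this output, and it handles details that this sketch ignores.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return (str(value).replace("\\", "\\\\")
                      .replace('"', '\\"')
                      .replace("\n", "\\n"))


def render_gauge(name, help_text, samples):
    """Render one gauge as text; samples is a list of (labels dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        # One sample per line: metric name, labels in braces, then the value.
        lines.append(f"{name}{{{label_str}}} {float(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge(
        "node_mem_usageBytes",
        "Memory usage of the node",
        [({"node_name": "35598k8s9001"}, 1978798080)],
    ), end="")
```

Each metric is a `# HELP` line, a `# TYPE` line, and then one line per label combination, which is why every pod shows up as its own row under the pod metrics.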

Next, add a scrape target to the Prometheus configuration, for example:

- job_name: python_app

  scrape_interval: 15s

  scrape_timeout: 10s

  metrics_path: /

  scheme: http

  static_configs:

  - targets:

    - localhost:8000

The metrics can then be queried from the Prometheus web UI:

![](https://www.cnblogs.com/images/cnblogs_com/bigdaddyblog/1310139/o_WeChat Image_20180928165226.png)
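One thing to keep in mind when reading these numbers: `usageCoreNanoSeconds` is a cumulative total, so the raw value only ever grows. Usage over an interval comes from the difference between two samples. Here is a small sketch of that arithmetic, assuming timestamps in the whole-second form shown in the sample response. `cores_used` and `parse_time` are hypothetical helpers, and the second sample is made up for illustration.

```python
from datetime import datetime, timezone


def parse_time(stamp):
    """Parse timestamps of the form used in the sample, e.g. 2018-09-10T01:44:52Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def cores_used(earlier, later):
    """Average number of cores busy between two cpu samples.

    Each sample is a dict with "time" and "usageCoreNanoSeconds",
    the same shape as the "cpu" objects in the kubelet response.
    """
    elapsed = parse_time(later["time"]) - parse_time(earlier["time"])
    elapsed_ns = elapsed.total_seconds() * 1e9
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    return (later["usageCoreNanoSeconds"] - earlier["usageCoreNanoSeconds"]) / elapsed_ns


if __name__ == "__main__":
    first = {"time": "2018-09-10T01:44:52Z", "usageCoreNanoSeconds": 8532520000000}
    # Hypothetical second sample: 10 seconds later, after 5 core-seconds of work.
    second = {"time": "2018-09-10T01:45:02Z", "usageCoreNanoSeconds": 8537520000000}
    print(cores_used(first, second))
```

On the Prometheus side the same delta-over-time idea is what `rate()` is for. Since `rate()` is intended for counters, exporting this value as a Counter rather than a Gauge would probably be the more conventional choice.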
