1. System environment

Server OS version: CentOS Linux release 7.4.1708 (Core); Docker version: Docker version 20.10.12; Kubernetes (k8s) cluster version: v1.21.9; CPU architecture: x86_64

Kubernetes cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.

Server: k8scloude1/192.168.110.130; OS: CentOS Linux release 7.4.1708 (Core); CPU architecture: x86_64; Processes: docker, kube-apiserver, etcd, kube-scheduler, kube-controller-manager, kubelet, kube-proxy, coredns, calico; Role: k8s master node
Server: k8scloude2/192.168.110.129; OS: CentOS Linux release 7.4.1708 (Core); CPU architecture: x86_64; Processes: docker, kubelet, kube-proxy, calico; Role: k8s worker node
Server: k8scloude3/192.168.110.128; OS: CentOS Linux release 7.4.1708 (Core); CPU architecture: x86_64; Processes: docker, kubelet, kube-proxy, calico; Role: k8s worker node

2. Preface

This article covers pod scheduling, i.e. how to make a pod run on a specific node of a Kubernetes cluster.

Pod scheduling assumes you already have a working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the post 《Centos7 安装部署Kubernetes(k8s)集群》 at https://www.cnblogs.com/renshengdezheli/p/16686769.html

3. Pod scheduling

3.1 Overview of pod scheduling

You can constrain a Pod so that it is restricted to run on particular nodes, or so that it prefers to run on particular nodes. There are several ways to do this, and the recommended approaches all use label selectors to make the selection. Often such constraints are unnecessary, because the scheduler automatically makes reasonable placements (for example, spreading Pods across nodes instead of placing them on nodes with insufficient free resources). In some circumstances, however, you may want more control over which node a Pod lands on, for example to ensure that a Pod ends up on a machine with an SSD attached, or to place Pods from two different services that communicate heavily into the same availability zone.

You can use any of the following methods to choose where Kubernetes schedules specific Pods:

  • nodeSelector matched against node labels
  • Affinity and anti-affinity
  • The nodeName field
  • Pod topology spread constraints

3.2 Automatic pod scheduling

If you do not specify which node a pod should run on, k8s schedules it automatically. When deciding which node a pod runs on, the scheduler considers:

  • the list of pods waiting to be scheduled
  • the list of available nodes
  • the scheduling algorithm: node filtering and node scoring

3.2.1 Creating three pods that use host port 80

Look at the explanation of the hostPort field: hostPort maps a pod port onto the node, i.e. it exposes the Pod's port on the node.

#Host port mapping: hostPort: 80
[root@k8scloude1 pod]# kubectl explain pods.spec.containers.ports.hostPort
KIND:     Pod
VERSION:  v1

FIELD:    hostPort <integer>

DESCRIPTION:
     Number of port to expose on the host. If specified, this must be a valid
     port number, 0 < x < 65536. If HostNetwork is specified, this must match
     ContainerPort. Most containers do not need this.

Create the first pod. hostPort: 80 maps port 80 of the container to port 80 of the node.

[root@k8scloude1 pod]# vim schedulepod.yaml

#kind: Pod sets the resource type to Pod.  labels sets the pod labels.  name under metadata sets the pod name.  Everything under containers defines the containers.
#image sets the image name. imagePullPolicy sets the image pull policy. name under containers sets the container name.
#resources sets the container resources (CPU, memory, etc.). env sets environment variables inside the container. dnsPolicy sets the DNS policy.
#restartPolicy sets the container restart policy. ports defines the container ports. containerPort is the container port. hostPort is the port on the node.
[root@k8scloude1 pod]# cat schedulepod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
  namespace: pod
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod.yaml
pod/pod created

[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 6s

The pod was created successfully.

Next, create the second pod. hostPort: 80 again maps port 80 of the container to port 80 of the node; the two pods differ only in the pod name.

[root@k8scloude1 pod]# cp schedulepod.yaml schedulepod1.yaml 

[root@k8scloude1 pod]# vim schedulepod1.yaml 

[root@k8scloude1 pod]# cat schedulepod1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod1.yaml
pod/pod1 created

[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 11m
pod1 1/1 Running 0 5s

The second pod was created successfully. Now create the third pod.

As described at the beginning, the cluster architecture is: k8scloude1 is the master node, k8scloude2 and k8scloude3 are worker nodes. The cluster has only 2 worker nodes, the master node does not run application pods by default, and host port 80 is already taken on both worker nodes, so pod2 cannot run.

[root@k8scloude1 pod]# sed 's/pod1/pod2/' schedulepod1.yaml | kubectl apply -f -
pod/pod2 created

#Host port 80 is already taken on both worker nodes, so pod2 cannot run
[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 16m
pod1 1/1 Running 0 5m28s
pod2 0/1 Pending 0 5s
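
To confirm why pod2 stays Pending, you can look at its events; the Events section should show a FailedScheduling message indicating that no node had host port 80 available. The check below is only an illustration and its output is not reproduced here:

#inspect the scheduling events of the Pending pod (illustrative, output omitted)
[root@k8scloude1 pod]# kubectl describe pod pod2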

Observe how the pods are distributed across the k8s cluster; the NODE column shows which node each pod runs on.

[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod 1/1 Running 0 18m
pod1 1/1 Running 0 7m28s

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod 1/1 Running 0 29m 10.244.251.208 k8scloude3 <none> <none>
pod1 1/1 Running 0 18m 10.244.112.156 k8scloude2 <none> <none>

Delete the pods

[root@k8scloude1 pod]# kubectl delete pod pod2
pod "pod2" deleted [root@k8scloude1 pod]# kubectl delete pod pod1 pod
pod "pod1" deleted
pod "pod" deleted

The three pods above were all scheduled automatically by k8s. Next we specify manually which node a pod runs on.

3.3 Using the nodeName field to specify which node a pod runs on

Using the nodeName field is the most direct way to pick the node. nodeName is a field of the Pod spec. If the nodeName field is not empty, the scheduler ignores the Pod and the kubelet on the named node tries to place the Pod on that node. A nodeName rule takes precedence over nodeSelector and over affinity and anti-affinity rules.

Using nodeName to select a node has some limitations:

  • If the named node does not exist, the Pod will not run, and in some cases it may be automatically deleted.
  • If the named node does not have the resources to accommodate the Pod, the Pod will fail, and its failure reason indicates whether the failure was caused by insufficient memory or CPU.
  • Node names in cloud environments are not always predictable or stable.

Create a pod. nodeName: k8scloude3 means the pod must run on the node named k8scloude3.

[root@k8scloude1 pod]# vim schedulepod2.yaml 

#kind: Pod sets the resource type to Pod.  labels sets the pod labels.  name under metadata sets the pod name.  Everything under containers defines the containers.
#image sets the image name. imagePullPolicy sets the image pull policy. name under containers sets the container name.
#resources sets the container resources (CPU, memory, etc.). env sets environment variables inside the container. dnsPolicy sets the DNS policy.
#restartPolicy sets the container restart policy. ports defines the container ports. containerPort is the container port. hostPort is the port on the node.
#nodeName: k8scloude3 makes the pod run on k8scloude3
[root@k8scloude1 pod]# cat schedulepod2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeName: k8scloude3
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod2.yaml
pod/pod1 created

As expected, the pod runs on node k8scloude3.

[root@k8scloude1 pod]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 7s 10.244.251.209 k8scloude3 <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.

[root@k8scloude1 pod]# kubectl get pods
No resources found in pod namespace.

Create a pod with nodeName: k8scloude1 so that it runs on node k8scloude1.

[root@k8scloude1 pod]# vim schedulepod3.yaml 

#kind: Pod sets the resource type to Pod.  labels sets the pod labels.  name under metadata sets the pod name.  Everything under containers defines the containers.
#image sets the image name. imagePullPolicy sets the image pull policy. name under containers sets the container name.
#resources sets the container resources (CPU, memory, etc.). env sets environment variables inside the container. dnsPolicy sets the DNS policy.
#restartPolicy sets the container restart policy. ports defines the container ports. containerPort is the container port. hostPort is the port on the node.
#nodeName: k8scloude1 makes the pod run on k8scloude1
[root@k8scloude1 pod]# cat schedulepod3.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeName: k8scloude1
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod3.yaml
pod/pod1 created

The pod runs on k8scloude1. Note that k8scloude1 is the master node, which normally does not run application pods, and it carries a taint; in general, pods are not placed on tainted hosts, and a pod forced toward such a host by the scheduler would stay Pending. With nodeName, however, the scheduler is bypassed, so a pod can be placed on a tainted host and run normally, for example by pointing nodeName at the master.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 47s 10.244.158.81 k8scloude1 <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

3.4 Using node labels (nodeSelector) to specify which node a pod runs on

Like many other Kubernetes objects, nodes have labels. You can attach labels manually, and Kubernetes also populates a standard set of labels on every node in the cluster.

By adding labels to nodes, you can target Pods for scheduling on specific nodes or groups of nodes. You can use this to ensure that specific Pods only run on nodes with certain isolation, security, or regulatory properties.

nodeSelector is the simplest recommended form of node selection constraint. You add the nodeSelector field to the Pod spec and list the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have every label you specify. nodeSelector is therefore the simplest way to constrain Pods to nodes with specific labels.

3.4.1 Viewing labels

View the labels on the nodes. Labels are key=value pairs, for example xxxx/yyyy.aaaa=456123,xxxx1/yyyy1.aaaa=456123; the --show-labels flag displays them.

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

View the labels on the namespaces

[root@k8scloude1 pod]# kubectl get ns --show-labels
NAME STATUS AGE LABELS
default Active 7d1h kubernetes.io/metadata.name=default
kube-node-lease Active 7d1h kubernetes.io/metadata.name=kube-node-lease
kube-public Active 7d1h kubernetes.io/metadata.name=kube-public
kube-system Active 7d1h kubernetes.io/metadata.name=kube-system
ns1 Active 6d5h kubernetes.io/metadata.name=ns1
ns2 Active 6d5h kubernetes.io/metadata.name=ns2
pod Active 4d2h kubernetes.io/metadata.name=pod

View the labels on the pods

[root@k8scloude1 pod]# kubectl get pod -A --show-labels
NAMESPACE NAME READY STATUS RESTARTS AGE LABELS
kube-system calico-kube-controllers-6b9fbfff44-4jzkj 1/1 Running 12 7d k8s-app=calico-kube-controllers,pod-template-hash=6b9fbfff44
kube-system calico-node-bdlgm 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system calico-node-hx8bk 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system calico-node-nsbfs 1/1 Running 7 7d controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system coredns-545d6fc579-7wm95 1/1 Running 7 7d1h k8s-app=kube-dns,pod-template-hash=545d6fc579
kube-system coredns-545d6fc579-87q8j 1/1 Running 7 7d1h k8s-app=kube-dns,pod-template-hash=545d6fc579
kube-system etcd-k8scloude1 1/1 Running 7 7d1h component=etcd,tier=control-plane
kube-system kube-apiserver-k8scloude1 1/1 Running 11 7d1h component=kube-apiserver,tier=control-plane
kube-system kube-controller-manager-k8scloude1 1/1 Running 7 7d1h component=kube-controller-manager,tier=control-plane
kube-system kube-proxy-599xh 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-lpj8z 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-zxlk9 1/1 Running 7 7d1h controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-scheduler-k8scloude1 1/1 Running 7 7d1h component=kube-scheduler,tier=control-plane
kube-system metrics-server-bcfb98c76-k5dmj 1/1 Running 6 6d5h k8s-app=metrics-server,pod-template-hash=bcfb98c76

3.4.2 Creating labels

Take the label node-role.kubernetes.io/control-plane= as an example: the key is node-role.kubernetes.io/control-plane and the value is empty.

Syntax for creating a label: kubectl label <object-type> <object-name> <key>=<value>

Label node k8scloude2:

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,k8snodename=k8scloude2,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

Remove the label from node k8scloude2

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

List the nodes that carry the label k8snodename=k8scloude2

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2

#List the nodes that carry the label k8snodename=k8scloude2
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
NAME STATUS ROLES AGE VERSION
k8scloude2 Ready <none> 7d1h v1.21.0

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

Apply a label to all nodes

[root@k8scloude1 pod]# kubectl label nodes --all k8snodename=cloude
node/k8scloude1 labeled
node/k8scloude2 labeled
node/k8scloude3 labeled

List the nodes that carry the label k8snodename=cloude

#List the nodes that carry the label k8snodename=cloude
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 7d1h v1.21.0
k8scloude2 Ready <none> 7d1h v1.21.0
k8scloude3 Ready <none> 7d1h v1.21.0

#Delete the label
[root@k8scloude1 pod]# kubectl label nodes --all k8snodename-
node/k8scloude1 labeled
node/k8scloude2 labeled
node/k8scloude3 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude
No resources found

The --overwrite flag overwrites an existing label

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled

#Overwriting a label fails without --overwrite
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude
error: 'k8snodename' already has a value (k8scloude2), and --overwrite is false

#The --overwrite flag overwrites the label
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude --overwrite
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
No resources found

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude
NAME STATUS ROLES AGE VERSION
k8scloude2 Ready <none> 7d1h v1.21.0

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

Tip: if you do not want control-plane to show up under ROLES for k8scloude1, you can remove the corresponding label: kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8scloude1 Ready control-plane,master 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3 Ready <none> 7d1h v1.21.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

[root@k8scloude1 pod]# kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-

3.4.3 Controlling which node a pod runs on via labels

Add the label k8snodename=k8scloude2 to node k8scloude2

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
NAME STATUS ROLES AGE VERSION
k8scloude2 Ready <none> 7d1h v1.21.0

[root@k8scloude1 pod]# kubectl get pods
No resources found in pod namespace.

Create a pod. The nodeSelector entry k8snodename: k8scloude2 means the pod must run on a node labeled k8snodename=k8scloude2.

[root@k8scloude1 pod]# vim schedulepod4.yaml

[root@k8scloude1 pod]# cat schedulepod4.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeSelector:
    k8snodename: k8scloude2
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod4.yaml
pod/pod1 created

As expected, the pod runs on node k8scloude2.

[root@k8scloude1 pod]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 21s 10.244.112.158 k8scloude2 <none> <none>

Delete the pod and remove the label

[root@k8scloude1 pod]# kubectl get pod --show-labels
NAME READY STATUS RESTARTS AGE LABELS
pod1 1/1 Running 0 32m run=pod1

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

[root@k8scloude1 pod]# kubectl get pod --show-labels
No resources found in pod namespace.

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
No resources found

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude
No resources found

Note: if two hosts carry the same label, the scheduler scores both of them and the pod runs on whichever node gets the higher score.
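
To check which of the equally labeled nodes a pod actually landed on, its events record the scheduler's decision. This is only an illustrative check (substitute the real pod name; the exact event wording can vary between Kubernetes versions):

#the Scheduled event under Events shows which node the pod was bound to
[root@k8scloude1 pod]# kubectl describe pod pod1 | grep -A5 Events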

Label the master node of the k8s cluster

[root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename=k8scloude1
node/k8scloude1 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 7d2h v1.21.0

Create a pod. The nodeSelector entry k8snodename: k8scloude1 means the pod must run on a node labeled k8snodename=k8scloude1.

[root@k8scloude1 pod]# vim schedulepod5.yaml 

[root@k8scloude1 pod]# cat schedulepod5.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeSelector:
    k8snodename: k8scloude1
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod5.yaml
pod/pod1 created

Because k8scloude1 carries a taint, the pod cannot run on k8scloude1 and its status stays Pending.

[root@k8scloude1 pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod1 0/1 Pending 0 9s

Delete the pod and remove the label

[root@k8scloude1 pod]# kubectl delete pod pod1
pod "pod1" deleted [root@k8scloude1 pod]# kubectl get pod
No resources found in pod namespace. [root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename-
node/k8scloude1 labeled [root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1
No resources found

3.5 Scheduling pods with affinity and anti-affinity

nodeSelector is the simplest way to constrain Pods to nodes with specific labels. Affinity and anti-affinity expand the types of constraints you can define. Some of the benefits of affinity and anti-affinity:

  • The affinity/anti-affinity language is more expressive. nodeSelector only selects nodes that have all the specified labels; affinity/anti-affinity gives you more control over the selection logic.

  • You can indicate that a rule is a "soft" requirement or preference, so that the scheduler still schedules the Pod even if it cannot find a matching node.

  • You can constrain a Pod using labels on other Pods running on the node (or in another topological domain), instead of just node labels. This lets you define rules for which Pods can be placed together.

The affinity feature consists of two kinds of affinity:

  • Node affinity works like the nodeSelector field but is more expressive and allows you to specify soft rules.
  • Inter-pod affinity/anti-affinity lets you constrain Pods against the labels of other Pods (a brief sketch follows this list).
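
The rest of this article focuses on node affinity. As a quick, hedged illustration of inter-Pod anti-affinity: the pod name pod-antiaffinity-demo and the app=web label below are made up for this sketch and do not refer to objects in the cluster above.

apiVersion: v1
kind: Pod
metadata:
  name: pod-antiaffinity-demo    #hypothetical name, for illustration only
  namespace: pod
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      #do not place this Pod on a node that already runs a Pod labeled app=web
      - labelSelector:
          matchExpressions:
          - key: app              #assumed label carried by the other Pods
            operator: In
            values:
            - web
        topologyKey: kubernetes.io/hostname
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx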

Node affinity is conceptually similar to nodeSelector: it lets you constrain which nodes your Pod can be scheduled on based on node labels. There are two types of node affinity:

  • requiredDuringSchedulingIgnoredDuringExecution: the scheduler only schedules the Pod if the rule is met. This works like nodeSelector, but with a more expressive syntax.
  • preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to find a node that meets the rule. If no matching node is available, the scheduler still schedules the Pod.

In both types, IgnoredDuringExecution means that if the node labels change after Kubernetes has scheduled the Pod, the Pod keeps running.

You can set node affinity via the .spec.affinity.nodeAffinity field of the Pod spec.

View the explanation of the nodeAffinity field:

[root@k8scloude1 pod]# kubectl explain pods.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.
     Node affinity is a group of node affinity scheduling rules.

FIELDS:
   #Soft rule (preference)
   preferredDuringSchedulingIgnoredDuringExecution <[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

   #Hard rule (requirement)
   requiredDuringSchedulingIgnoredDuringExecution <Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

3.5.1 Using the hard rule requiredDuringSchedulingIgnoredDuringExecution

Create a pod. The requiredDuringSchedulingIgnoredDuringExecution rule says: the node must have a label whose key is kubernetes.io/hostname and whose value is k8scloude2 or k8scloude3.

You can use the operator field to specify the logical operator Kubernetes applies when interpreting the rule. The available operators are In, NotIn, Exists, DoesNotExist, Gt and Lt. NotIn and DoesNotExist can be used to implement node anti-affinity behavior; you can also use node taints to evict Pods from specific nodes. A hedged NotIn sketch follows this paragraph.
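
As a hedged sketch of node anti-affinity with the NotIn operator (keeping a Pod off k8scloude2): the manifest and the pod name pod-notin-demo are illustrative only and are not one of the files used in this walkthrough.

apiVersion: v1
kind: Pod
metadata:
  name: pod-notin-demo            #hypothetical name, for illustration only
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        #schedule only onto nodes whose hostname is NOT k8scloude2
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - k8scloude2
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx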

Note:

  • If you specify both nodeSelector and nodeAffinity, both must be satisfied for the Pod to be scheduled onto a candidate node.
  • If you specify multiple nodeSelectorTerms associated with a nodeAffinity type, the Pod can be scheduled onto a node as long as one of the nodeSelectorTerms is satisfied.
  • If you specify multiple matchExpressions associated with a single nodeSelectorTerms, the Pod can be scheduled onto a node only if all the matchExpressions are satisfied.

[root@k8scloude1 pod]# vim requiredDuringSchedule.yaml 

 #Hard rule
[root@k8scloude1 pod]# cat requiredDuringSchedule.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8scloude2
            - k8scloude3
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule.yaml
pod/pod1 created

As expected, the pod runs on node k8scloude3.

[root@k8scloude1 pod]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod1 1/1 Running 0 6s

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 10s 10.244.251.212 k8scloude3 <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

Create another pod. This time the requiredDuringSchedulingIgnoredDuringExecution rule says: the node must have a label whose key is kubernetes.io/hostname and whose value is k8scloude4 or k8scloude5.

[root@k8scloude1 pod]# vim requiredDuringSchedule1.yaml 

[root@k8scloude1 pod]# cat requiredDuringSchedule1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8scloude4
            - k8scloude5
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule1.yaml
pod/pod1 created

Because requiredDuringSchedulingIgnoredDuringExecution is a hard rule and neither k8scloude4 nor k8scloude5 exists in this cluster, no node satisfies the condition, so the pod cannot be scheduled and stays Pending.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 0/1 Pending 0 7s <none> <none> <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

3.5.2 Using the soft rule preferredDuringSchedulingIgnoredDuringExecution

Label the nodes

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 xx=72
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl label nodes k8scloude3 xx=59
node/k8scloude3 labeled

Create a pod. The preferredDuringSchedulingIgnoredDuringExecution rule says: the node should preferably have a label with key xx and a value greater than 60.

[root@k8scloude1 pod]# vim preferredDuringSchedule.yaml 

[root@k8scloude1 pod]# cat preferredDuringSchedule.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "60"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule.yaml
pod/pod1 created

As expected, the pod runs on k8scloude2, because k8scloude2 is labeled xx=72 and 72 is greater than 60.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 13s 10.244.112.159 k8scloude2 <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

Create another pod. This time the preferredDuringSchedulingIgnoredDuringExecution rule says: the node should preferably have a label with key xx and a value greater than 600.

[root@k8scloude1 pod]# vim preferredDuringSchedule1.yaml 

[root@k8scloude1 pod]# cat preferredDuringSchedule1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "600"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule1.yaml
pod/pod1 created

Because preferredDuringSchedulingIgnoredDuringExecution is a soft rule, the pod is still created and scheduled successfully even though neither k8scloude2 nor k8scloude3 satisfies xx > 600.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 7s 10.244.251.213 k8scloude3 <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

3.5.3 Node affinity weight

You can specify a weight between 1 and 100 for each instance of the preferredDuringSchedulingIgnoredDuringExecution affinity type. When the scheduler finds nodes that meet all the other scheduling requirements of the Pod, it iterates through every preferred rule that the node satisfies and adds the weight of the matching expression to a sum. The final sum is added to the node's score from the other priority functions, and the nodes with the highest total score are preferred when the scheduler makes its decision for the Pod.

Label the nodes

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 yy=59
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl label nodes k8scloude3 yy=72
node/k8scloude3 labeled

Create a pod. preferredDuringSchedulingIgnoredDuringExecution specifies two soft rules with different weights: weight: 2 and weight: 10.

[root@k8scloude1 pod]# vim preferredDuringSchedule2.yaml 

[root@k8scloude1 pod]# cat preferredDuringSchedule2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "60"
      - weight: 10
        preference:
          matchExpressions:
          - key: yy
            operator: Gt
            values:
            - "60"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule2.yaml
pod/pod1 created

There are two candidate nodes (k8scloude2 matches xx>60 and k8scloude3 matches yy>60). Because the yy>60 rule carries the larger weight, the pod runs on k8scloude3.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod1 1/1 Running 0 10s 10.244.251.214 k8scloude3 <none> <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

3.6 Pod topology spread constraints

You can use topology spread constraints to control how Pods are spread across your cluster among failure domains such as regions, zones, nodes, and other user-defined topology domains. This can help to improve performance, achieve high availability, and raise resource utilization.
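
As a hedged sketch of what a topology spread constraint can look like: the pod name pod-spread-demo and the app=demo label are assumptions for illustration, and kubernetes.io/hostname is used as the topology key so that Pods spread across nodes.

apiVersion: v1
kind: Pod
metadata:
  name: pod-spread-demo           #hypothetical name, for illustration only
  namespace: pod
  labels:
    app: demo
spec:
  topologySpreadConstraints:
  #keep Pods labeled app=demo evenly spread across nodes, with a skew of at most 1
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: demo
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx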
