1. Overview

The control-plane (Master) nodes of a Kubernetes cluster consist of the database service (etcd) plus the other control-plane components (kube-apiserver, kube-controller-manager, kube-scheduler, etc.). All of the state the cluster exchanges at runtime is stored in etcd, so the high availability of a Kubernetes cluster comes down to the data-replication relationship that etcd builds across multiple control-plane (Master) nodes. A highly available Kubernetes cluster can therefore be built in one of two ways:

  • With stacked control-plane (Master) nodes, where etcd runs on the same machines as the other control-plane components;
  • With external etcd nodes, where etcd runs on machines separate from the other control-plane components.

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/high-availability/

1.1 Stacked etcd topology (recommended)

etcd runs alongside the other components on multiple control-plane (Master) machines, and the etcd members form a cluster that makes the Kubernetes cluster highly available.

Prerequisites:

  • An odd number of Master nodes, at least three (etcd requires a majority quorum, so an odd member count tolerates the most failures per machine);
  • Three or more Node nodes;
  • Full network connectivity between all machines in the cluster (public or private network);
  • Superuser privileges;
  • SSH access to every node in the cluster;
  • kubeadm and kubelet already installed on the machines.

This option needs fewer machines, which lowers cost and deployment complexity. The trade-offs: the co-located services compete for host resources, which can cause performance bottlenecks, and when a Master host fails, every component on it is affected.

In practice you can deploy more than three Master hosts, which weakens this topology's disadvantages.

This is the default topology in kubeadm: kubeadm automatically creates a local etcd member on each Master node.

1.2 External etcd topology

The control plane's etcd component runs on external hosts, and the other components connect to the external etcd cluster to form a highly available Kubernetes cluster.

Prerequisites:

  • An odd number of Master hosts, at least three;
  • Three or more Node hosts;
  • An odd number of etcd hosts, at least three;
  • Full network connectivity between all hosts in the cluster (public or private network);
  • Superuser privileges;
  • SSH access to every node in the cluster;
  • kubeadm and kubelet already installed on the machines.

An etcd cluster built on external hosts gets more host resources and better scalability, and the blast radius of a failure shrinks, but the extra machines raise deployment cost.

2. Deployment Plan

Host OS: CentOS Linux release 7.7.1908 (Core)

Kubernetes version: 1.22.10

Docker CE version: 20.10.17

Services running on the Master nodes: etcd, kube-apiserver, kube-scheduler, kube-controller-manager, docker, kubelet, keepalived, haproxy

Master node sizing: 4 vCPU / 8 GB RAM / 200 GB storage

Hostname       Host IP        VIP             Role
k8s-master01   192.168.0.5    192.168.0.10    Master (Control Plane)
k8s-master02   192.168.0.6    192.168.0.10    Master (Control Plane)
k8s-master03   192.168.0.7    192.168.0.10    Master (Control Plane)

Note: make sure the servers are freshly installed systems used only for Kubernetes, with no other software on them.
The following commands check whether the required ports are already in use:

$ ss -alnupt | grep -E '6443|10250|10259|10257|2379|2380'
$ ss -alnupt | grep -E '10250|3[0-2][0-7][0-6][0-7]'

3. Building the Kubernetes Cluster

3.1 Kernel upgrade (optional)

CentOS 7.x ships with kernel 3.10 by default, a version with many bugs known to the Kubernetes community (e.g., a kernel memory-leak bug); upgrading to 4.17 or later is recommended.

Mirror download address: http://mirrors.coreix.net/elrepo-archive-archive/kernel/el7/x86_64/RPMS/

# Install the 4.19.9-1 kernel
$ rpm -ivh http://mirrors.coreix.net/elrepo-archive-archive/kernel/el7/x86_64/RPMS/kernel-ml-4.19.9-1.el7.elrepo.x86_64.rpm
$ rpm -ivh http://mirrors.coreix.net/elrepo-archive-archive/kernel/el7/x86_64/RPMS/kernel-ml-devel-4.19.9-1.el7.elrepo.x86_64.rpm
# List the kernel boot entries
$ awk -F \' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : CentOS Linux (3.10.0-1062.12.1.el7.x86_64) 7 (Core)
1 : CentOS Linux (4.19.9-1.el7.elrepo.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-ef219b153e8049718c374985be33c24e) 7 (Core)
# Set the default boot kernel
$ grub2-set-default "CentOS Linux (4.19.9-1.el7.elrepo.x86_64) 7 (Core)"
$ grub2-mkconfig -o /boot/grub2/grub.cfg
# Confirm the default kernel
$ grub2-editenv list
saved_entry=CentOS Linux (4.19.9-1.el7.elrepo.x86_64) 7 (Core)
# Reboot for the change to take effect
$ reboot

3.2 System initialization

3.2.1 Set hostnames

### On master01
$ hostnamectl set-hostname k8s-master01
### On master02
$ hostnamectl set-hostname k8s-master02
### On master03
$ hostnamectl set-hostname k8s-master03

3.2.2 Add hosts name resolution

### On all hosts
$ cat >> /etc/hosts << EOF
192.168.0.5 k8s-master01
192.168.0.6 k8s-master02
192.168.0.7 k8s-master03
EOF

3.2.3 Install common packages

### On all hosts
$ yum -y install epel-release.noarch nfs-utils net-tools bridge-utils \
  ntpdate vim chrony wget lrzsz

3.2.4 Set up host time synchronization

Configure k8s-master01 to sync time from public NTP servers:

[root@k8s-master01 ~]# systemctl stop ntpd
[root@k8s-master01 ~]# timedatectl set-timezone Asia/Shanghai
[root@k8s-master01 ~]# ntpdate ntp.aliyun.com && /usr/sbin/hwclock
[root@k8s-master01 ~]# vim /etc/ntp.conf
# If this node loses network connectivity, serve local time to the other cluster nodes
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# Comment out the default servers and use these instead
server cn.ntp.org.cn prefer iburst minpoll 4 maxpoll 10
server ntp.aliyun.com iburst minpoll 4 maxpoll 10
server time.ustc.edu.cn iburst minpoll 4 maxpoll 10
server ntp.tuna.tsinghua.edu.cn iburst minpoll 4 maxpoll 10
[root@k8s-master01 ~]# systemctl start ntpd
[root@k8s-master01 ~]# systemctl enable ntpd
[root@k8s-master01 ~]# ntpstat
synchronised to NTP server (203.107.6.88) at stratum 3
   time correct to within 202 ms
   polling server every 64 s

Configure the other hosts to sync time from k8s-master01:

### On every host except k8s-master01
$ systemctl stop ntpd
$ timedatectl set-timezone Asia/Shanghai
$ ntpdate k8s-master01 && /usr/sbin/hwclock
$ vim /etc/ntp.conf
# Comment out the default servers and use this instead
server k8s-master01 prefer iburst minpoll 4 maxpoll 10
$ systemctl start ntpd
$ systemctl enable ntpd
$ ntpstat
synchronised to NTP server (192.168.0.5) at stratum 4
   time correct to within 217 ms
   polling server every 16 s

3.2.5 Disable the firewall

### On all nodes
# Disable SELinux
$ sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
$ setenforce 0
# Stop and disable the firewalld firewall
$ systemctl stop firewalld.service
$ systemctl disable firewalld.service

3.2.6 System tuning

### On all nodes
# Disable swap
$ swapoff -a
$ sed -i "s/^[^#].*swap/#&/g" /etc/fstab
# Enable the bridge-nf modules
$ cat > /etc/modules-load.d/k8s.conf << EOF
overlay
br_netfilter
EOF
$ modprobe overlay && modprobe br_netfilter
# Set kernel parameters
$ cat > /etc/sysctl.d/k8s.conf << EOF
# Enable IPv4 forwarding and let iptables see bridged traffic
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# Enlarge the TCP handshake queues
net.ipv4.tcp_max_syn_backlog = 10240
net.core.somaxconn = 10240
net.ipv4.tcp_syncookies = 1
# Raise the system-wide limit on open file handles
fs.file-max = 1000000
# Size the ARP cache
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 4096
net.ipv4.neigh.default.gc_thresh3 = 8192
# Relax TCP window and state tracking
net.netfilter.nf_conntrack_tcp_be_liberal = 1
net.netfilter.nf_conntrack_tcp_loose = 1
# Maximum tracked connections: the number of connection-tracking entries
# netfilter can handle at once in kernel memory
net.netfilter.nf_conntrack_max = 10485760
net.netfilter.nf_conntrack_tcp_timeout_established = 300
net.netfilter.nf_conntrack_buckets = 655360
# Maximum number of packets queued when an interface receives packets
# faster than the kernel can process them
net.core.netdev_max_backlog = 10000
# Default 128: the per-real-user-ID limit on inotify instances
fs.inotify.max_user_instances = 524288
# Default 8192: the per-instance limit on inotify watches
fs.inotify.max_user_watches = 524288
EOF
$ sysctl --system
# Raise the open-file limits
$ ulimit -n 65535
$ cat >> /etc/security/limits.conf << EOF
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
EOF
$ sed -i '/nproc/ s/4096/65535/' /etc/security/limits.d/20-nproc.conf

3.3 Install Docker

### On all nodes
# Install Docker
$ yum install -y yum-utils device-mapper-persistent-data lvm2
$ yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
$ sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo && yum makecache fast
$ yum -y install docker-ce-20.10.17
# Tune the Docker configuration
$ mkdir -p /etc/docker && cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://p6902cz5.mirror.aliyuncs.com"
  ],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "bip": "172.38.16.1/24"
}
EOF
# Start Docker and enable it at boot
$ systemctl enable docker
$ systemctl restart docker
$ docker version
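
kubeadm expects the kubelet and the container runtime to agree on the cgroup driver; the daemon.json above selects systemd. A quick sanity check (not part of the original steps):

# Should print: systemd
$ docker info --format '{{.CgroupDriver}}'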

3.4 Install Kubernetes

### On all Master nodes
# Configure the yum repo
$ cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install kubeadm, kubelet and kubectl
$ yum install -y kubelet-1.22.10 kubeadm-1.22.10 kubectl-1.22.10 --disableexcludes=kubernetes --nogpgcheck
$ systemctl enable --now kubelet
# Configure kubelet arguments
$ cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
EOF
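
A quick version check confirms the pinned packages landed on every node:

$ kubeadm version -o short
v1.22.10
$ kubelet --version
Kubernetes v1.22.10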

To lengthen the certificate signing duration by modifying the kubeadm source, see: https://www.yuque.com/wubolive/ops/ugomse

3.5 Configure the HA load balancer

With multiple control planes there are also multiple kube-apiserver instances. The HAProxy + Keepalived combination works well here, since HAProxy provides high-performance layer-4 load balancing.

The official documentation offers two ways to run these services (this guide uses option 2):

  • Option 1: run the services on the operating system;
  • Option 2: run the services as static pods.

Reference: https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#options-for-software-load-balancing

3.5.1 Configure keepalived

keepalived runs as a static pod: during bootstrap, the kubelet starts these processes so the cluster can use them while it is starting up. This is an elegant solution, particularly with the setup described under the stacked etcd topology.

Create the keepalived.conf configuration files

### On k8s-master01:
$ mkdir /etc/keepalived && cat > /etc/keepalived/keepalived.conf <<EOF
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id k8s-master01
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        192.168.0.10
    }
    track_script {
        check_apiserver
    }
}
EOF
### On k8s-master02:
$ mkdir /etc/keepalived && cat > /etc/keepalived/keepalived.conf <<EOF
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id k8s-master02
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 99
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        192.168.0.10
    }
    track_script {
        check_apiserver
    }
}
EOF
### On k8s-master03:
$ mkdir /etc/keepalived && cat > /etc/keepalived/keepalived.conf <<EOF
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id k8s-master03
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 98
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        192.168.0.10
    }
    track_script {
        check_apiserver
    }
}
EOF

Create the health-check script

### On all Master nodes
$ cat > /etc/keepalived/check_apiserver.sh << 'EOF'
#!/bin/sh
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}
curl --silent --max-time 2 --insecure https://localhost:9443/ -o /dev/null || errorExit "Error GET https://localhost:9443/"
if ip addr | grep -q 192.168.0.10; then
    curl --silent --max-time 2 --insecure https://192.168.0.10:9443/ -o /dev/null || errorExit "Error GET https://192.168.0.10:9443/"
fi
EOF
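
The script must be executable before keepalived can run it, and it can be exercised by hand (expect it to report an error until haproxy is actually serving port 9443):

$ chmod +x /etc/keepalived/check_apiserver.sh
$ sh /etc/keepalived/check_apiserver.sh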

3.5.2 Configure haproxy

### On all Master nodes
$ mkdir /etc/haproxy && cat > /etc/haproxy/haproxy.cfg << 'EOF'
# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 1
    timeout http-request 10s
    timeout queue 20s
    timeout connect 5s
    timeout client 20s
    timeout server 20s
    timeout http-keep-alive 10s
    timeout check 10s
#---------------------------------------------------------------------
# HAProxy monitoring panel
#---------------------------------------------------------------------
listen admin_status
    bind 0.0.0.0:8888
    mode http
    log 127.0.0.1 local3 err
    stats refresh 5s
    stats uri /admin?stats
    stats realm itnihao\ welcome
    stats auth admin:admin
    stats hide-version
    stats admin if TRUE
#---------------------------------------------------------------------
# apiserver frontend which proxys to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
    bind *:9443
    mode tcp
    option tcplog
    default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    server k8s-master01 192.168.0.5:6443 check
    server k8s-master02 192.168.0.6:6443 check
    server k8s-master03 192.168.0.7:6443 check
EOF

3.5.3 Run them as static pods

For this setup, two manifest files need to be created in /etc/kubernetes/manifests (create the directory first).

### Create on k8s-master01 only
$ mkdir -p /etc/kubernetes/manifests
# keepalived manifest
$ cat > /etc/kubernetes/manifests/keepalived.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: keepalived
  namespace: kube-system
spec:
  containers:
  - image: osixia/keepalived:2.0.17
    name: keepalived
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_BROADCAST
        - NET_RAW
    volumeMounts:
    - mountPath: /usr/local/etc/keepalived/keepalived.conf
      name: config
    - mountPath: /etc/keepalived/check_apiserver.sh
      name: check
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/keepalived/keepalived.conf
    name: config
  - hostPath:
      path: /etc/keepalived/check_apiserver.sh
    name: check
status: {}
EOF
# haproxy manifest
$ cat > /etc/kubernetes/manifests/haproxy.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  containers:
  - image: haproxy:2.1.4
    name: haproxy
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /healthz
        port: 9443
        scheme: HTTPS
    volumeMounts:
    - mountPath: /usr/local/etc/haproxy/haproxy.cfg
      name: haproxyconf
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/haproxy/haproxy.cfg
      type: FileOrCreate
    name: haproxyconf
status: {}
EOF

3.6 Deploy the Kubernetes cluster

3.6.1 Prepare the images

Because k8s.gcr.io is, for well-known reasons, not reliably reachable from mainland China, the images can be pulled from a domestic registry instead (for example Alibaba Cloud's proxy registry: registry.aliyuncs.com/google_containers).

### On all Master nodes
$ kubeadm config images pull --kubernetes-version=v1.22.10 --image-repository=registry.aliyuncs.com/google_containers
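
To preview exactly which images that pull will fetch, kubeadm can list them first:

$ kubeadm config images list --kubernetes-version=v1.22.10 --image-repository=registry.aliyuncs.com/google_containers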

3.6.2 Prepare the init configuration file

### On k8s-master01
$ kubeadm config print init-defaults > kubeadm-init.yaml
$ vim kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.0.5
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master01
  taints: null
---
controlPlaneEndpoint: "192.168.0.10:9443"
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.22.10
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

Configuration notes:

  • localAPIEndpoint.advertiseAddress: the IP address this node's apiserver listens on.
  • localAPIEndpoint.bindPort: the port this node's apiserver listens on.
  • controlPlaneEndpoint: the control-plane entry point (load-balancer VIP + load-balancer port).
  • imageRepository: the image registry to use when deploying the cluster.
  • kubernetesVersion: the Kubernetes version to deploy. The edited file can be validated before use, as shown below.
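
Before running the real initialization, kubeadm's dry-run mode will render everything the init would do without changing the node, which catches configuration mistakes early:

$ kubeadm init --config kubeadm-init.yaml --dry-run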

3.6.3 Initialize the control-plane node

When kubeadm initializes the control plane, it generates the configuration files needed by each Kubernetes component under /etc/kubernetes, which are useful for reference.

### On k8s-master01
# Because kubeadm here was installed from source, configure the kubelet service first.
$ kubeadm init phase kubelet-start --config kubeadm-init.yaml
# Initialize the kubernetes control plane
$ kubeadm init --config kubeadm-init.yaml --upload-certs
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.0.10:9443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:b30e986e80423da7b6b1cbf43ece58598074b2a8b86295517438942e9a47ab0d \
        --control-plane --certificate-key 57360054608fa9978864124f3195bc632454be4968b5ccb577f7bb9111d96597

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.0.10:9443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:b30e986e80423da7b6b1cbf43ece58598074b2a8b86295517438942e9a47ab0d
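
If the token or hash from that output is lost, a fresh worker join command can be regenerated at any time:

$ kubeadm token create --print-join-command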

3.6.4 Join the other nodes to the cluster

Join the control-plane nodes:

### On the other two Master nodes:
$ kubeadm join 192.168.0.10:9443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:b30e986e80423da7b6b1cbf43ece58598074b2a8b86295517438942e9a47ab0d \
    --control-plane --certificate-key 57360054608fa9978864124f3195bc632454be4968b5ccb577f7bb9111d96597

Join worker nodes (optional):

### If there are Node worker hosts, join them with:
$ kubeadm join 192.168.0.10:9443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:b30e986e80423da7b6b1cbf43ece58598074b2a8b86295517438942e9a47ab0d

Copy the keepalived and haproxy manifests to the other Master nodes:

$ scp /etc/kubernetes/manifests/{haproxy.yaml,keepalived.yaml} root@k8s-master02:/etc/kubernetes/manifests/
$ scp /etc/kubernetes/manifests/{haproxy.yaml,keepalived.yaml} root@k8s-master03:/etc/kubernetes/manifests/

Remove the master taint (optional):

$ kubectl taint nodes --all node-role.kubernetes.io/master-
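
To confirm the taint is gone from every node:

$ kubectl describe nodes | grep Taints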

3.6.5 Verify the cluster state

### On any Master node
# Configure kubectl credentials
$ mkdir -p $HOME/.kube
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# Check node status (nodes stay NotReady until the CNI plugin is installed in 3.6.6)
$ kubectl get nodes
NAME           STATUS     ROLES                  AGE     VERSION
k8s-master01   NotReady   control-plane,master   13m     v1.22.10
k8s-master02   NotReady   control-plane,master   3m55s   v1.22.10
k8s-master03   NotReady   control-plane,master   113s    v1.22.10
# Check pod status
$ kubectl get pod -n kube-system
NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE
kube-system   coredns-7f6cbbb7b8-96hp9                 0/1     Pending   0          18m
kube-system   coredns-7f6cbbb7b8-kfmnn                 0/1     Pending   0          18m
kube-system   etcd-k8s-master01                        1/1     Running   0          18m
kube-system   etcd-k8s-master02                        1/1     Running   0          9m21s
kube-system   etcd-k8s-master03                        1/1     Running   0          7m18s
kube-system   haproxy-k8s-master01                     1/1     Running   0          18m
kube-system   haproxy-k8s-master02                     1/1     Running   0          3m27s
kube-system   haproxy-k8s-master03                     1/1     Running   0          3m16s
kube-system   keepalived-k8s-master01                  1/1     Running   0          18m
kube-system   keepalived-k8s-master02                  1/1     Running   0          3m27s
kube-system   keepalived-k8s-master03                  1/1     Running   0          3m16s
kube-system   kube-apiserver-k8s-master01              1/1     Running   0          18m
kube-system   kube-apiserver-k8s-master02              1/1     Running   0          9m24s
kube-system   kube-apiserver-k8s-master03              1/1     Running   0          7m23s
kube-system   kube-controller-manager-k8s-master01     1/1     Running   0          18m
kube-system   kube-controller-manager-k8s-master02     1/1     Running   0          9m24s
kube-system   kube-controller-manager-k8s-master03     1/1     Running   0          7m22s
kube-system   kube-proxy-cvdlr                         1/1     Running   0          7m23s
kube-system   kube-proxy-gnl7t                         1/1     Running   0          9m25s
kube-system   kube-proxy-xnrt7                         1/1     Running   0          18m
kube-system   kube-scheduler-k8s-master01              1/1     Running   0          18m
kube-system   kube-scheduler-k8s-master02              1/1     Running   0          9m24s
kube-system   kube-scheduler-k8s-master03              1/1     Running   0          7m22s
# Check kubernetes certificate validity
$ kubeadm certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Oct 25, 2122 07:40 UTC   99y             ca                      no
apiserver                  Oct 25, 2122 07:40 UTC   99y             ca                      no
apiserver-etcd-client      Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
apiserver-kubelet-client   Oct 25, 2122 07:40 UTC   99y             ca                      no
controller-manager.conf    Oct 25, 2122 07:40 UTC   99y             ca                      no
etcd-healthcheck-client    Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
etcd-peer                  Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
etcd-server                Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
front-proxy-client         Oct 25, 2122 07:40 UTC   99y             front-proxy-ca          no
scheduler.conf             Oct 25, 2122 07:40 UTC   99y             ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Oct 22, 2032 07:40 UTC   99y             no
etcd-ca                 Oct 22, 2032 07:40 UTC   99y             no
front-proxy-ca          Oct 22, 2032 07:40 UTC   99y             no

Check the cluster state in the HAProxy console

Visit http://192.168.0.10:8888/admin?stats — both the username and the password are admin.
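
The same statistics are exposed in CSV form, which is handy for scripting (same admin:admin credentials as above):

$ curl -su admin:admin 'http://192.168.0.10:8888/admin?stats;csv' | grep apiserver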

3.6.6 Install the CNI plugin (Calico)

Calico is an open-source virtualized network solution that provides basic pod networking and network-policy support.

Official docs: https://projectcalico.docs.tigera.io/getting-started/kubernetes/quickstart

### On any Master node
# Apply the latest manifest
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Or download a specific version (optional)
$ curl https://raw.githubusercontent.com/projectcalico/calico/v3.24.0/manifests/calico.yaml -O
# Deploy calico
$ kubectl apply -f calico.yaml
# Verify the installation
$ kubectl get pod -n kube-system | grep calico
calico-kube-controllers-86c9c65c67-j7pv4   1/1   Running   0   17m
calico-node-8mzpk                          1/1   Running   0   17m
calico-node-tkzs2                          1/1   Running   0   17m
calico-node-xbwvp                          1/1   Running   0   17m
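
Once the pod network is running, the nodes should move from NotReady to Ready:

$ kubectl get nodes    # all three masters should now report Ready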

4. Cluster Optimization and Component Installation

4.1 Cluster optimization

4.1.1 Change the NodePort port range (optional)

### On all Master nodes
$ sed -i '/- --secure-port=6443/a\    - --service-node-port-range=1-32767' /etc/kubernetes/manifests/kube-apiserver.yaml
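
Editing a static pod manifest causes the kubelet to recreate the pod; it can be watched coming back before moving on (kubeadm sets the component label used here):

$ kubectl -n kube-system get pod -l component=kube-apiserver -w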

4.1.2 Fix the abnormal kubectl get cs output

### On all Master nodes
$ sed -i 's/^[^#].*--port=0$/#&/g' /etc/kubernetes/manifests/{kube-scheduler.yaml,kube-controller-manager.yaml}
# Verify
$ kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true","reason":""}

4.1.3 Fix the missing scheduler and controller-manager metrics

### On all Master nodes
$ sed -i 's#bind-address=127.0.0.1#bind-address=0.0.0.0#g' /etc/kubernetes/manifests/kube-controller-manager.yaml
$ sed -i 's#bind-address=127.0.0.1#bind-address=0.0.0.0#g' /etc/kubernetes/manifests/kube-scheduler.yaml
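
After the pods restart, the secure ports (10257 for the controller-manager, 10259 for the scheduler) should answer health checks from non-local addresses as well; /healthz is on the components' anonymous always-allow list by default, and -k skips certificate verification:

$ curl -sk https://192.168.0.5:10257/healthz; echo
ok
$ curl -sk https://192.168.0.5:10259/healthz; echo
ok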

4.2 Install Metrics-Server

Metrics-Server is a commonly used Kubernetes add-on. Much like the top command, it shows the CPU and memory usage of the cluster's Nodes and Pods. Metrics-Server collects metrics from the kubelet on each node every 15 seconds and scales to clusters of up to 5,000 nodes.

Reference: https://github.com/kubernetes-sigs/metrics-server

### On any Master node
$ wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server.yaml
# Edit the configuration
$ vim metrics-server.yaml
......
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls            # do not verify the CA or serving certs presented by the kubelets
        image: bitnami/metrics-server:0.6.1 # switch to a docker.io image
        imagePullPolicy: IfNotPresent
......
# Deploy metrics-server
$ kubectl apply -f metrics-server.yaml
# Watch it start
$ kubectl get pod -n kube-system -l k8s-app=metrics-server -w
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-655d65c95-lvb7z   1/1     Running   0          103s
# Check cluster resource usage
$ kubectl top nodes
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master01   193m         4%     2144Mi          27%
k8s-master02   189m         4%     1858Mi          23%
k8s-master03   268m         6%     1934Mi          24%
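
kubectl top only works once the metrics API is registered and reporting Available, which can be checked directly:

$ kubectl get apiservices v1beta1.metrics.k8s.io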

5. Appendix

5.1 Reset a node (dangerous)

If kubeadm init or kubeadm join fails while deploying a node, the node can be reset with the commands below.

Note: a reset returns the node to its pre-deployment state. If the cluster is already working normally, do not reset; doing so causes unrecoverable cluster failure!

$ kubeadm reset -f
$ ipvsadm --clear
$ iptables -F && iptables -X && iptables -Z

5.2 Common query commands

# List tokens
$ kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION                                           EXTRA GROUPS
abcdef.0123456789abcdef   22h   2022-10-26T07:43:01Z   authentication,signing   <none>                                                system:bootstrappers:kubeadm:default-node-token
jgqg88.6mskuadei41o0s2d   40m   2022-10-25T09:43:01Z   <none>                   Proxy for managing TTL for the kubeadm-certs secret   <none>
# Check node status
$ kubectl get nodes
NAME           STATUS   ROLES                  AGE   VERSION
k8s-master01   Ready    control-plane,master   81m   v1.22.10
k8s-master02   Ready    control-plane,master   71m   v1.22.10
k8s-master03   Ready    control-plane,master   69m   v1.22.10
# Check certificate expiry
$ kubeadm certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Oct 25, 2122 07:40 UTC   99y             ca                      no
apiserver                  Oct 25, 2122 07:40 UTC   99y             ca                      no
apiserver-etcd-client      Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
apiserver-kubelet-client   Oct 25, 2122 07:40 UTC   99y             ca                      no
controller-manager.conf    Oct 25, 2122 07:40 UTC   99y             ca                      no
etcd-healthcheck-client    Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
etcd-peer                  Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
etcd-server                Oct 25, 2122 07:40 UTC   99y             etcd-ca                 no
front-proxy-client         Oct 25, 2122 07:40 UTC   99y             front-proxy-ca          no
scheduler.conf             Oct 25, 2122 07:40 UTC   99y             ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Oct 22, 2032 07:40 UTC   99y             no
etcd-ca                 Oct 22, 2032 07:40 UTC   99y             no
front-proxy-ca          Oct 22, 2032 07:40 UTC   99y             no
# Print kubeadm's default init configuration
$ kubeadm config print init-defaults
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: node
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: 1.22.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
# Check pods in the kube-system namespace
$ kubectl get pod --namespace=kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-86c9c65c67-j7pv4   1/1     Running   0          47m
calico-node-8mzpk                          1/1     Running   0          47m
calico-node-tkzs2                          1/1     Running   0          47m
calico-node-xbwvp                          1/1     Running   0          47m
coredns-7f6cbbb7b8-96hp9                   1/1     Running   0          82m
coredns-7f6cbbb7b8-kfmnn                   1/1     Running   0          82m
etcd-k8s-master01                          1/1     Running   0          82m
etcd-k8s-master02                          1/1     Running   0          72m
etcd-k8s-master03                          1/1     Running   0          70m
haproxy-k8s-master01                       1/1     Running   0          36m
haproxy-k8s-master02                       1/1     Running   0          67m
haproxy-k8s-master03                       1/1     Running   0          66m
keepalived-k8s-master01                    1/1     Running   0          82m
keepalived-k8s-master02                    1/1     Running   0          67m
keepalived-k8s-master03                    1/1     Running   0          66m
kube-apiserver-k8s-master01                1/1     Running   0          82m
kube-apiserver-k8s-master02                1/1     Running   0          72m
kube-apiserver-k8s-master03                1/1     Running   0          70m
kube-controller-manager-k8s-master01      1/1     Running   0          23m
kube-controller-manager-k8s-master02      1/1     Running   0          23m
kube-controller-manager-k8s-master03      1/1     Running   0          23m
kube-proxy-cvdlr                           1/1     Running   0          70m
kube-proxy-gnl7t                           1/1     Running   0          72m
kube-proxy-xnrt7                           1/1     Running   0          82m
kube-scheduler-k8s-master01                1/1     Running   0          23m
kube-scheduler-k8s-master02                1/1     Running   0          23m
kube-scheduler-k8s-master03                1/1     Running   0          23m
metrics-server-5786d84b7c-5v4rv            1/1     Running   0          8m38s