Kubernetes Cluster Backup and Restore
[root@master ~]# ETCDCTL_API=3 etcdctl snapshot save /backup/k8s1/snapshot.db
Snapshot saved at /backup/k8s1/snapshot.db
[root@master ~]# du -h /backup/k8s1/snapshot.db
1.6M /backup/k8s1/snapshot.db
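Before going further, the snapshot can be sanity-checked with etcdctl itself; a minimal sketch, assuming the same v3 etcdctl binary used above:

ETCDCTL_API=3 etcdctl snapshot status /backup/k8s1/snapshot.db --write-out=table   # prints hash, revision, total keys and size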
3: Copy the SSL files under the kubernetes directory
[root@master ~]# cp /etc/kubernetes/ssl/* /backup/k8s1/
[root@master ~]# ll /backup/k8s1/
总用量 1628
-rw-r--r--. 1 root root 1675 12月 10 21:21 admin-key.pem
-rw-r--r--. 1 root root 1391 12月 10 21:21 admin.pem
-rw-r--r--. 1 root root 997 12月 10 21:21 aggregator-proxy.csr
-rw-r--r--. 1 root root 219 12月 10 21:21 aggregator-proxy-csr.json
-rw-------. 1 root root 1675 12月 10 21:21 aggregator-proxy-key.pem
-rw-r--r--. 1 root root 1383 12月 10 21:21 aggregator-proxy.pem
-rw-r--r--. 1 root root 294 12月 10 21:21 ca-config.json
-rw-r--r--. 1 root root 1675 12月 10 21:21 ca-key.pem
-rw-r--r--. 1 root root 1350 12月 10 21:21 ca.pem
-rw-r--r--. 1 root root 1082 12月 10 21:21 kubelet.csr
-rw-r--r--. 1 root root 283 12月 10 21:21 kubelet-csr.json
-rw-------. 1 root root 1675 12月 10 21:21 kubelet-key.pem
-rw-r--r--. 1 root root 1452 12月 10 21:21 kubelet.pem
-rw-r--r--. 1 root root 1273 12月 10 21:21 kubernetes.csr
-rw-r--r--. 1 root root 488 12月 10 21:21 kubernetes-csr.json
-rw-------. 1 root root 1679 12月 10 21:21 kubernetes-key.pem
-rw-r--r--. 1 root root 1639 12月 10 21:21 kubernetes.pem
-rw-r--r--. 1 root root 1593376 12月 10 21:32 snapshot.db
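Since the snapshot only restores cleanly together with these certificates, it can be convenient to bundle the whole directory into one dated archive. This is an optional extra, not part of the original procedure, and the archive path is just an example:

tar czf /backup/k8s1-$(date +%F).tar.gz -C /backup k8s1   # e.g. /backup/k8s1-2019-12-10.tar.gz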
4: Simulate a cluster crash by running 99.clean.yml to wipe the cluster, then redeploy it with the standard playbooks
[root@master ~]# cd /etc/ansible/
[root@master ansible]# ansible-playbook 99.clean.yml
[root@master ansible]# ansible-playbook 01.prepare.yml
[root@master ansible]# ansible-playbook 02.etcd.yml
[root@master ansible]# ansible-playbook 03.docker.yml
[root@master ansible]# ansible-playbook 04.kube-master.yml
[root@master ansible]# ansible-playbook 05.kube-node.yml
5: Stop the etcd service
[root@master ansible]# ansible etcd -m service -a 'name=etcd state=stopped'
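To make sure etcd is really down on all members before wiping the data directory, the client port can be checked through the same inventory group; a small sketch (the trailing || echo keeps the task from being reported as failed when nothing is listening on 2379):

ansible etcd -m shell -a 'ss -lnt | grep 2379 || echo "etcd not listening"'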
6: Clear the etcd data on every member
[root@master ansible]# ansible etcd -m file -a 'name=/var/lib/etcd/member/ state=absent'
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user
configurable on deprecation. This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
192.168.1.203 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
192.168.1.202 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
192.168.1.200 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "path": "/var/lib/etcd/member/",
    "state": "absent"
}
7: Sync the backed-up etcd data to every etcd node
[root@master ansible]# for i in 202 203; do rsync -av /backup/k8s1 192.168.1.$i:/backup/; done
sending incremental file list
created directory /backup
k8s1/
k8s1/admin-key.pem
k8s1/admin.pem
k8s1/aggregator-proxy-csr.json
k8s1/aggregator-proxy-key.pem
k8s1/aggregator-proxy.csr
k8s1/aggregator-proxy.pem
k8s1/ca-config.json
k8s1/ca-key.pem
k8s1/ca.pem
k8s1/kubelet-csr.json
k8s1/kubelet-key.pem
k8s1/kubelet.csr
k8s1/kubelet.pem
k8s1/kubernetes-csr.json
k8s1/kubernetes-key.pem
k8s1/kubernetes.csr
k8s1/kubernetes.pem
k8s1/snapshot.db

sent ... bytes  received ... bytes  ...,239.60 bytes/sec
total size is ...  speedup is 1.00
sending incremental file list
created directory /backup
k8s1/
k8s1/admin-key.pem
k8s1/admin.pem
k8s1/aggregator-proxy-csr.json
k8s1/aggregator-proxy-key.pem
k8s1/aggregator-proxy.csr
k8s1/aggregator-proxy.pem
k8s1/ca-config.json
k8s1/ca-key.pem
k8s1/ca.pem
k8s1/kubelet-csr.json
k8s1/kubelet-key.pem
k8s1/kubelet.csr
k8s1/kubelet.pem
k8s1/kubernetes-csr.json
k8s1/kubernetes-key.pem
k8s1/kubernetes.csr
k8s1/kubernetes.pem
k8s1/snapshot.db

sent ... bytes  received ... bytes  ...,066.00 bytes/sec
total size is ...  speedup is 1.00
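Before restoring, it is worth confirming that every etcd node actually received the snapshot; a quick sketch using the same inventory group:

ansible etcd -m shell -a 'ls -lh /backup/k8s1/snapshot.db'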
8: Run the following restore on every etcd node, then restart etcd
## Note: in /etc/systemd/system/etcd.service, find the --initial-cluster etcd1=https://xxxx:2380,etcd2=https://xxxx:2380,etcd3=https://xxxx:2380 entry and use it as the --initial-cluster value in the restore command; set --name to the current etcd node's name, and set --initial-advertise-peer-urls to the current node's IP:2380.
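Rather than retyping that value, the relevant lines can be pulled straight out of the unit file (the path is the one mentioned in the note above):

grep 'initial-cluster' /etc/systemd/system/etcd.service   # shows --initial-cluster, --initial-cluster-token, --initial-cluster-state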
①【deploy node (etcd1)】
[root@master ansible]# cd /backup/k8s1/
[root@master k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd1 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.200:2380
-- ::50.037127 I | mvcc: restore compact to
-- ::50.052409 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
-- ::50.052451 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
-- ::50.052474 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
After the restore command finishes, a 【node-name】.etcd directory is generated in the current directory:
[root@master k8s1]# tree etcd1.etcd/
etcd1.etcd/
└── member
├── snap
│ ├── -.snap
│ └── db
└── wal
└── -.wal
[root@master k8s1]# cp -r etcd1.etcd/member /var/lib/etcd/
[root@master k8s1]# systemctl restart etcd
②【etcd2 node (node1)】
[root@node1 ~]# cd /backup/k8s1/
[root@node1 k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd2 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.202:2380
-- ::35.175032 I | mvcc: restore compact to
-- ::35.232386 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
-- ::35.232507 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
-- ::35.232541 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
[root@node1 k8s1]# tree etcd2.etcd/
etcd2.etcd/
└── member
├── snap
│ ├── -.snap
│ └── db
└── wal
└── -.wal
[root@node1 k8s1]# cp -r etcd2.etcd/member /var/lib/etcd/
[root@node1 k8s1]# systemctl restart etcd
③【etcd3 node (node2)】
[root@node2 ~]# cd /backup/k8s1/
[root@node2 k8s1]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd3 --initial-cluster etcd1=https://192.168.1.200:2380,etcd2=https://192.168.1.202:2380,etcd3=https://192.168.1.203:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.1.203:2380
-- ::55.943364 I | mvcc: restore compact to
-- ::55.988674 I | etcdserver/membership: added member 12229714d8728d0e [https://192.168.1.200:2380] to cluster b8ef796b710cde7d
-- ::55.988726 I | etcdserver/membership: added member 552fb05951af50c9 [https://192.168.1.203:2380] to cluster b8ef796b710cde7d
-- ::55.988754 I | etcdserver/membership: added member 8b4f4a6559bf7c2c [https://192.168.1.202:2380] to cluster b8ef796b710cde7d
[root@node2 k8s1]# tree etcd3.etcd/
etcd3.etcd/
└── member
├── snap
│ ├── -.snap
│ └── db
└── wal
└── -.wal
[root@node2 k8s1]# cp -r etcd3.etcd/member /var/lib/etcd/
[root@node2 k8s1]# systemctl restart etcd
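With all three members restarted, cluster health can be checked from any node. The certificate paths below are assumptions about where this deployment keeps the etcd client certs, so adjust them to your layout:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.1.200:2379,https://192.168.1.202:2379,https://192.168.1.203:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/etcd/ssl/etcd.pem \
  --key=/etc/etcd/ssl/etcd-key.pem \
  endpoint health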
9: Rebuild the cluster network from the deploy node
[root@master ansible]# cd /etc/ansible/
[root@master ansible]# ansible-playbook tools/change_k8s_network.yml
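Before checking workloads, it helps to confirm that the nodes are Ready and the kube-system pods have come back; plain kubectl, nothing kubeasz-specific:

kubectl get node
kubectl get pod -n kube-system -o wide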
10: Check whether the pods and services were restored successfully
[root@master ansible]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.68.0.1 <none> /TCP 5d5h
nginx ClusterIP 10.68.241.175 <none> /TCP 5d4h
tomcat ClusterIP 10.68.235.35 <none> /TCP 76m
[root@master ansible]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-7c45b84548-4998z / Running 5d4h
tomcat-8fc9f5995-9kl5b / Running 77m
三、Automatic backup and automatic restore
1: One-click backup
[root@master ansible]# ansible-playbook /etc/ansible/23.backup.yml
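Where the playbook stores the snapshot varies between kubeasz versions, so the search below is only a convenience for locating it, not the documented path:

find /etc/ansible /backup -name 'snapshot*.db' 2>/dev/null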
2: Simulate a failure
[root@master ansible]# ansible-playbook /etc/ansible/99.clean.yml
Edit /etc/ansible/roles/cluster-restore/defaults/main.yml to specify which etcd snapshot backup to restore; if it is left unchanged, the most recent backup is used.
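The variable name inside that defaults file differs between kubeasz releases, so the simplest check is to open it and see which snapshot it currently points at (the path is the one given above):

cat /etc/ansible/roles/cluster-restore/defaults/main.yml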
3: Run the automatic restore
[root@master ansible]# ansible-playbook /etc/ansible/24.restore.yml
[root@master ansible]# ansible-playbook /etc/ansible/tools/change_k8s_network.yml
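As with the manual procedure, the final check is that nodes, workloads and services came back; a generic verification, not specific to kubeasz:

kubectl get node
kubectl get pod,svc --all-namespaces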