Problem 1:

Description: after a new pod is created, it cannot ping any domain name (external or in-cluster). It can ping IP addresses, both external and internal, except the kube-dns IP; IPs in the same subnet as the pod are also reachable.
# cat /etc/resolv.conf
search kube-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
# ping baidu.com
ping: unknown host baidu.com
# ping 39.156.66.10 # Baidu's IP
PING 39.156.66.10 (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10: icmp_seq=1 ttl=47 time=34.5 ms
# nslookup kubernetes.default.svc.cluster.local
;; connection timed out; no servers could be reached

# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
From 10.96.0.10 icmp_seq=1 Destination Port Unreachable

# ping 10.96.0.1
PING 10.96.0.1 (10.96.0.1) 56(84) bytes of data.
From 10.96.0.1 icmp_seq=1 Destination Port Unreachable

# If a nameserver 114.114.114.114 is added to resolv.conf, external domains become pingable, although it takes quite a while; in-cluster domains still fail

# echo nameserver 114.114.114.114 >> /etc/resolv.conf
# ping baidu.com
PING baidu.com (110.242.68.66) 56(84) bytes of data.
64 bytes from 110.242.68.66: icmp_seq=1 ttl=49 time=33.3 ms
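
The long wait after adding 114.114.114.114 is very likely just the ndots:5 plus search-list expansion: each name is first tried with every search suffix against the unreachable 10.96.0.10 before the query falls through to the working server. A quick way to see this (my own check, not in the capture above; assumes dig is available in the pod, as it is in the dnsutils image):

# dig baidu.com +search +time=2     # walks the search list and waits on the dead 10.96.0.10 first, so it is slow
# dig baidu.com. +nosearch +time=2  # trailing dot = fully qualified, single query, fast once a working nameserver is listed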

[root@master-test ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 rr
-> 192.168.12.88:6443 Masq 1 5 0
TCP 10.96.0.10:53 rr
-> 10.244.61.72:53 Masq 1 0 0
-> 10.244.182.2:53 Masq 1 0 0
TCP 10.96.0.10:9153 rr
-> 10.244.61.72:9153 Masq 1 0 0
-> 10.244.182.2:9153 Masq 1 0 0
UDP 10.96.0.10:53 rr
-> 10.244.61.72:53 Masq 1 0 0
-> 10.244.182.2:53 Masq 1 0 0
[root@master-test ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
app01 Ready worker 6d2h v1.28.5
master-test Ready control-plane 6d3h v1.28.5
[root@master-test ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-7c968b5878-bjvh4 1/1 Running 0 5d1h 10.244.61.65 app01 <none> <none>
calico-node-b77st 1/1 Running 0 5d1h 192.168.12.88 master-test <none> <none>
calico-node-p5md7 1/1 Running 0 5d1h 192.168.12.93 app01 <none> <none>
coredns-6554b8b87f-jtrt6 1/1 Running 0 4d6h 10.244.61.72 app01 <none> <none>
coredns-6554b8b87f-tct87 1/1 Running 0 4d6h 10.244.182.2 master-test <none> <none>
dnsutils 1/1 Running 0 44m 10.244.61.81 app01 <none> <none>
etcd-master-test 1/1 Running 0 6d3h 192.168.12.88 master-test <none> <none>
kube-apiserver-master-test 1/1 Running 0 6d3h 192.168.12.88 master-test <none> <none>
kube-controller-manager-master-test 1/1 Running 0 6d3h 192.168.12.88 master-test <none> <none>
kube-proxy-l4bct 1/1 Running 0 99m 192.168.12.93 app01 <none> <none>
kube-proxy-tvzn6 1/1 Running 0 100m 192.168.12.88 master-test <none> <none>
kube-scheduler-master-test 1/1 Running 0 6d3h 192.168.12.88 master-test <none> <none>

[root@master-test ~]# kubectl get svc -n kube-system
NAME    TYPE    CLUSTER-IP EXTERNAL-IP PORT(S)          AGE
kube-dns ClusterIP 10.96.0.10 <none>    53/UDP,53/TCP,9153/TCP 6d3h

Steps taken to track down the cause:

1. First went through the official DNS debugging guide: https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/dns-debugging-resolution/
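
For reference, the core checks from that guide look roughly like this (the dnsutils pod already visible in the pod list above comes from it):

[root@master-test ~]# kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
[root@master-test ~]# kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
[root@master-test ~]# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
[root@master-test ~]# kubectl get endpoints kube-dns -n kube-system   # the Service must have ready endpoints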

2. Turn on query logging in CoreDNS

# 1. Add DNS query logging (the log plugin) to the CoreDNS config
[root@master-test ~]# kubectl edit configmap coredns -n kube-system
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2024-01-18T06:12:22Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "137225"
  uid: b236bfc0-19e7-46fc-a8ef-bc8c669a1836

# 2. After the change, wait a moment, then ping a domain name and an IP from inside the pod and check whether query logs show up. To make it take effect faster I deleted the CoreDNS pods one at a time, waiting for each replacement to come up before deleting the next; do not do this in production.
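
A gentler way to pick up the edited Corefile than deleting the pods one by one is a rolling restart (or simply waiting, since the reload plugin is already enabled above):

[root@master-test ~]# kubectl -n kube-system rollout restart deployment coredns
[root@master-test ~]# kubectl -n kube-system rollout status deployment coredns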
[root@master-test ~]# kubectl logs -n kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration SHA512 = c0af6acba93e75312d34dc3f6c44bf8573acff497d229202a4a49405ad5d8266c556ca6f83ba0c9e74088593095f714ba5b916d197aa693d6120af8451160b80
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[INFO] 127.0.0.1:39822 - 38517 "HINFO IN 5614870804872510990.5324688006900630936. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.009762106s
.:53
[INFO] plugin/reload: Running configuration SHA512 = c0af6acba93e75312d34dc3f6c44bf8573acff497d229202a4a49405ad5d8266c556ca6f83ba0c9e74088593095f714ba5b916d197aa693d6120af8451160b80
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[INFO] 127.0.0.1:51931 - 20788 "HINFO IN 2522116324497527390.2945203159972947197. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.011257505s

# 3. Pinged a domain from inside the pod and watched the logs; nothing showed up
# 4. Added a CoreDNS pod IP directly to resolv.conf and found that external domains can then be resolved
# cat /etc/resolv.conf
search kube-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
nameserver 10.244.182.2
# ping baidu.com
PING baidu.com (39.156.66.10) 56(84) bytes of data.
64 bytes from 39.156.66.10: icmp_seq=1 ttl=47 time=34.5 ms
--- baidu.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 11192ms
rtt min/avg/max/mdev = 34.354/34.461/34.551/0.229 ms
# nslookup kubernetes.default.svc.cluster.local
;; Got recursion not available from 10.244.182.2, trying next server
;; connection timed out; no servers could be reached

# CoreDNS logs
[root@master-test ~]# kubectl logs -n kube-system -l k8s-app=kube-dns
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3

[INFO] 10.244.61.81:44784 - 24827 "A IN baidu.com.kube-system.svc.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.00019707s
[INFO] 10.244.61.81:34472 - 23535 "A IN baidu.com.svc.cluster.local. udp 45 false 512" NXDOMAIN qr,aa,rd 138 0.000174762s
[INFO] 10.244.61.81:58581 - 59635 "A IN baidu.com.cluster.local. udp 41 false 512" NXDOMAIN qr,aa,rd 134 0.000152417s
[INFO] 10.244.61.81:35409 - 57286 "A IN baidu.com. udp 27 false 512" NOERROR qr,rd,ra 77 0.000441379s
[INFO] 10.244.61.81:39140 - 58999 "PTR IN 10.66.156.39.in-addr.arpa. udp 43 false 512" NXDOMAIN qr,rd,ra 43 0.037215205s

# Log entry produced when pinging kubernetes.default

[INFO] 10.244.61.81:59868 - 5855 "A IN kubernetes.default.svc.cluster.local.kube-system.svc.cluster.local. udp 84 false 512" NXDOMAIN qr,aa,rd 177 0.000168988s
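
Taken together, the logs suggest queries do reach CoreDNS when a pod IP (10.244.182.2) is used directly, but nothing ever arrives via the Service IP 10.96.0.10, which points at the node-side path (kube-proxy / IPVS / NAT) rather than at CoreDNS itself. A direct way to compare the two paths from the problem pod (again assuming dig is available there):

# dig @10.244.182.2 kubernetes.default.svc.cluster.local +short +time=2   # straight to a CoreDNS pod
# dig @10.96.0.10 kubernetes.default.svc.cluster.local +short +time=2     # through the Service VIP / IPVS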

3. Reload the IPVS kernel modules. IPVS was already enabled, and I had not found the cause, but I ran the module-loading script again anyway

#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack"
for kernel_module in ${ipvs_modules}; do
    /sbin/modinfo -F filename ${kernel_module} > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        /sbin/modprobe ${kernel_module}
    fi
done

[root@master ~]# chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules
[root@master-test ~]# lsmod | grep ip_vs
ip_vs_ftp 13079 0
ip_vs_sed 12519 0
ip_vs_nq 12516 0
ip_vs_sh 12688 0
ip_vs_dh 12688 0
ip_vs_lblcr 12922 0
ip_vs_lblc 12819 0
ip_vs_wrr 12697 0
ip_vs_wlc 12519 0
ip_vs_lc 12516 0
ip_vs_rr 12600 4
ip_vs 145458 46 ip_vs_dh,ip_vs_lc,ip_vs_nq,ip_vs_rr,ip_vs_sh,ip_vs_ftp,ip_vs_sed,ip_vs_wlc,ip_vs_wrr,ip_vs_lblcr,ip_vs_lblc
nf_nat 26583 6 ip_vs_ftp,nf_nat_ipv4,nf_nat_ipv6,xt_nat,nf_nat_masquerade_ipv4,nf_nat_masquerade_ipv6
nf_conntrack 143360 10 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_nat_masquerade_ipv6,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c 12644 3 ip_vs,nf_nat,nf_conntrack
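
With the modules confirmed loaded, it is also worth watching whether the IPVS virtual server for 10.96.0.10:53 sees any packets at all while the pod runs a lookup. A rough check using the counters and the connection table:

[root@master-test ~]# ipvsadm -Ln --stats | grep -A 2 10.96.0.10
[root@master-test ~]# ipvsadm -Lnc | grep 10.96.0.10   # connection entries while the pod runs nslookup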

4. Check the kube-proxy logs. Every answer I found online says the kernel is too old and needs upgrading, but the same setup works fine in my virtual machines with the same kernel

[root@master-test ~]# kubectl logs kube-proxy-tvzn6 -n kube-system
I0124 07:36:06.598828 1 node.go:141] Successfully retrieved node IP: 192.168.12.88
I0124 07:36:06.600381 1 conntrack.go:52] "Setting nf_conntrack_max" nfConntrackMax=131072
I0124 07:36:06.627913 1 server.go:632] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
I0124 07:36:06.639591 1 server_others.go:218] "Using ipvs Proxier"
I0124 07:36:06.639637 1 server_others.go:421] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR for family" ipFamily="IPv6"
I0124 07:36:06.639645 1 server_others.go:438] "Defaulting to no-op detect-local"
E0124 07:36:06.639776 1 proxier.go:354] "Can't set sysctl, kernel version doesn't satisfy minimum version requirements" sysctl="net/ipv4/vs/conn_reuse_mode" minimumKernelVersion="4.1"
I0124 07:36:06.639838 1 proxier.go:408] "IPVS scheduler not specified, use rr by default"
E0124 07:36:06.639956 1 proxier.go:354] "Can't set sysctl, kernel version doesn't satisfy minimum version requirements" sysctl="net/ipv4/vs/conn_reuse_mode" minimumKernelVersion="4.1"
I0124 07:36:06.640009 1 proxier.go:408] "IPVS scheduler not specified, use rr by default"
I0124 07:36:06.640028 1 ipset.go:116] "Ipset name truncated" ipSetName="KUBE-6-LOAD-BALANCER-SOURCE-CIDR" truncatedName="KUBE-6-LOAD-BALANCER-SOURCE-CID"
I0124 07:36:06.640037 1 ipset.go:116] "Ipset name truncated" ipSetName="KUBE-6-NODE-PORT-LOCAL-SCTP-HASH" truncatedName="KUBE-6-NODE-PORT-LOCAL-SCTP-HAS"
I0124 07:36:06.640088 1 server.go:846] "Version info" version="v1.28.5"
I0124 07:36:06.640095 1 server.go:848] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0124 07:36:06.640932 1 config.go:188] "Starting service config controller"
I0124 07:36:06.640957 1 shared_informer.go:311] Waiting for caches to sync for service config
I0124 07:36:06.640988 1 config.go:97] "Starting endpoint slice config controller"
I0124 07:36:06.640996 1 shared_informer.go:311] Waiting for caches to sync for endpoint slice config
I0124 07:36:06.642282 1 config.go:315] "Starting node config controller"
I0124 07:36:06.642296 1 shared_informer.go:311] Waiting for caches to sync for node config
I0124 07:36:06.741777 1 shared_informer.go:318] Caches are synced for endpoint slice config
I0124 07:36:06.741840 1 shared_informer.go:318] Caches are synced for service config
I0124 07:36:06.742422 1 shared_informer.go:318] Caches are synced for node config
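
The only errors here are the conn_reuse_mode sysctl ones, which fit the "kernel too old" theory. To compare the working VM with this host I also dumped the kernel version and the relevant IPVS/bridge sysctls on both (a quick sketch; some keys may be absent depending on which modules are loaded):

[root@master-test ~]# uname -r
[root@master-test ~]# sysctl net.ipv4.vs.conn_reuse_mode net.ipv4.vs.expire_nodest_conn net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward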

5. Try to have extra DNS servers automatically added to the default resolv.conf of newly created pods by changing the kubelet config

vim /var/lib/kubelet/config.yaml
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
- 100.100.2.136
- 100.100.2.138
clusterDomain: cluster.local

# Newly created pods do indeed get the two Alibaba Cloud DNS servers and can ping external domains, but in-cluster domains still do not resolve
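
For a single test pod the same extra nameservers can also be injected without touching the kubelet config, via spec.dnsConfig. This is only a sketch, not what I actually changed cluster-wide (the image comes from the official dnsutils example):

[root@master-test ~]# kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils-extra-dns
spec:
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    command: ["sleep", "infinity"]
  dnsPolicy: ClusterFirst      # keep the cluster DNS (10.96.0.10) first
  dnsConfig:
    nameservers:               # appended after the cluster DNS in the pod's resolv.conf
    - 100.100.2.136
    - 100.100.2.138
EOF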

Does anyone have a solution?
