K8S Deployment: Troubleshooting


Problem 1: With a kubeadm-deployed K8S cluster, a node hangs and then fails while joining the cluster.

[root@k8s-node01 ~]# kubeadm join 192.168.1.201:6443 --token 1qo7ms.7atall1jcecf10qz --discovery-token-ca-cert-hash sha256:d1d102ceb6241a3617777f6156cd4e86dc9f9edd9e1d6d73266d6ca7f6280890
[preflight] Running pre-flight checks

Cause analysis: after the master node was initialized, some components turned out to be unhealthy:

[root@k8s-master01 ~]# kubectl get pod -n kube-system && kubectl get svc
NAME                                   READY   STATUS             RESTARTS   AGE
coredns-54d67798b7-28w5q               0/1     Pending            0          3m39s
coredns-54d67798b7-sxqpm               0/1     Pending            0          3m39s
etcd-k8s-master01                      1/1     Running            0          3m53s
kube-apiserver-k8s-master01            1/1     Running            0          3m53s
kube-controller-manager-k8s-master01   1/1     Running            0          3m53s
kube-proxy-rvj6w                       0/1     CrashLoopBackOff   5          3m40s
kube-scheduler-k8s-master01            1/1     Running            0          3m53s

Solution: fix the kubeadm configuration file, then reset and re-initialize the master node:

kubeadm reset -f; ipvsadm --clear; rm -rf ./.kube
kubeadm init --config=<kubeadm-config-file> --upload-certs | tee <init-log-file>
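A minimal sketch of the full re-initialization sequence, assuming a hypothetical config file named kubeadm-config.yaml; the version, endpoint, and pod subnet are assumptions to adjust to the environment:

# Reset previous state, clear IPVS rules, and drop the stale kubectl cache.
kubeadm reset -f
ipvsadm --clear
rm -rf $HOME/.kube

# Hypothetical minimal kubeadm config for a v1.20-era cluster.
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.20.0
controlPlaneEndpoint: "192.168.1.201:6443"
networking:
  podSubnet: "10.244.0.0/16"
EOF

# Re-initialize and keep the output; the node join commands are printed at the end.
kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.log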

Problem 2: With a kubeadm-deployed K8S cluster, a node fails to join the cluster.

[root@k8s-node01 ~]# kubeadm join 192.168.1.201:6443 --token 6xe31rdb7jbo8 --discovery-token-ca-cert-hash sha256:d1d102ceb6241a3617777f6156cd4e86dc9f9edd9e1d6d73266d6ca7f6280890
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
To see the stack trace of this error execute with --v=5 or higher

Cause analysis: the token has expired.
Solution: generate a new, non-expiring token:

[root@k8s-master01 ~]# kubeadm token create --ttl 0 --print-join-command
W0819 12:00:27.541838 7855 :202] WARNING: kubeadm cannot validate component configs for API groups [ ]
kubeadm join 192.168.1.201:6443 --token mpe9qa8nxu --discovery-token-ca-cert-hash sha256:bd78dfd370e47dfca742b5f6934c21014792168fa4dc19c9fa63bfdd87270097
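Before minting a never-expiring token, it can be worth checking what is already there; a short sketch (note that a --ttl 0 token is convenient but weakens cluster security):

# List existing bootstrap tokens with their expiration times.
kubeadm token list

# Or create a token with the default 24h TTL instead of a permanent one.
kubeadm token create --print-join-command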

Problem 3: With a kubeadm-deployed K8S cluster, deploying the flannel component fails.

kube-flannel-ds-amd64-8cqqz 0/1 CrashLoopBackOff 3 84s 192.168.66.10 k8s-master01

Cause analysis: the logs show that registering the network failed, because the YAML file used when initializing the master node was wrong.

kubectl logs kube-flannel-ds-amd64-8cqqz -n kube-system
I0602 01:53:54.021093 1 :514] Determining IP address of default interface
I0602 01:53:54.022514 1 :527] Using interface with name ens33 and address 192.168.66.10
I0602 01:53:54.022619 1 :544] Defaulting external address to interface address (192.168.66.10)
I0602 01:53:54.030311 1 :126] Waiting 10m0s for node controller to sync
I0602 01:53:54.030555 1 :309] Starting kube subnet manager
I0602 01:53:55.118656 1 :133] Node controller sync successful
I0602 01:53:55.118754 1 :244] Created subnet manager: Kubernetes Subnet Manager - k8s-master01
I0602 01:53:55.118765 1 :247] Installing signal handlers
I0602 01:53:55.119057 1 :386] Found network config - Backend type: vxlan
I0602 01:53:55.119146 1 :120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0602 01:53:55.119470 1 :289] Error registering network: failed to acquire lease: node "k8s-master01" pod cidr not assigned
I0602 01:53:55.119506 1 :366]

Solution: fix the pod CIDR in the kubeadm init configuration (set networking.podSubnet, or pass --pod-network-cidr) and re-initialize the master node.
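One way to confirm the diagnosis, plus a hedged workaround that avoids a full re-init (flannel's default subnet 10.244.0.0/16 is assumed):

# An empty result confirms that no pod CIDR was assigned to the node.
kubectl get node k8s-master01 -o jsonpath='{.spec.podCIDR}'

# Workaround: set a CIDR on the node directly; the clean fix is re-initializing
# with --pod-network-cidr=10.244.0.0/16 so the controller-manager allocates it.
kubectl patch node k8s-master01 -p '{"spec":{"podCIDR":"10.244.0.0/24"}}'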

Problem 4: Initializing the K8S master node fails.

[init] Using Kubernetes version: v1.15.1
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.6. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Cause analysis: the K8S master node has already been initialized on this host.
Solution: reset K8S, then re-initialize:

kubeadm reset
kubeadm init --config=<kubeadm-config-file> --upload-certs | tee <init-log-file>
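Before re-initializing, it can help to see what still owns the ports and directories from the preflight errors; a quick diagnostic sketch:

# Show which processes hold the control-plane ports.
ss -lntp | grep -E ':(6443|10250|10251|10252|2379|2380)\b'

# Leftover static-pod manifests and etcd data from the previous init.
ls /etc/kubernetes/manifests/
ls /var/lib/etcd/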

Problem 5: After resetting K8S successfully, do the related files need to be deleted?

[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0602 10:20:53.656954 76680 :79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually. For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually. Please, check the contents of the $HOME/.kube/config file.

Cause analysis: none.
Solution: delete the files listed in the reset output, to avoid problems after the master node is initialized again.
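A consolidated cleanup following the hints in the reset output (destructive; only for a node that will be re-initialized):

# Flush iptables rules left behind by kube-proxy.
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# Clear IPVS tables if kube-proxy ran in IPVS mode.
ipvsadm --clear
# Remove the stale kubeconfig so kubectl stops pointing at the old cluster.
rm -rf $HOME/.kube/config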

Problem 6: After the master node is initialized successfully, querying node information fails.

Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

Cause analysis: when kubeadm was reset, the cached files were not deleted.
Solution: delete the cached files, then initialize the master node again:

rm -rf $HOME/.kube
kubeadm reset

Problem 7: A worker node gets stuck while joining the master, with only a Docker version warning shown.

[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.6. Latest validated version: 18.09
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s

Cause analysis: the master node's token has expired; the Docker version is also too new.
Solution: using Docker 18.06 removes the warning; regenerate the token on the master node, then run the join command on the worker node with the new token.

Problem 8: A master node fails to join the K8S cluster.

[root@k8s-master02 ~]# kubeadm join 192.168.1.201:6443 --token mpe9qa8nxu --discovery-token-ca-cert-hash sha256:bd78dfd370e47dfca742b5f6934c21014792168fa4dc19c9fa63bfdd87270097 \
> --control-plane --certificate-key b464a8d23d3313c4c0bb5b65648b039cb9b1177dddefbf46e2e296899d0e4516
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:

One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.

To see the stack trace of this error execute with --v=5 or higher

Cause analysis: the certificates were not shared.
Solution: share the certificates:

########## run on the other master nodes ##########
mkdir -p /etc/kubernetes/pki/etcd/
########## run on master01 ##########
cd /etc/kubernetes/pki/
scp ca.* front-proxy-ca.* sa.* 192.168.1.202:/etc/kubernetes/pki/
scp ca.* front-proxy-ca.* sa.* 192.168.1.203:/etc/kubernetes/pki/
########## run on the other master nodes ##########
kubeadm join 192.168.1.201:6443 --token mpe9qa8nxu --discovery-token-ca-cert-hash sha256:bd78dfd370e47dfca742b5f6934c21014792168fa4dc19c9fa63bfdd87270097 --control-plane --certificate-key b464a8d23d3313c4c0bb5b65648b039cb9b1177dddefbf46e2e296899d0e4516

Problem 9: Deploying prometheus fails.

unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1"
unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"
unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
(plus roughly ten more identical ServiceMonitor lines)

Cause analysis: unclear at the time; these kinds are CRDs installed by the prometheus-operator, so errors of this form usually mean the CRDs had not yet been registered when the manifests were applied.
Solution: run the publish command again.
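A way to verify the CRD explanation before re-applying; a hedged sketch (the manifests/ directory name is a placeholder):

# Check whether the prometheus-operator CRDs are registered.
kubectl get crd | grep monitoring.coreos.com

# Wait until the ServiceMonitor CRD is established, then re-apply.
kubectl wait --for condition=established --timeout=60s crd/servicemonitors.monitoring.coreos.com
kubectl apply -f manifests/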

Problem 10: Downloading the K8S high-availability installation package fails, although the git package can be downloaded directly in a browser.

Cloning into 'k8s-ha-install'...
error: RPC failed; result=35, HTTP code = 0
fatal: The remote end hung up unexpectedly

Cause analysis: the git buffer is too small.
Solution: enlarge the git HTTP buffer:

git config --global http.postBuffer 100M   # enlarge the git buffer

Problem 11: Downloading the K8S high-availability installation package fails, although the git package can be downloaded directly in a browser.
Cause analysis: unknown.
Solution: download the git package manually and upload it to the server.

Problem 12: Listing branches of the K8S repository fails.

[root@k8s-master01 k8s-ha-install-master]# git branch -a
fatal: Not a git repository (or any of the parent directories): .git

Cause analysis: the local .git repository is missing.
Solution: initialize git:

[root@k8s-master01 k8s-ha-install-master]# git init
Initialized empty Git repository in /root/install-k8s-v1.17/k8s-ha-install-master/.git/
[root@k8s-master01 k8s-ha-install-master]# git branch -a

Problem 13: Switching branches in the K8S repository fails.

[root@k8s-master01 k8s-ha-install-master]# git checkout manual-installation-v1.20.x
error: pathspec 'manual-installation-v1.20.x' did not match any file(s) known to git.

Cause analysis: the branch cannot be found.
Solution: the repository must be downloaded with git; a zip file downloaded in a browser will not work, and neither will re-archiving and extracting a git-cloned directory.

[root@k8s-master01 k8s-ha-install]# git checkout manual-installation-v1.20.x
Branch manual-installation-v1.20.x set up to track remote branch manual-installation-v1.20.x from origin.
Switched to a new branch 'manual-installation-v1.20.x'
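The working sequence, sketched with a placeholder URL since only the repository name appears in the post:

# Clone with git; a browser zip download has no .git metadata, so branch information is lost.
git clone <repo-url>/k8s-ha-install.git
cd k8s-ha-install
# List remote branches, then switch to the desired one.
git branch -a
git checkout manual-installation-v1.20.x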

Problem 14: Generating the apiserver aggregation certificate fails.
Cause analysis: the command lacks the hosts parameter, so the certificate is not suitable for serving a website; this does not affect communication between the apiserver and the other components.
Solution: no action needed.

Problem 15: After installing the K8S components from binaries, kubectl prints a warning.

[root@k8s-master01 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Cause analysis: the kubectl component is installed, but it cannot connect to the kubernetes apiserver.
Solution: install the certificates and configure the components.

Problem 16: After configuring certificates for a binary K8S installation, starting ETCD fails.

Jul 21 08:52:55 k8s-master02 etcd[1424]: rejected connection from "192.168.0.107:38222" (error "remote error: tls: bad certificate", ServerName "")
Jul 21 08:52:55 k8s-master02 etcd[1424]: rejected connection from "192.168.0.107:38224" (error "remote error: tls: bad certificate", ServerName "")

Cause analysis: TLS authentication failed. Likely causes: a problem in the CSR file used to create the certificates produced a bad certificate, or an IP was left out of (or mistyped in) the certificate's hosts list; note 192.169.0.203 in the command below, where 192.168.0.203 was presumably intended.

cfssl gencert -initca <etcd-ca-csr-file> | cfssljson -bare /etc/etcd/ssl/etcd-ca   # generate the ETCD CA certificate and key
cfssl gencert -ca=/etc/etcd/ssl/etcd-ca.pem -ca-key=/etc/etcd/ssl/etcd-ca-key.pem -config=<ca-config-file> -hostname=127.0.0.1,k8s-master01,k8s-master02,k8s-master03,192.168.0.201,192.168.0.202,192.169.0.203 -profile=kubernetes <etcd-csr-file> | cfssljson -bare /etc/etcd/ssl/etcd   # issue the ETCD certificate from the CA

Solution: none recorded at the time.
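A way to check whether the issued certificate actually contains the intended SANs (this is where a mistyped IP such as 192.169.0.203 would show up), assuming the output paths above:

# Inspect the Subject Alternative Names in the etcd certificate.
openssl x509 -in /etc/etcd/ssl/etcd.pem -noout -text | grep -A1 'Subject Alternative Name'

# If an IP is wrong or missing, regenerate the certificate with a corrected
# -hostname list and restart etcd on every member.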

Problem 17: haproxy fails to start.

[root@k8s-master01 pki]# systemctl enable --now haproxy; systemctl status haproxy
Created symlink from /etc/systemd/system/multi-user.target.wants/haproxy.service to /usr/lib/systemd/system/haproxy.service.
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2021-07-21 10:25:35 CST; 3ms ago
Process: 2255 ExecStart=/usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid $OPTIONS (code=exited, status=1/FAILURE)
Main PID: 2255 (code=exited, status=1/FAILURE)
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: [ALERT] 201/102535 (2256) : parsing [/etc/haproxy/haproxy.cfg:59] : 'server 192.168.0.107:6443' : in 'check'
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: [ALERT] 201/102535 (2256) : parsing [/etc/haproxy/haproxy.cfg:60] : 'server 192.168.0.108:6443' : in 'check'
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: [ALERT] 201/102535 (2256) : parsing [/etc/haproxy/haproxy.cfg:61] : 'server 192.168.0.109:6443' : in 'check'
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: [ALERT] 201/102535 (2256) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: [WARNING] 201/102535 (2256) : config : frontend 'GLOBAL' has no 'bind' directive. Please declare it as a backend ...
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: [ALERT] 201/102535 (2256) : Fatal errors found in configuration.
Jul 21 10:25:35 k8s-master01 systemd[1]: haproxy.service: main process exited, code=exited, status=1/FAILURE
Jul 21 10:25:35 k8s-master01 haproxy-systemd-wrapper[2255]: haproxy-systemd-wrapper: exit, haproxy RC=1
Jul 21 10:25:35 k8s-master01 systemd[1]: Unit haproxy.service entered failed state.
Jul 21 10:25:35 k8s-master01 systemd[1]: haproxy.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

Cause analysis: configuration error.
Solution: add option tcp-check to the backend.
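A sketch of the relevant backend section with the fix applied, assuming the three apiserver addresses from the log (the backend and server names are illustrative):

backend k8s-apiserver
    mode tcp
    option tcp-check          # the directive the fix adds
    balance roundrobin
    server k8s-master01 192.168.0.107:6443 check
    server k8s-master02 192.168.0.108:6443 check
    server k8s-master03 192.168.0.109:6443 check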

Problem 18: apiserver fails to start.

[root@k8s-master01 ~]# systemctl daemon-reload && systemctl enable --now kube-apiserver
Failed to execute operation: No such file or directory

Cause analysis: a configuration problem: "No such file or directory" means systemd cannot find the kube-apiserver.service unit file; an unhealthy etcd can also keep the apiserver from running.
Solution: fix the configuration.
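A few checks that narrow this down, assuming the unit naming conventions of a binary install:

# systemd can only enable a unit whose file exists; check both common locations.
ls /usr/lib/systemd/system/kube-apiserver.service /etc/systemd/system/kube-apiserver.service

# If the unit exists but the apiserver still dies, verify etcd first, then read the apiserver logs.
systemctl status etcd
journalctl -u kube-apiserver --no-pager | tail -n 20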

Problem 19: The kubelet service fails to start.

Aug 13 12:15:11 k8s-master02 systemd[1]: Unit kubelet.service entered failed state.
Aug 13 12:15:11 k8s-master02 systemd[1]: kubelet.service failed.

Cause analysis: SELinux has not been disabled, and the node has not been initialized.
Solution: disable SELinux, then initialize the node.
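The usual way to disable SELinux on CentOS for this kind of setup:

# Turn SELinux off immediately and persist the change across reboots.
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config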

Problem 20: Initializing the k8s master node fails.

[root@k8s-master01 ~]# kubeadm init --control-plane-endpoint "k8s-master01-lb:16443" --upload-certs
I0813 13:49:07.164695 10196 :254] remote version is much newer: v1.22.0; falling back to: stable-1.20
[init] Using Kubernetes version: v1.20.10
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.22.0" Control plane version: "1.20.10"
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Cause analysis: the kubeadm and kubelet versions do not match.
Solution: reinstall matching versions:

yum -y remove kubectl kubelet && yum -y install kubectl-1.20.5-0 kubelet-1.20.5-0 kubeadm-1.20.5-0
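To confirm the skew is gone after reinstalling, all three tools should report the same minor version (1.20.x here):

kubeadm version -o short
kubelet --version
kubectl version --client --short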

Problem 21: After installing k8s with kubeadm and initializing, the node status cannot be viewed.

[root@k8s-master01 ~]# kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Cause analysis: kubectl cannot communicate with the K8S cluster.
Solution: the .kube directory holds the cache and configuration files kubectl uses to talk to the K8S cluster; copy the admin kubeconfig into it:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Problem 22: After kubeadm initializes master01 successfully, the join command for additional master nodes is not printed, and even after regenerating the token and certificate key the other masters cannot join.

[root@k8s-master01 ~]# kubectl get nodes
NAME           STATUS     ROLES                  AGE    VERSION
k8s-master01   NotReady   control-plane,master   132m   v1.20.0


[root@k8s-master01 ~]# kubeadm token create --print-join-command

kubeadm join 192.168.1.201:6443 --token 8mlz6uwma38fr3 --discovery-token-ca-cert-hash sha256:00c16efd17ef0a08ee46e8462f1fa664aa084de32e09c3a21b36fe389b39cf37


[root@k8s-master01 ~]# kubeadm init phase upload-certs --upload-certs
I0816 13:33:00.948386 35123 :251] remote version is much newer: v1.22.0; falling back to: stable-1.20
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
bc08d35720ca8a20d189454d9550364410a51426e958929c0d50b5dcb8c5930e

[root@k8s-master02 ~]# kubeadm join 192.168.1.201:6443 --token 8mlz6uwma38fr3 \
> --discovery-token-ca-cert-hash sha256:00c16efd17ef0a08ee46e8462f1fa664aa084de32e09c3a21b36fe389b39cf37 \
> --control-plane --certificate-key bc08d35720ca8a20d189454d9550364410a51426e958929c0d50b5dcb8c5930e
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:

One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.

To see the stack trace of this error execute with --v=5 or higher

Cause analysis: unclear at the time; the message itself suggests that master01 was initialized without a stable controlPlaneEndpoint (kubeadm init was run without --control-plane-endpoint), in which case additional control-plane nodes cannot join.
Solution: changing the K8S version did not fix it; no working fix was recorded.
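Consistent with the error text, re-initializing with a stable control-plane endpoint should make additional masters joinable; a hedged sketch (the load-balancer address k8s-master01-lb:16443 reuses the value from Problem 20):

# Wipe the failed state, then initialize with a shared, stable endpoint.
kubeadm reset -f
kubeadm init --control-plane-endpoint "k8s-master01-lb:16443" --upload-certs
# The output then ends with a ready-made 'kubeadm join ... --control-plane --certificate-key ...'
# command for the other master nodes.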

Problem 23: The calico installation is abnormal.

[root@k8s-master01 calico]# kubectl describe pod calico-node-dtqc8 -n kube-system
Name: calico-node-dtqc8
Namespace: kube-system
Events:
Type     Reason     Age    From               Message
----     ------     ----   ----               -------
Normal   Scheduled  10m    default-scheduler  Successfully assigned kube-system/calico-node-dtqc8 to k8s-master01
Normal   Pulling    10m    kubelet            Pulling image "/dotbalo/cni:v3.15.3"
Normal   Pulled     10m    kubelet            Successfully pulled image "/dotbalo/cni:v3.15.3" in 44.529354646s
Normal   Created    10m    kubelet            Created container install-cni
Normal   Started    10m    kubelet            Started container install-cni
Normal   Pulling    10m    kubelet            Pulling image "/dotbalo/pod2daemon-flexvol:v3.15.3"
Normal   Pulled     9m45s  kubelet            Successfully pulled image "/dotbalo/pod2daemon-flexvol:v3.15.3" in 27.031642012s
Normal   Created    9m45s  kubelet            Created container flexvol-driver
Normal   Started    9m44s  kubelet            Started container flexvol-driver
Normal   Pulling    9m43s  kubelet            Pulling image "/dotbalo/node:v3.15.3"
Normal   Pulled     8m42s  kubelet            Successfully pulled image "/dotbalo/node:v3.15.3" in 1m1.78614215s
Normal   Created    8m41s  kubelet            Created container calico-node
Normal   Started    8m41s  kubelet            Started container calico-node
Warning  Unhealthy  8m39s  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

Cause analysis: unclear at the time; a readiness failure of this form usually means BIRD (calico's BGP daemon) has not come up yet or picked the wrong interface.
Solution: no impact was observed, so it can be ignored.
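If the BIRD readiness failures persist instead of clearing up, a common hedged fix is to pin calico's IP autodetection to the correct interface (ens33 is assumed here, matching the interface seen in the flannel logs above):

# Tell calico-node which interface to use instead of autodetecting the first one.
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens33

# Watch the daemonset roll out and recheck pod readiness.
kubectl rollout status daemonset/calico-node -n kube-system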

Problem 24: The kubectl tool does not work.

[root@k8s-master01 ~]# kubectl get nodes
The connection to the server 192.168.1.201:6443 was refused - did you specify the right host or port?

Cause analysis: the KUBECONFIG environment variable is not set.
Solution:

cat <<EOF >> /root/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
source /root/.bashrc
