
Kubernetes Cluster Setup, Usage, and Troubleshooting Notes (Part 1)

2021/08/03

This post walks through setting up a single-master Kubernetes cluster (a multi-master setup will follow in a later post) and testing that it works. The process as a whole is not complicated; a few errors along the way just cost some time, so I have also recorded every problem I ran into.

Environment for this post: CentOS 7; Kubernetes v1.21.3; Docker 20.10.7.

Installing a Single-Master Kubernetes Cluster

Overall Workflow

  • Step 1: install three virtual machines (this post uses a CentOS 7 minimal install)
  • Configure each VM (disable SELinux, disable the firewall, disable swap, add hosts entries, configure a static IP, configure package repositories)
  • Install and configure docker, kubelet, kubeadm, and kubectl on all three nodes (pointing the Docker and Kubernetes repos at Alibaba Cloud mirrors)
  • Initialize the master node with kubeadm and run the follow-up steps
  • Join each worker node to the cluster
  • Install the network add-on (this post uses Flannel)
  • Create an nginx pod to test the cluster

Preparation

Basic Setup

First, install three Linux machines. There are no particular requirements here; I used CentOS 7. Then apply the following steps to every machine:

# 1. Switch to a faster package mirror
## Back up the original repo file
cp -a /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
## Point the repo at the Huawei Cloud mirror
sed -i "s/#baseurl/baseurl/g" /etc/yum.repos.d/CentOS-Base.repo
sed -i "s/mirrorlist=http/#mirrorlist=http/g" /etc/yum.repos.d/CentOS-Base.repo
sed -i "s@http://mirror.centos.org@https://repo.huaweicloud.com@g" /etc/yum.repos.d/CentOS-Base.repo
## Clear the cache and rebuild it
yum clean all
yum makecache

# 2. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# 3. Permanently disable SELinux
sed -i 's/enforcing/disabled/g' /etc/selinux/config
setenforce 0  # also disable it for the current boot

# 4. Disable swap
swapoff -a  # temporary, for the current boot
sed -ri 's/.*swap.*/#&/' /etc/fstab  # permanent: comment out the swap entry

# 5. Set the hostname
hostnamectl set-hostname <hostname>

# 6. Add hosts entries on every machine; for my environment, /etc/hosts gets:
10.10.10.101 k8s-master01
10.10.10.102 k8s-node01
10.10.10.103 k8s-node02

# 7. Pass bridged IPv4 traffic to iptables
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Apply the settings
sysctl --system

# 8. Sync the clock
yum install ntpdate -y
ntpdate time.windows.com

# 9. Install Docker
## Install the required system tools
yum install -y yum-utils device-mapper-persistent-data lvm2
## Add the repository (Alibaba Cloud mirror)
wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
## Install Docker, start it, and enable it at boot
yum install -y docker-ce
systemctl start docker && systemctl enable docker

# 10. Install Kubernetes
## Add the repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
## Install kubelet, kubeadm, and kubectl, and enable kubelet at boot (no need to start it yet)
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet
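
Note that the last step installs whatever the newest version in the repo is, while this post targets v1.21.3. Pinning the versions avoids a mismatch later; the exact pinned package names below are an assumption, so check what the mirror actually carries first:

# List available versions, then pin to the one this post uses
yum list --showduplicates kubeadm
yum install -y kubelet-1.21.3 kubeadm-1.21.3 kubectl-1.21.3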

Configuration

The main task here is configuring a Docker registry mirror (image accelerator). I suggest registering an Alibaba Cloud account and looking up your personal accelerator address in the console. Note that you also need to change Docker's cgroup driver to systemd to match Kubernetes (otherwise you will hit errors later).

cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://xxx.mirror.aliyuncs.com"]
}
EOF
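
Docker only reads daemon.json at startup, so restart it for the changes to take effect and verify that the driver switched:

# Apply the new daemon.json and confirm the cgroup driver
systemctl daemon-reload
systemctl restart docker
docker info | grep -i 'cgroup driver'   # expect: Cgroup Driver: systemd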

Initializing the master node

My init command is below. Note that this runs on the master machine only; worker nodes do not need init, they just join later.

kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24

The parameters mean the following:

  • apiserver-advertise-address: this node's IP
  • image-repository: the image registry to pull from; Alibaba Cloud's mirror works fine
  • kubernetes-version: the Kubernetes version number, as shown by kubelet --version
  • service-cidr: the CIDR block for Services
  • pod-network-cidr: the CIDR block for pod-to-pod networking
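
Pulling the control-plane images is the slowest part of init. As the preflight output below also suggests, you can pre-pull everything beforehand:

kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3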

The terminal session looked like this:

[root@k8s-master01 ~]# kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24
[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0: output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.0 not found: manifest unknown: manifest unknown
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

# Manually pull the image that failed
[root@k8s-master01 ~]# docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
1.8.0: Pulling from google_containers/coredns
c6568d217a00: Pull complete
5984b6d55edf: Pull complete
Digest: sha256:cc8fb77bc2a0541949d1d9320a641b82fd392b0d3d8145469ca4709ae769980e
Status: Downloaded newer image for registry.aliyuncs.com/google_containers/coredns:1.8.0
registry.aliyuncs.com/google_containers/coredns:1.8.0

[root@k8s-master01 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.3 3d174f00aa39 2 weeks ago 126MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.3 6be0dc1302e3 2 weeks ago 50.6MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.3 bc2bb319a703 2 weeks ago 120MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.3 adb2816ea823 2 weeks ago 103MB
registry.aliyuncs.com/google_containers/pause 3.4.1 0f8457a4c2ec 6 months ago 683kB
registry.aliyuncs.com/google_containers/coredns 1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/etcd 3.4.13-0 0369cf4303ff 11 months ago 253MB

# Re-tag it
[root@k8s-master01 ~]# docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
[root@k8s-master01 ~]# docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
Untagged: registry.aliyuncs.com/google_containers/coredns:1.8.0

# Re-run the init
[root@k8s-master01 ~]# kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24
[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.10.11.1 10.10.10.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master01 localhost] and IPs [10.10.10.101 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master01 localhost] and IPs [10.10.10.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 28.122070 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 42umpk.lqtmjnryhec3oj4f
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.10.10.101:6443 --token 42umpk.lqtmjnryhec3oj4f \
--discovery-token-ca-cert-hash sha256:2a05b1260021049e27aa798119b78582f02f0ce4dc80652b71fa361d69ed56d7

The last lines give the token for joining this cluster; we just copy that command and run it on the worker nodes.
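
Bootstrap tokens expire after 24 hours by default, so if you add a node later and the token is gone, generate a fresh join command on the master:

kubeadm token create --print-join-command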

Joining the worker nodes

Run the join command from above on each node; the terminal output looks like this:

[root@k8s-node01 ~]# kubeadm join 10.10.10.101:6443 --token 42umpk.lqtmjnryhec3oj4f --discovery-token-ca-cert-hash sha256:2a05b1260021049e27aa798119b78582f02f0ce4dc80652b71fa361d69ed56d7
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

The result after both nodes have joined:

[Screenshot: kubectl get nodes listing k8s-master01, k8s-node01, and k8s-node02]
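
Until the network add-on is installed, the nodes will report NotReady; the output should look roughly like this (illustrative, not copied from my terminal):

kubectl get nodes
NAME           STATUS     ROLES                  AGE   VERSION
k8s-master01   NotReady   control-plane,master   20m   v1.21.3
k8s-node01     NotReady   <none>                 2m    v1.21.3
k8s-node02     NotReady   <none>                 1m    v1.21.3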

Installing the network add-on (Flannel)

You can simply run kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml, but if the --pod-network-cidr you passed at init time is not Flannel's default 10.244.0.0/16, I recommend downloading the file, editing its network settings by hand, and then running kubectl apply -f kube-flannel.yml. A sketch of that follows; the result is shown after it:
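A minimal sketch of the download-and-edit route, assuming the default CIDR 10.244.0.0/16 appears only in the file's net-conf.json section:

# Download the manifest and point its Network at our --pod-network-cidr
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# In the kube-flannel-cfg ConfigMap, net-conf.json contains "Network": "10.244.0.0/16"
sed -i 's#10.244.0.0/16#10.10.12.0/24#' kube-flannel.yml
kubectl apply -f kube-flannel.yml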

[Screenshots: flannel pods running in kube-system; all nodes Ready]

Creating a pod to test the cluster

Let's create an nginx pod:

# Create the deployment
kubectl create deployment nginx --image=nginx
# Once the pod is running, expose its port
kubectl expose deployment nginx --port=80 --type=NodePort
# Check the pod and service status; see the screenshot below
kubectl get pods,svc -o wide
[Screenshot: kubectl get pods,svc -o wide showing the nginx pod on k8s-node01 and a NodePort service on port 32700]

We can see the nginx pod was scheduled onto k8s-node01 (this machine is 10.10.10.101; k8s-node01 is 10.10.10.102), and the external port is 32700, so we can access it:
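
A NodePort service is reachable through any node's IP, so this can also be checked from a shell:

curl http://10.10.10.102:32700   # should return the nginx welcome page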

[Screenshot: the nginx welcome page in a browser]

Problems encountered

kubeadm init - failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0

Symptom: the image cannot be pulled.

This is arguably a small bug. We are using Alibaba Cloud's mirror, and the mirror tags this image 1.8.0 rather than v1.8.0, which is why the pull fails. The fix is to pull coredns manually from the mirror (tag 1.8.0), re-tag it as v1.8.0, and run the init again.

# Pull the needed image under its actual tag
docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
# Re-tag it with the tag kubeadm expects
docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
# Remove the old tag
docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
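
To see up front exactly which image names and tags kubeadm will request from the mirror, you can list them first:

kubeadm config images list --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3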

kubectl get nodes errors with: The connection to the server localhost:8080 was refused - did you specify the right host or port?

Symptom: kubectl cannot be used.

Fix: first, this command is meant to be run on the master node, not on a worker node. Second, to use kubectl you must have a .kube directory under your home directory, and you must run the command as the user that created it. Right after initializing the master we got this hint:

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

For example, if I set up .kube as the user float, I run kubectl as that user. If you are root, you can instead run export KUBECONFIG=/etc/kubernetes/admin.conf, or just create .kube as root in the first place.

Flannel network add-on - image pulls fail during installation

After loading the flannel manifest with kubectl apply -f kube-flannel.yml, the pods may fail to start, with errors such as ImagePullBackOff.

As shown:

[Screenshot: flannel pods stuck in ImagePullBackOff]

This happens because the default image source, quay.io, is unreachable, and Alibaba Cloud does not currently mirror these Kubernetes components. So the only option is to download the image manually from GitHub and docker load it; the Flannel page is https://github.com/flannel-io/flannel#flannel. A sketch follows; my import result is shown after it:
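A sketch of the manual import, assuming flannel v0.14.0 and that the release assets include a pre-built Docker image archive (check the release page for the exact file name, and repeat the load on every node):

wget https://github.com/flannel-io/flannel/releases/download/v0.14.0/flanneld-v0.14.0-amd64.docker
docker load -i flanneld-v0.14.0-amd64.docker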

[Screenshot: docker load output showing the flannel image imported]

Then you can manually delete the pods whose image pulls failed (kubectl delete pod -n kube-system xxxx); they will be recreated automatically, and you just wait for initialization to finish.

Flannel network add-on - Error registering network: failed to acquire lease: node "xxx" pod cidr not assigned

Symptom: worker nodes cannot obtain a pod CIDR.

As shown:

[Screenshot: flannel pod log with the "pod cidr not assigned" error]

The cause of this problem is that the worker node did not receive a podCIDR. To troubleshoot it, first make sure of two things yourself:

  • You passed --pod-network-cidr to kubeadm init, and the CIDR does not clash with the LAN your machines sit on
  • When installing the Flannel add-on, you changed the network CIDR in the config file, as in the screenshot below. Note that if you apply the raw.githubusercontent.com file directly, its Network is 10.244.0.0/16, so either set your --pod-network-cidr to 10.244.0.0/16, or download the yml and change it to the CIDR you want, as I did
[Screenshot: net-conf.json in kube-flannel.yml with Network set to 10.10.12.0/24]

In practice, though, I had done both of these and still hit the problem. If that is your case too, the following command works around it:

kubectl patch node k8s-node01 -p '{"spec":{"podCIDR":"10.10.12.0/24"}}'. On success:

[Screenshot: node patched successfully]

If all is well, every node should now show the information below. But this is only an after-the-fact workaround; I do not yet know the root cause.

[Screenshot: every node reporting an assigned pod CIDR]
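
You can also confirm each node's assignment directly from the API instead of reading the flannel logs:

kubectl get node k8s-node01 -o jsonpath='{.spec.podCIDR}'   # expect: 10.10.12.0/24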

References

Environment setup

k8s教程由浅入深-尚硅谷_哔哩哔哩_bilibili

Troubleshooting

初始化 Kubernetes 主节点 failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0_a749227859的博客-CSDN博客

kubernetes - Flannel is crashing for Slave node - Stack Overflow

k8s集群flannel部署错误异常排查:pod cidr not assigned | 滩之南 (hyhblog.cn)
