
Quickly Deploying a Kubernetes Cluster with kubeadm

kubeadm is a tool designed to provide a best-practice fast path for creating Kubernetes clusters. It performs the necessary actions in a user-friendly way to get a minimum viable, secure cluster up and running. You only need to install kubeadm, kubelet and kubectl on the servers; the remaining core components are deployed quickly as containers.

Prerequisites

Letting iptables see bridged traffic

Make bridged traffic visible to iptables so that it is routed correctly:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
# Load the module immediately (the file above only takes effect at boot)
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
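
To confirm the settings took effect, a quick sanity check (standard Linux tooling, nothing kubeadm-specific):

lsmod | grep br_netfilter  # the module should be listed
sudo sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward  # all three should print 1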

Check required ports

The Kubernetes Master components and Node components listen on specific ports. Before deploying the cluster with kubeadm, open the following ports on the nodes:

  • Master node

    Protocol  Direction  Port Range   Purpose                  Used By
    TCP       Inbound    6443*        Kubernetes API server    All
    TCP       Inbound    2379-2380    etcd server client API   kube-apiserver, etcd
    TCP       Inbound    10250        Kubelet API              Self, Control plane
    TCP       Inbound    10251        kube-scheduler           Self
    TCP       Inbound    10252        kube-controller-manager  Self

  • Worker node

    Protocol  Direction  Port Range   Purpose            Used By
    TCP       Inbound    10250        Kubelet API        Self, Control plane
    TCP       Inbound    30000-32767  NodePort Services  All
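
If firewalld is running (the CentOS 7 default), the ports above can be opened as follows. This is a sketch assuming firewalld; adapt it if you manage iptables rules directly:

# Master node
sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=2379-2380/tcp
sudo firewall-cmd --permanent --add-port=10250-10252/tcp
# Worker node
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --permanent --add-port=30000-32767/tcp
sudo firewall-cmd --reload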

Installing runtime

# yum -y install docker
# systemctl enable docker && systemctl start docker
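
After installing Docker, it is worth checking which cgroup driver it reports, since a mismatch with kubelet's driver causes the failure described in the troubleshooting section below:

docker info | grep -i 'cgroup driver'  # CentOS packages typically default to cgroupfs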

Installing kubeadm, kubelet and kubectl

First configure the yum repository, then install kubeadm, kubelet and kubectl, and enable kubelet to start on boot.

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF

# Set SELinux in permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

sudo yum install kubelet-1.18.4 kubeadm-1.18.4 kubectl-1.18.4

sudo systemctl enable --now kubelet
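
You can verify the installed versions before proceeding (both flags are standard kubeadm/kubectl options):

kubeadm version -o short
kubectl version --client --short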

Disable swap:

sudo swapoff -a # Turn swap off temporarily; reverts after a reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab # Disable swap permanently in fstab; persists across reboots
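
To verify swap is really off:

free -h     # the Swap line should show 0B
swapon -s   # prints nothing when swap is disabled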

Deploying the Cluster

Configure the Master node

Modify kubelet parameters

vi /etc/sysconfig/kubelet

# Change to the following
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd
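
kubeadm init will (re)start kubelet itself, so this is optional, but if kubelet is already running you can apply the change immediately:

sudo systemctl restart kubelet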

Export the default configuration file and modify it

# Export the default configuration
kubeadm config print init-defaults > /data/kubeadm/config/kubeadm.yml
# Edit the configuration
vim kubeadm.yml
# The modified content is as follows
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  # Change to the Master node IP
  advertiseAddress: 192.168.1.102
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: localhost.localdomain
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager:
  extraArgs:
    horizontal-pod-autoscaler-use-rest-clients: "true"
    horizontal-pod-autoscaler-sync-period: "10s"
    node-monitor-grace-period: "10s"
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
# Change the registry to the Aliyun mirror
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
# Set the Kubernetes version
kubernetesVersion: v1.18.0
networking:
  dnsDomain: cluster.local
  # Configure the default CIDR for Calico
  podSubnet: "172.16.0.0/16"
  serviceSubnet: 10.96.0.0/12
scheduler: {}
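
Optionally, pre-pull the control-plane images before running init, so the init step itself is faster and any network problems surface early (the init output below also suggests this):

kubeadm config images list --config /data/kubeadm/config/kubeadm.yml
kubeadm config images pull --config /data/kubeadm/config/kubeadm.yml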

Initialize the Kubernetes Master node

kubeadm init --config=/data/kubeadm/config/kubeadm.yml --upload-certs | tee /data/kubeadm/log/kubeadm-init.log
  • --upload-certs: uploads the control-plane certificates so they can be distributed automatically when more nodes join later
  • tee kubeadm-init.log: saves the output to a log file for later reference

During the init operation you will see output like the following:

W0601 11:33:16.858211    1719 strict.go:47] unknown configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta2", Kind:"KubeProxyConfiguration"} for scheme definitions in "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31" and "k8s.io/kubernetes/cmd/kubeadm/app/componentconfigs/scheme.go:28"
W0601 11:33:16.858535 1719 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=KubeProxyConfiguration
[init] Using Kubernetes version: v1.18.0
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [localhost.localdomain kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.102]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost.localdomain localhost] and IPs [192.168.1.102 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost.localdomain localhost] and IPs [192.168.1.102 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
W0601 11:33:33.405533 1719 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0601 11:33:33.411476 1719 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 31.511863 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
ca23402e2e70c5613b2ee10507b6065a548bb715f992c335e6498f25d30c0f96
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.102:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:2d14d0998d3d2921771e6c6a81477b5124d87f920b7c4caeec8ebefe3c94fe5b

Key stages of the init process:

[preflight] Runs a series of pre-flight checks
[kubelet-start] Generates the kubelet configuration file "/var/lib/kubelet/config.yaml"
[certificates] Generates the various certificates
[kubeconfig] Generates the kubeconfig files under /etc/kubernetes; components use these files to communicate with each other
[control-plane] Installs the Master components from the YAML files in the /etc/kubernetes/manifests directory
[etcd] Installs the etcd service from /etc/kubernetes/manifests/etcd.yaml
[kubelet] Configures the kubelet via a ConfigMap
[patchnode] Records the CNI information on the Node via annotations
[mark-control-plane] Labels the current node with the master role and a NoSchedule taint, so by default Pods are not scheduled on the Master
[bootstrap-token] Generates the token; record it, as it is needed later when adding nodes with kubeadm join
[addons] Installs the CoreDNS and kube-proxy add-ons

Configure kubectl

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
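
Alternatively, if you are operating as root, you can point kubectl at admin.conf directly instead of copying it:

export KUBECONFIG=/etc/kubernetes/admin.conf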

Verify

$ kubectl get node
NAME             STATUS     ROLES    AGE     VERSION
vm-0-11-centos   NotReady   master   3m14s   v1.18.2
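
NotReady is expected at this stage: no Pod network plugin has been installed yet, so CoreDNS cannot start. You can confirm with:

$ kubectl get pods -n kube-system  # the coredns pods stay Pending until a CNI plugin is deployed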

Configure Worker nodes

Modify kubelet parameters (same as on the Master)

vi /etc/sysconfig/kubelet

# Change to the following
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd

Use the kubeadm join command to add the Worker node to the k8s cluster.

Join the k8s Worker node

$ kubeadm join 192.168.1.102:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:2d14d0998d3d2921771e6c6a81477b5124d87f920b7c4caeec8ebefe3c94fe5b
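
The bootstrap token from the init log expires after 24 hours (the ttl in kubeadm.yml). If it has expired, generate a fresh join command on the Master:

kubeadm token create --print-join-command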

Verify

$ kubectl get node
NAME             STATUS     ROLES    AGE     VERSION
vm-0-11-centos   NotReady   master   7m51s   v1.18.2
vm-0-15-centos   NotReady   <none>   3m      v1.18.2

Running the kubectl command on a worker Node produces the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

This happens because the kubectl command needs to run as kubernetes-admin. Copy the /etc/kubernetes/admin.conf file from the Master node to the same directory on the worker node, then configure it as the kubeconfig:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Configure the network plugin

Here we install Calico as the network plugin. Note that the Calico manifest ships with a default pod CIDR of 192.168.0.0/16; since this cluster sets podSubnet to 172.16.0.0/16, edit the CALICO_IPV4POOL_CIDR value in calico.yaml to match before applying it.

$ kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
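
You can watch the Calico pods roll out before checking the nodes (standard kubectl usage; calico-node is the DaemonSet created by the manifest):

kubectl -n kube-system rollout status daemonset/calico-node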

Once Calico is up, all the Nodes move to the Ready state:

$ kubectl get node
NAME             STATUS   ROLES    AGE     VERSION
vm-0-11-centos   Ready    master   13m     v1.18.2
vm-0-15-centos   Ready    <none>   8m12s   v1.18.2

Troubleshooting

When deploying with kubeadm, you may run into the following problem:

[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.


Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- No internet connection is available so the kubelet cannot pull or find the following control plane images:
- k8s.gcr.io/kube-apiserver-amd64:v1.11.2
- k8s.gcr.io/kube-controller-manager-amd64:v1.11.2
- k8s.gcr.io/kube-scheduler-amd64:v1.11.2
- k8s.gcr.io/etcd-amd64:3.2.18
- You can check or miligate this in beforehand with "kubeadm config images pull" to make sure the images
are downloaded locally and cached.

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster

The troubleshooting steps are as follows:

  • Check whether kubelet is running properly, as the log output suggests:
$ systemctl status kubelet
$ journalctl -u kubelet
Mar 08 11:00:39 VM-0-14-centos kubelet[15759]: F0308 11:00:39.379649 15759 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Mar 08 11:00:39 VM-0-14-centos systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 08 11:00:39 VM-0-14-centos systemd[1]: Unit kubelet.service entered failed state.
Mar 08 11:00:39 VM-0-14-centos systemd[1]: kubelet.service failed.

The error here is misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs". What does this mean?

Simply put, kubelet's cgroup driver is systemd, while Docker's cgroup driver is cgroupfs. In fact, the kubeadm init log already warned about this:

[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

The fix is to change Docker's cgroup driver to systemd (a retry sketch follows the checklist below):

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "data-root": "/data/docker"
}
EOF
systemctl daemon-reload
systemctl restart docker
  • Check that port 10248 is open on the node
  • Check that swap is disabled on the node
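
After Docker restarts with the systemd cgroup driver, verify the change and retry the deployment. A sketch, assuming the earlier kubeadm init failed partway through (kubeadm reset wipes that half-initialized state, so only run it in that situation):

docker info | grep -i 'cgroup driver'  # should now report: systemd
sudo kubeadm reset -f                  # clean up the failed init before retrying
sudo systemctl restart kubelet
# then re-run kubeadm init as described above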
