
 

Remembering and typing the commands to reactivate (uncordon) the worker nodes at every boot, and to drain them at every shutdown, quickly becomes tedious.

To avoid getting the order wrong or skipping a step, let's turn them into scripts.

1. Safe Kubernetes Startup

1.1. Powering On Each Node

Power the machines on in the order master -> node1 -> node2 -> node3 and wait for each boot to complete.
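
If the nodes are reachable over the network, the wait can be scripted instead of eyeballed. A minimal sketch, assuming the hostnames master, node1..node3 resolve and the machines answer ping (nothing in this post sets that up, so treat it as an assumption):

for node in master node1 node2 node3; do
    # Block until the node answers a single ping, retrying every 5 seconds.
    until ping -c 1 -W 1 $node > /dev/null 2>&1; do
        echo "Waiting for $node to boot..."
        sleep 5
    done
    echo "$node is up."
done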

 

1.2. Status Check (Master Node)

Checking the swap area

swapon -s should print nothing, and

free -h should show the Swap line as 0B.

$ swapon -s
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       1.3Gi        29Gi       2.7Mi       980Mi        29Gi
Swap:             0B          0B          0B
$
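
For use in a script, the same check fits on one line; it relies only on the behavior shown above, where swapon -s prints nothing when no swap is active:

# Warn if any swap device is still active (swapon -s output is empty when swap is off).
[ -z "$(swapon -s)" ] && echo "swap is off" || echo "WARNING: swap is still on"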

 

Checking the containerd and kubelet daemons

Inspecting the daemons that started automatically at boot:

$ sudo systemctl status containerd
● containerd.service - containerd container runtime
     Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; preset: enabled)
     Active: active (running) since Tue 2026-01-20 08:05:41 UTC; 7min ago
       Docs: https://containerd.io
    Process: 1089 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 1105 (containerd)
      Tasks: 136
     Memory: 158.2M (peak: 162.8M)
        CPU: 6.464s
     CGroup: /system.slice/containerd.service
             ├─1105 /usr/bin/containerd
             ├─1362 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 6d17cd383b412730636cced803a81e3a31ed99c913dc7eacf964e353160b86f3 -address /run/containerd/containerd.sock
             ├─1363 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 9135f53cf2636a376c33537f0468c68a8b6f032d95d0640a8e01c0b068f255db -address /run/containerd/containerd.sock
             ├─1364 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 7edd6ac44ed400d7cd64af5998a1ecb4c42a0049ff6d693e13d55732e3d96ef8 -address /run/containerd/containerd.sock
             ├─1367 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 084495e58ac6e5eb15dcdfa8c5b29784309af4958ac116a2e8d2e10f8036a345 -address /run/containerd/containerd.sock
             ├─1828 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 369cd6eb1c668292992e14077a823f2363751e8250dcf9f0c9c1d9301d24d2c1 -address /run/containerd/containerd.sock
             ├─1914 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 0c4e9db44fa94dd5046a2b4e326a1b1bb58adee48520eec3d68f688552df6849 -address /run/containerd/containerd.sock
             ├─2473 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id f4e905a1749b359f7214dfdb9e62d4a42360aabddef1f6bbd5795980d7f10d23 -address /run/containerd/containerd.sock
             └─2626 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 737fcb7dad8799b446d4ea37914e9a060caa5e5b4a6eeefbdd3d6e837048e73c -address /run/containerd/containerd.sock

Jan 20 08:05:52 master containerd[1105]: time="2026-01-20T08:05:52.900054817Z" level=info msg="TearDown network for sandbox \"0d6483193c1b4e5530bddbc6a7ac2573bec19f7f3c3bb4806876d3fd300b657d\" successfully"
Jan 20 08:05:52 master containerd[1105]: time="2026-01-20T08:05:52.900094956Z" level=info msg="StopPodSandbox for \"0d6483193c1b4e5530bddbc6a7ac2573bec19f7f3c3bb4806876d3fd300b657d\" returns successfully"
Jan 20 08:05:52 master containerd[1105]: time="2026-01-20T08:05:52.900944681Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:coredns-55cb58b774-h2sqz,Uid:d4c804a9-8b12-4213-bd88-74616655cbad,Namespace:kube-system,Attempt:3,}"
Jan 20 08:05:52 master containerd[1105]: map[string]interface {}{"cniVersion":"0.3.1", "hairpinMode":true, "ipMasq":false, "ipam":map[string]interface {}{"ranges":[][]map[string]interface {}{[]map[string]interface {}{map[string]interface {}{"subnet":"10.244.0.0/24"}}}, "routes":[]types.Route{types.Route{Dst:net.IPNet{IP:net.IP{0xa, 0xf4, 0x0, 0x0}, Mask:net.IPMask{0xff, 0xff, 0x0, 0x0}}, GW:net.IP(nil), MTU:0, AdvMSS:0, Priority:0, Table:(*int)(nil), Scope:(*int)(nil)}}, "type":"host-local"}, "isDefaultGateway":true, "isGateway":true, "mtu":(*uint)(0xc000118700), "name":"cbr0", "type":"bridge"}
Jan 20 08:05:52 master containerd[1105]: delegateAdd: netconf sent to delegate plugin:
Jan 20 08:05:53 master containerd[1105]: {"cniVersion":"0.3.1","hairpinMode":true,"ipMasq":false,"ipam":{"ranges":[[{"subnet":"10.244.0.0/24"}]],"routes":[{"dst":"10.244.0.0/16"}],"type":"host-local"},"isDefaultGateway":true,"isGateway":true,"mtu":1450,"name":"cbr0","type":"bridge"}time="2026-01-20T08:05:53.027132750Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:coredns-55cb58b774-h2sqz,Uid:d4c804a9-8b12-4213-bd88-74616655cbad,Namespace:kube-system,Attempt:3,} returns sandbox id \"737fcb7dad8799b446d4ea37914e9a060caa5e5b4a6eeefbdd3d6e837048e73c\""
Jan 20 08:05:53 master containerd[1105]: time="2026-01-20T08:05:53.029813724Z" level=info msg="CreateContainer within sandbox \"737fcb7dad8799b446d4ea37914e9a060caa5e5b4a6eeefbdd3d6e837048e73c\" for container &ContainerMetadata{Name:coredns,Attempt:3,}"
Jan 20 08:05:53 master containerd[1105]: time="2026-01-20T08:05:53.045574018Z" level=info msg="CreateContainer within sandbox \"737fcb7dad8799b446d4ea37914e9a060caa5e5b4a6eeefbdd3d6e837048e73c\" for &ContainerMetadata{Name:coredns,Attempt:3,} returns container id \"e54956b2f417baba01b792b4772550dea72be3b54dd1e7425fd55e7a14f54e8e\""
Jan 20 08:05:53 master containerd[1105]: time="2026-01-20T08:05:53.045955251Z" level=info msg="StartContainer for \"e54956b2f417baba01b792b4772550dea72be3b54dd1e7425fd55e7a14f54e8e\""
Jan 20 08:05:53 master containerd[1105]: time="2026-01-20T08:05:53.114712091Z" level=info msg="StartContainer for \"e54956b2f417baba01b792b4772550dea72be3b54dd1e7425fd55e7a14f54e8e\" returns successfully"
$

The containerd daemon looks healthy, but kubelet does not:

$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Tue 2026-01-20 08:05:44 UTC; 7min ago
       Docs: https://kubernetes.io/docs/
   Main PID: 1210 (kubelet)
      Tasks: 36 (limit: 38292)
     Memory: 111.7M (peak: 116.9M)
        CPU: 11.743s
     CGroup: /system.slice/kubelet.service
             └─1210 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9

Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.111762    1210 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"73606f772024898733e30e6ba5de57040e5d4dd0dc94cd8aca5ecb9636e4e050\": plugin type=\"flannel\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node."
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.111865    1210 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"73606f772024898733e30e6ba5de57040e5d4dd0dc94cd8aca5ecb9636e4e050\": plugin type=\"flannel\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node." pod="kube-system/coredns-55cb58b774-h2sqz"
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.111911    1210 kuberuntime_manager.go:1168] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"73606f772024898733e30e6ba5de57040e5d4dd0dc94cd8aca5ecb9636e4e050\": plugin type=\"flannel\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node." pod="kube-system/coredns-55cb58b774-h2sqz"
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.112012    1210 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"coredns-55cb58b774-h2sqz_kube-system(d4c804a9-8b12-4213-bd88-74616655cbad)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"coredns-55cb58b774-h2sqz_kube-system(d4c804a9-8b12-4213-bd88-74616655cbad)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"73606f772024898733e30e6ba5de57040e5d4dd0dc94cd8aca5ecb9636e4e050\\\": plugin type=\\\"flannel\\\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.\"" pod="kube-system/coredns-55cb58b774-h2sqz" podUID="d4c804a9-8b12-4213-bd88-74616655cbad"
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.114614    1210 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"366a2a201e3287ed27e7fb43e3454c3e77543047ce884c77318a7c8e47b7270b\": plugin type=\"flannel\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node."
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.114662    1210 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"366a2a201e3287ed27e7fb43e3454c3e77543047ce884c77318a7c8e47b7270b\": plugin type=\"flannel\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node." pod="kube-system/coredns-55cb58b774-g7f5v"
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.114685    1210 kuberuntime_manager.go:1168] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"366a2a201e3287ed27e7fb43e3454c3e77543047ce884c77318a7c8e47b7270b\": plugin type=\"flannel\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node." pod="kube-system/coredns-55cb58b774-g7f5v"
Jan 20 08:05:49 master kubelet[1210]: E0120 08:05:49.114723    1210 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"coredns-55cb58b774-g7f5v_kube-system(6fd0b849-2822-4bda-bb85-208605b1e865)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"coredns-55cb58b774-g7f5v_kube-system(6fd0b849-2822-4bda-bb85-208605b1e865)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"366a2a201e3287ed27e7fb43e3454c3e77543047ce884c77318a7c8e47b7270b\\\": plugin type=\\\"flannel\\\" failed (add): failed to load flannel 'subnet.env' file: open /run/flannel/subnet.env: no such file or directory. Check the flannel pod log for this node.\"" pod="kube-system/coredns-55cb58b774-g7f5v" podUID="6fd0b849-2822-4bda-bb85-208605b1e865"
Jan 20 08:05:50 master kubelet[1210]: I0120 08:05:50.783168    1210 scope.go:117] "RemoveContainer" containerID="ab8ea4de87f63e4c747add2c1b90bd741153e370a30f58decef2a5dd5009dd57"
Jan 20 08:05:54 master kubelet[1210]: I0120 08:05:54.903327    1210 prober_manager.go:312] "Failed to trigger a manual run" probe="Readiness"
$

 

kubelet is throwing errors because it cannot find the subnet.env file.

Manually restarting just the kubelet daemon brings it back to a healthy state.

$ sudo systemctl restart kubelet

$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Tue 2026-01-20 08:13:22 UTC; 5s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 4956 (kubelet)
      Tasks: 35 (limit: 38292)
     Memory: 31.0M (peak: 33.9M)
        CPU: 747ms
     CGroup: /system.slice/kubelet.service
             └─4956 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9

Jan 20 08:13:23 master kubelet[4956]: I0120 08:13:23.122769    4956 reconciler_common.go:247] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/cd4d4da2-0f3e-412a-9372-b89d86d2c5da-xtables-lock\") pod \"kube-proxy-vcp94\" (UID: \"cd4d4da2-0f3e-412a-9372-b89d86d2c5da\") " pod="kube-system/kube-proxy-vcp94"
Jan 20 08:13:23 master kubelet[4956]: E0120 08:13:23.152737    4956 kubelet.go:1937] "Failed creating a mirror pod for" err="pods \"etcd-master\" already exists" pod="kube-system/etcd-master"
Jan 20 08:13:23 master kubelet[4956]: E0120 08:13:23.152785    4956 kubelet.go:1937] "Failed creating a mirror pod for" err="pods \"kube-apiserver-master\" already exists" pod="kube-system/kube-apiserver-master"
Jan 20 08:13:23 master kubelet[4956]: E0120 08:13:23.152770    4956 kubelet.go:1937] "Failed creating a mirror pod for" err="pods \"kube-scheduler-master\" already exists" pod="kube-system/kube-scheduler-master"
Jan 20 08:13:23 master kubelet[4956]: E0120 08:13:23.153217    4956 kubelet.go:1937] "Failed creating a mirror pod for" err="pods \"kube-controller-manager-master\" already exists" pod="kube-system/kube-controller-manager-master"
Jan 20 08:13:23 master kubelet[4956]: I0120 08:13:23.206932    4956 desired_state_of_world_populator.go:158] "Finished populating initial desired state of world"
Jan 20 08:13:23 master kubelet[4956]: I0120 08:13:23.223473    4956 reconciler_common.go:247] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni\" (UniqueName: \"kubernetes.io/host-path/217715c2-0862-4aee-acb1-0386ecc35a3e-cni\") pod \"kube-flannel-ds-s8tg8\" (UID: \"217715c2-0862-4aee-acb1-0386ecc35a3e\") " pod="kube-flannel/kube-flannel-ds-s8tg8"
Jan 20 08:13:23 master kubelet[4956]: I0120 08:13:23.223553    4956 reconciler_common.go:247] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-plugin\" (UniqueName: \"kubernetes.io/host-path/217715c2-0862-4aee-acb1-0386ecc35a3e-cni-plugin\") pod \"kube-flannel-ds-s8tg8\" (UID: \"217715c2-0862-4aee-acb1-0386ecc35a3e\") " pod="kube-flannel/kube-flannel-ds-s8tg8"
Jan 20 08:13:23 master kubelet[4956]: I0120 08:13:23.223775    4956 reconciler_common.go:247] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/217715c2-0862-4aee-acb1-0386ecc35a3e-xtables-lock\") pod \"kube-flannel-ds-s8tg8\" (UID: \"217715c2-0862-4aee-acb1-0386ecc35a3e\") " pod="kube-flannel/kube-flannel-ds-s8tg8"
Jan 20 08:13:23 master kubelet[4956]: I0120 08:13:23.223836    4956 reconciler_common.go:247] "operationExecutor.VerifyControllerAttachedVolume started for volume \"run\" (UniqueName: \"kubernetes.io/host-path/217715c2-0862-4aee-acb1-0386ecc35a3e-run\") pod \"kube-flannel-ds-s8tg8\" (UID: \"217715c2-0862-4aee-acb1-0386ecc35a3e\") " pod="kube-flannel/kube-flannel-ds-s8tg8"
$

 

The subnet file lives at /run/flannel/subnet.env.

On Linux, directories under /run are volatile: they exist only while the system is running and are wiped at shutdown.

The file is recreated as kubelet comes up, but because of a timing gap, kubelet tries to read it on first start before it has been written; that race is what produces the errors above.
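
That /run is volatile is easy to confirm; on a typical systemd-based distribution (as used here) it is mounted as tmpfs:

$ findmnt -no FSTYPE /run
tmpfs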

 

Ultimately, the error only disappears if you let the system boot, let the auto-started kubelet bring up the network so that subnet.env is created (by the flannel pod, as the log messages indicate), and then restart the kubelet daemon. (Excluding kubelet from auto-start and starting it manually after boot does not help; the subnet.env-not-found error still occurs.)

 

Bottom line: restarting the daemon after boot is the easiest and fastest fix.
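
If you want that restart to tolerate a slow flannel start instead of relying on a single check, the wait can be a small retry loop. A minimal sketch (the 60-second budget is an arbitrary choice; the full script in 1.3 uses a simpler one-shot check):

# Wait up to 60 seconds (12 x 5s) for flannel to write subnet.env, then restart kubelet.
for i in $(seq 1 12); do
    [ -f /run/flannel/subnet.env ] && break
    sleep 5
done
sudo systemctl restart kubelet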

 

1.3. Safe Startup Script

The strategy is to bring up the Kubernetes services once the master node has finished booting:

  • Disable the swap area
  • Restart services (containerd, kubelet)
  • Restore the worker nodes (Uncordon)

Create the k8s-startup.sh file

#!/bin/bash
# k8s-startup.sh

echo "1. Disabling swap on the master node..."
sudo swapoff -a

echo "2. Restarting the container runtime..."
sudo systemctl restart containerd
# Give containerd a moment to become fully ready before touching kubelet.
sleep 2

echo "3. Checking the flannel network file..."
# Once subnet.env exists, one more kubelet restart makes sure it is picked up.
if [ -f /run/flannel/subnet.env ]; then
    echo "Network file found. Final service sync..."
    sudo systemctl restart kubelet
    sleep 10
else
    echo "--------------------------------------------------------"
    echo " ERROR: network file (/run/flannel/subnet.env) was not created!"
    echo " Check the flannel pods: kubectl get pods -n kube-flannel"
    echo "--------------------------------------------------------"
    exit 1
fi

echo "4. Enabling scheduling on the worker nodes (Uncordon)..."
# Ideally, verify each node is Ready before uncordoning it (see the sketch below).
for node in node1 node2 node3; do
    echo "Uncordoning $node..."
    kubectl uncordon $node
done

echo "5. Final status check..."
kubectl get nodes -o wide
kubectl get pods -A -o wide
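
As the comment at step 4 notes, it is safer to wait until a node actually reports Ready before uncordoning it. kubectl can do that wait natively; a sketch that could replace the plain loop above:

for node in node1 node2 node3; do
    # Block until the node reports the Ready condition (up to 2 minutes per node).
    kubectl wait --for=condition=Ready node/$node --timeout=120s && kubectl uncordon $node
done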

 

Grant execute permission

$ chmod u+x k8s-startup.sh

 

1.4. Running the Script

Run it from the directory where you saved the script.

$ ./k8s-startup.sh
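
Going one step further is optional and not covered above: the script could be registered as a oneshot systemd unit so it runs automatically after every boot. This is only a sketch; the unit name, the script path, and the kubeconfig location are all assumptions, and kubectl inside the script needs a kubeconfig when run by systemd:

# /etc/systemd/system/k8s-startup.service (hypothetical unit)
[Unit]
Description=Kubernetes post-boot startup sequence
Wants=network-online.target
After=network-online.target containerd.service kubelet.service

[Service]
Type=oneshot
# kubectl inside the script needs credentials; this path is an assumption.
Environment=KUBECONFIG=/home/ubuntu/.kube/config
ExecStart=/usr/local/bin/k8s-startup.sh

[Install]
WantedBy=multi-user.target

After placing the file, sudo systemctl daemon-reload && sudo systemctl enable k8s-startup.service would arm it for the next boot.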

2. Safe Kubernetes Shutdown

2.1. Safe Shutdown Script

For shutdown, only the worker nodes need to be drained.

Create the k8s-shutdown.sh file

#!/bin/bash
# k8s-shutdown.sh (manual power-off version)

echo "--- Starting the safe cluster shutdown sequence ---"

# 1. Drain the worker nodes.
# Once this finishes, all Pods have been terminated safely.
echo "[1/2] Draining worker nodes (node1~3)..."
for node in node1 node2 node3; do
    echo "  -> Draining $node..."
    # Keep DaemonSets (networking, etc.) and safely evict only regular app Pods.
    kubectl drain $node --ignore-daemonsets --delete-emptydir-data --force --timeout=60s
done

# 2. Check the cluster state.
echo "[2/2] Checking node status..."
kubectl get nodes

echo ""
echo "--------------------------------------------------------"
echo " Confirm that every worker shows 'SchedulingDisabled' after draining."
echo " You can now power the machines off in this order:"
echo " 1. Each worker node (Raspberry Pi 1, 2, 3)"
echo " 2. The master node (Xeon server)"
echo "--------------------------------------------------------"

 

Grant execute permission

$ chmod u+x k8s-shutdown.sh

 

2.2. Running the Shutdown Script

Run it from the directory where you saved the script.

$ ./k8s-shutdown.sh

3. Powering Off

Run the shutdown command on each machine in the order node1 -> node2 -> node3 -> master.

$ sudo shutdown -h now
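
Logging in to four machines one by one can also be avoided. If passwordless SSH and sudo are configured for each node (neither is set up in this post, so this is an assumption), the whole sequence can run from the master:

# Shut down the workers in order, then the master itself.
for node in node1 node2 node3; do
    ssh $node "sudo shutdown -h now"
done
sudo shutdown -h now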

 

After confirming that every system has shut down cleanly, switch off the physical power.

 

The end.
