The Issue
Someone has deleted the cloud-provider-vsphere project from the gcr.io container image registry. VMware's manifests set the image pull policy for the vSphere CSI driver to Always, which means that if you reboot your cluster, the kubelet tries to re-pull the now-missing images and the driver will not come back online.
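For context, this is roughly the relevant fragment of the manifest; the container name and tag below are illustrative, but it is the imagePullPolicy: Always setting that turns a reboot into an outage once the registry is gone:

      containers:
        - name: vsphere-csi-controller
          image: gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0
          # Always forces the kubelet to re-pull on every container start,
          # even when the image is still cached on the node
          imagePullPolicy: Always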
This is what my cluster looked like when I booted it up today:
❯ kubectl get pods -n vmware-system-csi
NAME                                      READY   STATUS             RESTARTS        AGE
vsphere-csi-controller-776fb75cd8-ptw4s   5/7     ErrImagePull       0               84m
vsphere-csi-controller-776fb75cd8-qt7kv   5/7     ImagePullBackOff   0               84m
vsphere-csi-controller-776fb75cd8-s7btf   5/7     ImagePullBackOff   0               84m
vsphere-csi-node-5qjjw                    1/3     CrashLoopBackOff   80 (111s ago)   142d
vsphere-csi-node-fmdkz                    2/3     ImagePullBackOff   84 (3m5s ago)   143d
vsphere-csi-node-gbt9w                    1/3     CrashLoopBackOff   6 (26s ago)     5m56s
vsphere-csi-node-jkj98                    1/3     CrashLoopBackOff   86 (24s ago)    143d
vsphere-csi-node-r69bl                    1/3     CrashLoopBackOff   85 (102s ago)   143d
vsphere-csi-node-ww2zx                    2/3     ImagePullBackOff   89 (3m5s ago)   143d
And when describing one of the controller pods:
❯ kubectl describe pod -n vmware-system-csi vsphere-csi-controller-776fb75cd8-ptw4s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  85m                   default-scheduler  0/6 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 4 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  84m                   default-scheduler  0/6 nodes are available: 6 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  6m54s                 default-scheduler  0/6 nodes are available: 6 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Normal   Scheduled         6m27s                 default-scheduler  Successfully assigned vmware-system-csi/vsphere-csi-controller-776fb75cd8-ptw4s to talos-2tp-6ld
  Normal   Created           6m26s                 kubelet            Created container liveness-probe
  Warning  Failed            6m26s                 kubelet            Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fsyncer%3Apull&service=gcr.io: 401 Unauthorized
  Normal   Started           6m26s                 kubelet            Started container csi-attacher
  Normal   Pulled            6m26s                 kubelet            Container image "k8s.gcr.io/sig-storage/csi-resizer:v1.7.0" already present on machine
  Normal   Created           6m26s                 kubelet            Created container csi-resizer
  Normal   Started           6m26s                 kubelet            Started container csi-resizer
  Normal   Pulling           6m26s                 kubelet            Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"
  Warning  Failed            6m26s                 kubelet            Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fdriver%3Apull&service=gcr.io: 401 Unauthorized
  Warning  Failed            6m26s                 kubelet            Error: ErrImagePull
  Normal   Pulled            6m26s                 kubelet            Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.9.0" already present on machine
  Normal   Pulled            6m26s                 kubelet            Container image "k8s.gcr.io/sig-storage/csi-attacher:v4.2.0" already present on machine
  Normal   Started           6m26s                 kubelet            Started container liveness-probe
  Normal   Pulling           6m26s                 kubelet            Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
  Normal   Created           6m26s                 kubelet            Created container csi-attacher
  Warning  Failed            6m26s                 kubelet            Error: ErrImagePull
  Normal   Pulled            6m26s                 kubelet            Container image "k8s.gcr.io/sig-storage/csi-provisioner:v3.4.0" already present on machine
  Normal   Created           6m26s                 kubelet            Created container csi-provisioner
  Normal   Started           6m25s                 kubelet            Started container csi-provisioner
  Normal   Pulled            6m25s                 kubelet            Container image "k8s.gcr.io/sig-storage/csi-snapshotter:v6.2.1" already present on machine
  Normal   Created           6m25s                 kubelet            Created container csi-snapshotter
  Normal   Started           6m25s                 kubelet            Started container csi-snapshotter
  Warning  Failed            6m24s                 kubelet            Error: ImagePullBackOff
  Normal   BackOff           6m24s                 kubelet            Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
  Warning  Failed            6m24s                 kubelet            Error: ImagePullBackOff
  Normal   BackOff           83s (x21 over 6m24s)  kubelet            Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"
The Cause
Who knows? Maybe it cost Broadcom too much to host the images in Google Cloud. Or maybe they are moving to a model where you can only access the images when you pay for VCF.
The Workaround
Luckily the images are mirrored by Rancher, so I took the upstream vSphere CSI manifest from:
– https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml
And updated the image locations. You can get the updated file from my GitHub Gist below.
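If you would rather patch the manifest yourself than use the Gist, a one-liner along these lines does the job. This is a sketch, assuming the rancher/mirrored-cloud-provider-vsphere-csi-release-driver and -syncer repositories on Docker Hub carry the tag your manifest references; verify the tag exists on the mirror before applying:

❯ curl -sL https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml \
    | sed -e 's|gcr.io/cloud-provider-vsphere/csi/release/driver|rancher/mirrored-cloud-provider-vsphere-csi-release-driver|g' \
          -e 's|gcr.io/cloud-provider-vsphere/csi/release/syncer|rancher/mirrored-cloud-provider-vsphere-csi-release-syncer|g' \
    | kubectl apply -f -

Because the image references change, kubectl apply rolls the controller Deployment and node DaemonSet; any pods still stuck in ImagePullBackOff can simply be deleted and will be recreated pulling from the mirror.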
Regards