Category Archives: Kubernetes


Kubernetes Finalizers: Deep Dive into PVC Deletion

This article builds upon an old post I wrote many years ago: Kubernetes PVC stuck in Terminating state. That post covered the symptoms and quick fixes.

This one is for platform engineers and Kubernetes operators who want to understand why resources like PVCs get stuck in Terminating, how Kubernetes handles deletion internally, and what it really means when a finalizer hangs around.

What Are Finalizers and Why Do They Matter?

In Kubernetes, deleting a resource is a two-phase operation. When a user runs kubectl delete, the object is not immediately removed from etcd. Instead, Kubernetes sets a deletionTimestamp and, if finalizers are present, waits for them to be cleared before actually removing the resource from the API server.

Finalizers are strings listed in the metadata.finalizers array. Each one signals that a controller must perform cleanup logic before the object can be deleted. This ensures consistency and is critical when external resources (cloud volumes, DNS records, firewall rules) are involved.

metadata:
  finalizers:
    - example.com/cleanup-hook

Until this list is empty, Kubernetes will not fully delete the object. This behavior is central to the garbage collection process and the reliability of resource teardown.
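
As a quick way to see this in practice, the commands below inspect a PVC that appears stuck and, as a last resort, clear its finalizers by hand. This is only a sketch: the claim name is a placeholder, and forcibly removing finalizers can orphan the backing volume, so treat it as the escape hatch covered in the older post rather than a routine fix.

# Check whether the object is waiting on finalizers: a set deletionTimestamp
# plus entries in metadata.finalizers means a controller still owes cleanup work.
kubectl get pvc my-claim -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers[*]}{"\n"}'

# Last-resort escape hatch: clear the finalizers yourself so the API server can
# complete the deletion. Only do this if the external cleanup has already happened.
kubectl patch pvc my-claim --type=merge -p '{"metadata":{"finalizers":null}}'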

Deletion Flow Internals

Here’s what actually happens under the hood: Continue reading Kubernetes Finalizers: Deep Dive into PVC Deletion


Kubernetes Scheduling: nodeSelector vs nodeAffinity

When deploying workloads in Kubernetes, controlling where your pods land is crucial. Two primary mechanisms facilitate this: nodeSelector and nodeAffinity. While they might seem similar at first glance, they serve different purposes and offer varying degrees of flexibility.

The Basics: nodeSelector

The nodeSelector is the simplest way to constrain pods to specific nodes. It matches pods to nodes based on key-value pairs. For instance:

spec:
  nodeSelector:
    disktype: ssd

This configuration ensures that the pod is scheduled only on nodes labeled with disktype=ssd.
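
For the selector to match anything, the label has to exist on the node first. A minimal sketch (the node name here is a placeholder):

# Label the node so the nodeSelector above can match it.
kubectl label nodes worker-01 disktype=ssd

# Confirm which nodes now carry the label.
kubectl get nodes -l disktype=ssd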

However, nodeSelector has its limitations. It doesn’t support complex queries or multiple values for a single key. If you attempt to specify multiple values for the same key, like so:

nodeSelector:
  topology.kubernetes.io/zone: us-east-1a
  topology.kubernetes.io/zone: us-east-1b

Only the last key-value pair is considered, effectively ignoring the previous ones. This behavior stems from the fact that YAML maps require unique keys, and Kubernetes doesn’t merge these entries.

Enter nodeAffinity

For more granular control, nodeAffinity comes into play. It allows you to define rules using operators like In, NotIn, Exists, and DoesNotExist. This flexibility enables you to match pods to nodes based on a range of criteria.
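
Before writing the affinity rule itself, you can sanity-check which nodes would satisfy an In expression by using the equivalent set-based label selector on the CLI. This is only an analogue for exploration, not the nodeAffinity syntax itself:

# List nodes whose zone label matches either value - the same set-based logic
# that an In operator expresses inside nodeAffinity.
kubectl get nodes -l 'topology.kubernetes.io/zone in (us-east-1a,us-east-1b)'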

Suppose you want to schedule a pod on nodes in either us-east-1a or us-east-1b. Here’s how you’d achieve that with nodeAffinity: Continue reading Kubernetes Scheduling: nodeSelector vs nodeAffinity


Highlight Kubernetes Labels in your Terminal with AWK

A quick tip and bit of code: if you’re outputting a lot of Kubernetes metadata using the --show-labels flag, it can feel like looking for a needle in a haystack. The snippet below colorizes key label outputs to make them stand out.

The Code Snippet

When working with Kubernetes, it can be helpful to visually scan for certain node labels—such as service.cilium.io/node=... or custom readiness flags like ingress-ready=true. Using a simple awk script, we can colorize these labels directly in our terminal output. This script uses ANSI escape codes to wrap matched text in color and awk’s gsub() function to apply substitutions line by line. It’s a lightweight and effective way to highlight key data points in otherwise dense CLI output.

kubectl get ciliumnodes --show-labels | awk '
BEGIN {
  color_start = "\033[1;36m"; # cyan
  color_end = "\033[0m";
}
{
  gsub(/service\.cilium\.io\/node=[^, ]+/, color_start "&" color_end);
  gsub(/ingress-ready=true/, color_start "&" color_end);
  print
}'

Screenshot Example


Screenshot showing the use of an awk command to color-highlight the ingress-ready=true label in red within kubectl get ciliumnodes --show-labels output in a Kubernetes terminal session.

Breakdown of the Code

We pipe the output of the kubectl command to awk. The BEGIN block sets up the ANSI color codes used to highlight the matched patterns.

  • \033[1;36m is an ANSI escape code that starts cyan-colored text.
  • \033[0m resets the text color back to normal.

gsub(...)

These two lines apply substitutions to each input line:

  • gsub() is a global substitution function that replaces all matches in the line.
    • service\.cilium\.io\/node=[^, ]+ matches a full key-value pair like service.cilium.io/node=mynode
    • [^, ]+ grabs the node value until the next comma or space
    • ingress-ready=true matches the exact label string
    • & refers to the entire matched string, which we wrap in color codes

print

This prints the modified line after substitutions are applied.
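
If you run this often, one option is to wrap it in a small shell function so it can be reused for any resource type. The function name and label patterns below are just suggestions:

# Reusable wrapper around the snippet above; pass any resource type (and flags)
# that you would normally give to "kubectl get".
highlight_labels() {
  kubectl get "$@" --show-labels | awk '
  BEGIN { color_start = "\033[1;36m"; color_end = "\033[0m"; }
  {
    gsub(/service\.cilium\.io\/node=[^, ]+/, color_start "&" color_end);
    gsub(/ingress-ready=true/, color_start "&" color_end);
    print
  }'
}

# Example usage:
highlight_labels ciliumnodes
highlight_labels nodes -o wide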

Customize the Highlight Color

You can change \033[1;36m to another color code:

  • Red: \033[1;31m
  • Green: \033[1;32m
  • Yellow: \033[1;33m
  • Blue: \033[1;34m
  • Magenta: \033[1;35m

A Final Note on sub() vs gsub()

  • sub() replaces only the first occurrence of the regex in the line
  • gsub() replaces all occurrences of the regex in the line
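
A quick way to see the difference on a sample line:

# sub() only touches the first match; gsub() rewrites every match on the line.
echo "ingress-ready=true ingress-ready=true" | awk '{ sub(/true/, "TRUE"); print }'
# -> ingress-ready=TRUE ingress-ready=true
echo "ingress-ready=true ingress-ready=true" | awk '{ gsub(/true/, "TRUE"); print }'
# -> ingress-ready=TRUE ingress-ready=TRUE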

Regards



Dean Lewis


vSphere CSI Driver Images unavailable from gcr.io – quick fix

The Issue

Someone has deleted the Cloud-Provider-vSphere project in the gcr.io registry for container images. The default pull policy for the vSphere CSI driver in VMware’s manifests is set to Always, meaning that if you reboot your cluster, the driver pods will not come back online.
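
You can confirm which images and pull policies the controller Deployment is using with something like the following. The Deployment name matches the controller pods shown below; adjust it for your own install:

# Print each container's image and imagePullPolicy for the CSI controller Deployment.
kubectl get deployment vsphere-csi-controller -n vmware-system-csi \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\t"}{.imagePullPolicy}{"\n"}{end}'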

vSphere-CSI Driver image unavailable - project deleted

This is what my cluster looked like when I booted it up today:

❯ kubectl get pods -n vmware-system-csi
NAME                                      READY   STATUS             RESTARTS        AGE
vsphere-csi-controller-776fb75cd8-ptw4s   5/7     ErrImagePull       0               84m
vsphere-csi-controller-776fb75cd8-qt7kv   5/7     ImagePullBackOff   0               84m
vsphere-csi-controller-776fb75cd8-s7btf   5/7     ImagePullBackOff   0               84m
vsphere-csi-node-5qjjw                    1/3     CrashLoopBackOff   80 (111s ago)   142d
vsphere-csi-node-fmdkz                    2/3     ImagePullBackOff   84 (3m5s ago)   143d
vsphere-csi-node-gbt9w                    1/3     CrashLoopBackOff   6 (26s ago)     5m56s
vsphere-csi-node-jkj98                    1/3     CrashLoopBackOff   86 (24s ago)    143d
vsphere-csi-node-r69bl                    1/3     CrashLoopBackOff   85 (102s ago)   143d
vsphere-csi-node-ww2zx                    2/3     ImagePullBackOff   89 (3m5s ago)   143d

And when describing the pod:

❯ kubectl describe pod -n vmware-system-csi vsphere-csi-controller-776fb75cd8-ptw4s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 85m default-scheduler 0/6 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 4 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 84m default-scheduler 0/6 nodes are available: 6 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 6m54s default-scheduler 0/6 nodes are available: 6 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Normal Scheduled 6m27s default-scheduler Successfully assigned vmware-system-csi/vsphere-csi-controller-776fb75cd8-ptw4s to talos-2tp-6ld
Normal Created 6m26s kubelet Created container liveness-probe
Warning Failed 6m26s kubelet Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fsyncer%3Apull&service=gcr.io: 401 Unauthorized
Normal Started 6m26s kubelet Started container csi-attacher
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-resizer:v1.7.0" already present on machine
Normal Created 6m26s kubelet Created container csi-resizer
Normal Started 6m26s kubelet Started container csi-resizer
Normal Pulling 6m26s kubelet Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"
Warning Failed 6m26s kubelet Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fdriver%3Apull&service=gcr.io: 401 Unauthorized
Warning Failed 6m26s kubelet Error: ErrImagePull
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.9.0" already present on machine
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-attacher:v4.2.0" already present on machine
Normal Started 6m26s kubelet Started container liveness-probe
Normal Pulling 6m26s kubelet Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
Normal Created 6m26s kubelet Created container csi-attacher
Warning Failed 6m26s kubelet Error: ErrImagePull
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-provisioner:v3.4.0" already present on machine
Normal Created 6m26s kubelet Created container csi-provisioner
Normal Started 6m25s kubelet Started container csi-provisioner
Normal Pulled 6m25s kubelet Container image "k8s.gcr.io/sig-storage/csi-snapshotter:v6.2.1" already present on machine
Normal Created 6m25s kubelet Created container csi-snapshotter
Normal Started 6m25s kubelet Started container csi-snapshotter
Warning Failed 6m24s kubelet Error: ImagePullBackOff
Normal BackOff 6m24s kubelet Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
Warning Failed 6m24s kubelet Error: ImagePullBackOff
Normal BackOff 83s (x21 over 6m24s) kubelet Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"

The Cause

Who knows? Maybe it cost Broadcom too much to host the images in Google Cloud. Or maybe they are moving to a model where you can only access the files when you pay for VCF.

The Workaround

Luckily the images are mirrored by Rancher, so I just updated the vSphere CSI manifest from:

– https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml
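
As a rough sketch of the kind of substitution involved (the Rancher mirror repository names below are my assumption; verify them against the Gist linked below before applying anything):

# Download the upstream manifest, rewrite the gcr.io image references to the
# Rancher-mirrored copies on Docker Hub, then apply the result.
curl -sL https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml \
  | sed -e 's#gcr.io/cloud-provider-vsphere/csi/release/driver#docker.io/rancher/mirrored-cloud-provider-vsphere-csi-release-driver#g' \
        -e 's#gcr.io/cloud-provider-vsphere/csi/release/syncer#docker.io/rancher/mirrored-cloud-provider-vsphere-csi-release-syncer#g' \
  > vsphere-csi-driver-mirrored.yaml

kubectl apply -f vsphere-csi-driver-mirrored.yaml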

I then updated the image locations; you can get the updated file from my GitHub Gist below. Continue reading vSphere CSI Driver Images unavailable from gcr.io – quick fix


Understanding cilium_event_type when using Cilium & Hubble

The Issue

In a platform deployed with Cilium, when using Hubble either to view the full JSON output or to configure which events are captured via the allowlist or denylist, you may have seen a field called event_type, which holds an integer.

Below is an example allowlist that uses event_type to define which flows should be captured. When I first saw this, I was confused: where do these numbers come from, and how do I map them back to a friendly name that I understand?

allowlist:
- '{"source_pod":["kube-system/"],"event_type":[{"type":1}]}'
- '{"destination_pod":["kube-system/"],"event_type":[{"type":1}]}'

Example Hubble Dynamic Exporter configuration:

hubble:
  export:
    dynamic:
      enabled: true
      config:
        enabled: true
        content:
        - name: "test001"
          filePath: "/var/run/cilium/hubble/test001.log"
          fieldMask: []
          includeFilters: []
          excludeFilters: []
          end: "2023-10-09T23:59:59-07:00"
        - name: "test002"
          filePath: "/var/run/cilium/hubble/test002.log"
          fieldMask: ["source.namespace", "source.pod_name", "destination.namespace", "destination.pod_name", "verdict"]
          includeFilters:
          - source_pod: ["default/"]
            event_type:
            - type: 1
          - destination_pod: ["frontend/webserver-975996d4c-7hhgt"]

And finally, a Hubble flow in full JSON output, with the event_type shown towards the end:

{
  "flow": {
    "time": "2024-07-08T10:09:24.173232166Z",
    "uuid": "755b0203-d456-452d-b399-4fa136cdb4fd",
    "verdict": "FORWARDED",
    "ethernet": {
      "source": "06:29:73:4e:0a:c5",
      "destination": "26:50:d8:4a:94:d2"
    },
    "IP": {
      "source": "10.0.2.163",
      "destination": "130.211.198.204",
      "ipVersion": "IPv4"
    },
    "l4": {
      "TCP": {
        "source_port": 37736,
        "destination_port": 443,
        "flags": {
          "PSH": true,
          "ACK": true
        }
      }
    },
    "source": {
      "ID": 2045,
      "identity": 14398,
      "namespace": "endor",
      "labels": [
        "k8s:app.kubernetes.io/name=tiefighter"
      ],
      "pod_name": "tiefighter-6b56bdc869-2t6wn",
      "workloads": [
        {
          "name": "tiefighter",
          "kind": "Deployment"
        }
      ]
    },
    "destination": {
      "identity": 16777217,
      "labels": [
        "cidr:130.211.198.204/32",
        "reserved:world"
      ]
    },
    "Type": "L3_L4",
    "node_name": "kind-worker",
    "destination_names": [
      "disney.com"
    ],
    "event_type": {
      "type": 4,
      "sub_type": 3
    },
    "traffic_direction": "EGRESS",
    "trace_observation_point": "TO_STACK",
    "is_reply": false,
    "Summary": "TCP Flags: ACK, PSH"
  },
  "node_name": "kind-worker",
  "time": "2024-07-08T10:09:24.173232166Z"
}

The Explanation

Cilium event types are defined in this Go package. The first constant uses iota, which starts at 0 and increments by one for each subsequent type, so drop = 1, debug = 2, and so on.

const (
	// 0-128 are reserved for BPF datapath events
	MessageTypeUnspec = iota

	// MessageTypeDrop is a BPF datapath notification carrying a DropNotify
	// which corresponds to drop_notify defined in bpf/lib/drop.h
	MessageTypeDrop

	// MessageTypeDebug is a BPF datapath notification carrying a DebugMsg
	// which corresponds to debug_msg defined in bpf/lib/dbg.h
	MessageTypeDebug

	// MessageTypeCapture is a BPF datapath notification carrying a DebugCapture
	// which corresponds to debug_capture_msg defined in bpf/lib/dbg.h
	MessageTypeCapture

	// MessageTypeTrace is a BPF datapath notification carrying a TraceNotify
	// which corresponds to trace_notify defined in bpf/lib/trace.h
	MessageTypeTrace

	// MessageTypePolicyVerdict is a BPF datapath notification carrying a PolicyVerdictNotify
	// which corresponds to policy_verdict_notify defined in bpf/lib/policy_log.h
	MessageTypePolicyVerdict

	// MessageTypeRecCapture is a BPF datapath notification carrying a RecorderCapture
	// which corresponds to capture_msg defined in bpf/lib/pcap.h
	MessageTypeRecCapture

	// MessageTypeTraceSock is a BPF datapath notification carrying a TraceNotifySock
	// which corresponds to trace_sock_notify defined in bpf/lib/trace_sock.h
	MessageTypeTraceSock

	// 129-255 are reserved for agent level events

	// MessageTypeAccessLog contains a pkg/proxy/accesslog.LogRecord
	MessageTypeAccessLog = 129

	// MessageTypeAgent is an agent notification carrying a AgentNotify
	MessageTypeAgent = 130
)

const (
	MessageTypeNameDrop          = "drop"
	MessageTypeNameDebug         = "debug"
	MessageTypeNameCapture       = "capture"
	MessageTypeNameTrace         = "trace"
	MessageTypeNameL7            = "l7"
	MessageTypeNameAgent         = "agent"
	MessageTypeNamePolicyVerdict = "policy-verdict"
	MessageTypeNameRecCapture    = "recorder"
	MessageTypeNameTraceSock     = "trace-sock"
)
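
Putting the two constant blocks together, my reading of the mapping between the integer in event_type and the friendly name is as follows (note that MessageTypeAccessLog, 129, is what surfaces as l7):

  • 0 – unspecified
  • 1 – drop
  • 2 – debug
  • 3 – capture
  • 4 – trace
  • 5 – policy-verdict
  • 6 – recorder
  • 7 – trace-sock
  • 129 – l7 (access log)
  • 130 – agent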

Therefore, in the JSON output above (the last example), event type 4 is defined as trace. This particular event type also has a sub_type, as you can see in the Hubble CLI help output below; in the example flow, sub_type 3 lines up with the to-stack observation point, matching the trace_observation_point of TO_STACK in the same output. You can see the definitions in the Go package here.

  -t, --type filter                         Filter by event types TYPE[:SUBTYPE]. Available types and subtypes:
                                            TYPE             SUBTYPE
                                            capture          n/a
                                            drop             n/a
                                            l7               n/a
                                            policy-verdict   n/a
                                            trace            from-endpoint
                                                             from-host
                                                             from-network
                                                             from-overlay
                                                             from-proxy
                                                             from-stack
                                                             to-endpoint
                                                             to-host
                                                             to-network
                                                             to-overlay
                                                             to-proxy
                                                             to-stack
                                            trace-sock       n/a

I hope this helps!

Regards

Dean Lewis