Category Archives: Kubernetes

Kubernetes

vSphere CSI Driver Images unable from gcr.io – quick fix

The Issue

Someone has deleted the Cloud-Provider-vSphere project in the gcr.io registry for container images. The default pull policy for the vSphere CSI when using VMware’s manifests is set to always, meaning that if you reboot your cluster, it will not come back online.

vSphere-CSI Driver image unable - project deleted

This is what my cluster looked like when I booted it up today;

❯ kubectl get pods -n vmware-system-csi
NAME READY STATUS RESTARTS AGE
vsphere-csi-controller-776fb75cd8-ptw4s 5/7 ErrImagePull 0 84m
vsphere-csi-controller-776fb75cd8-qt7kv 5/7 ImagePullBackOff 0 84m
vsphere-csi-controller-776fb75cd8-s7btf 5/7 ImagePullBackOff 0 84m
vsphere-csi-node-5qjjw 1/3 CrashLoopBackOff 80 (111s ago) 142d
vsphere-csi-node-fmdkz 2/3 ImagePullBackOff 84 (3m5s ago) 143d
vsphere-csi-node-gbt9w 1/3 CrashLoopBackOff 6 (26s ago) 5m56s
vsphere-csi-node-jkj98 1/3 CrashLoopBackOff 86 (24s ago) 143d
vsphere-csi-node-r69bl 1/3 CrashLoopBackOff 85 (102s ago) 143d
vsphere-csi-node-ww2zx 2/3 ImagePullBackOff 89 (3m5s ago) 143d

And when describing the pod;

❯ kubectl describe pod -n vmware-system-csi vsphere-csi-controller-776fb75cd8-ptw4s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 85m default-scheduler 0/6 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 4 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 84m default-scheduler 0/6 nodes are available: 6 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 6m54s default-scheduler 0/6 nodes are available: 6 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Normal Scheduled 6m27s default-scheduler Successfully assigned vmware-system-csi/vsphere-csi-controller-776fb75cd8-ptw4s to talos-2tp-6ld
Normal Created 6m26s kubelet Created container liveness-probe
Warning Failed 6m26s kubelet Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fsyncer%3Apull&service=gcr.io: 401 Unauthorized
Normal Started 6m26s kubelet Started container csi-attacher
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-resizer:v1.7.0" already present on machine
Normal Created 6m26s kubelet Created container csi-resizer
Normal Started 6m26s kubelet Started container csi-resizer
Normal Pulling 6m26s kubelet Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"
Warning Failed 6m26s kubelet Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fdriver%3Apull&service=gcr.io: 401 Unauthorized
Warning Failed 6m26s kubelet Error: ErrImagePull
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.9.0" already present on machine
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-attacher:v4.2.0" already present on machine
Normal Started 6m26s kubelet Started container liveness-probe
Normal Pulling 6m26s kubelet Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
Normal Created 6m26s kubelet Created container csi-attacher
Warning Failed 6m26s kubelet Error: ErrImagePull
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-provisioner:v3.4.0" already present on machine
Normal Created 6m26s kubelet Created container csi-provisioner
Normal Started 6m25s kubelet Started container csi-provisioner
Normal Pulled 6m25s kubelet Container image "k8s.gcr.io/sig-storage/csi-snapshotter:v6.2.1" already present on machine
Normal Created 6m25s kubelet Created container csi-snapshotter
Normal Started 6m25s kubelet Started container csi-snapshotter
Warning Failed 6m24s kubelet Error: ImagePullBackOff
Normal BackOff 6m24s kubelet Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
Warning Failed 6m24s kubelet Error: ImagePullBackOff
Normal BackOff 83s (x21 over 6m24s) kubelet Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"
The Cause

Who knows? Maybe it cost Broadcom too much to host the images in Google Cloud. Or maybe they are moving to a model where you can only access the files when you pay for VCF.

The Workaround

Luckily the images are mirrored by Rancher, so I just updated the vSphere CSI manifest from:

– https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml

And updated the image locations, you can get the updated file from my GitHub Gist belo. Continue reading vSphere CSI Driver Images unable from gcr.io – quick fix

Cilium Event Types Header

Understanding cilium_event_type when using Cilium & Hubble

The Issue

In a platform that’s deployed with Cilium, when using Hubble either to view the full JSON output or to configure which events are captured using the allowlist or denylist you may have seen a field called event_type which uses an integer.

Below is an example allow list using “event_type”, to define which flows to be captured. When I first saw this, I was confused; where do these numbers come from? How do I map this back to a friendly name that I understand?;

allowlist:
- '{"source_pod":["kube-system/"],"event_type":[{"type":1}]}'
- '{"destination_pod":["kube-system/"],"event_type":[{"type":1}]}'

Example Hubble Dynamic Exporter configuration;

hubble:
  export:
    dynamic:
      enabled: true
      config:
        enabled: true
        content:
        - name: "test001"
          filePath: "/var/run/cilium/hubble/test001.log"
          fieldMask: []
          includeFilters: []
          excludeFilters: []
          end: "2023-10-09T23:59:59-07:00"
        - name: "test002"
          filePath: "/var/run/cilium/hubble/test002.log"
          fieldMask: ["source.namespace", "source.pod_name", "destination.namespace", "destination.pod_name", "verdict"]
          includeFilters:
          - source_pod: ["default/"]
            event_type:
            - type: 1
          - destination_pod: ["frontend/webserver-975996d4c-7hhgt"]

and finally, a Hubble flow in full JSON output, with the event_type showing towards the end of the output;

{
  "flow": {
    "time": "2024-07-08T10:09:24.173232166Z",
    "uuid": "755b0203-d456-452d-b399-4fa136cdb4fd",
    "verdict": "FORWARDED",
    "ethernet": {
      "source": "06:29:73:4e:0a:c5",
      "destination": "26:50:d8:4a:94:d2"
    },
    "IP": {
      "source": "10.0.2.163",
      "destination": "130.211.198.204",
      "ipVersion": "IPv4"
    },
    "l4": {
      "TCP": {
        "source_port": 37736,
        "destination_port": 443,
        "flags": {
          "PSH": true,
          "ACK": true
        }
      }
    },
    "source": {
      "ID": 2045,
      "identity": 14398,
      "namespace": "endor",
      "labels": [
        "k8s:app.kubernetes.io/name=tiefighter"
      ],
      "pod_name": "tiefighter-6b56bdc869-2t6wn",
      "workloads": [
        {
          "name": "tiefighter",
          "kind": "Deployment"
        }
      ]
    },
    "destination": {
      "identity": 16777217,
      "labels": [
        "cidr:130.211.198.204/32",
        "reserved:world"
      ]
    },
    "Type": "L3_L4",
    "node_name": "kind-worker",
    "destination_names": [
      "disney.com"
    ],
    "event_type": {
      "type": 4,
      "sub_type": 3
    },
    "traffic_direction": "EGRESS",
    "trace_observation_point": "TO_STACK",
    "is_reply": false,
    "Summary": "TCP Flags: ACK, PSH"
  },
  "node_name": "kind-worker",
  "time": "2024-07-08T10:09:24.173232166Z"
}
The Explanation

Cilium Event types are defined in this Go package. The first line iota == 0 then increments by one for each type, so drop =1, debug =2, etc.

const (
	// 0-128 are reserved for BPF datapath events
	MessageTypeUnspec = iota

	// MessageTypeDrop is a BPF datapath notification carrying a DropNotify
	// which corresponds to drop_notify defined in bpf/lib/drop.h
	MessageTypeDrop

	// MessageTypeDebug is a BPF datapath notification carrying a DebugMsg
	// which corresponds to debug_msg defined in bpf/lib/dbg.h
	MessageTypeDebug

	// MessageTypeCapture is a BPF datapath notification carrying a DebugCapture
	// which corresponds to debug_capture_msg defined in bpf/lib/dbg.h
	MessageTypeCapture

	// MessageTypeTrace is a BPF datapath notification carrying a TraceNotify
	// which corresponds to trace_notify defined in bpf/lib/trace.h
	MessageTypeTrace

	// MessageTypePolicyVerdict is a BPF datapath notification carrying a PolicyVerdictNotify
	// which corresponds to policy_verdict_notify defined in bpf/lib/policy_log.h
	MessageTypePolicyVerdict

	// MessageTypeRecCapture is a BPF datapath notification carrying a RecorderCapture
	// which corresponds to capture_msg defined in bpf/lib/pcap.h
	MessageTypeRecCapture

	// MessageTypeTraceSock is a BPF datapath notification carrying a TraceNotifySock
	// which corresponds to trace_sock_notify defined in bpf/lib/trace_sock.h
	MessageTypeTraceSock

	// 129-255 are reserved for agent level events

	// MessageTypeAccessLog contains a pkg/proxy/accesslog.LogRecord
	MessageTypeAccessLog = 129

	// MessageTypeAgent is an agent notification carrying a AgentNotify
	MessageTypeAgent = 130
)

const (
	MessageTypeNameDrop          = "drop"
	MessageTypeNameDebug         = "debug"
	MessageTypeNameCapture       = "capture"
	MessageTypeNameTrace         = "trace"
	MessageTypeNameL7            = "l7"
	MessageTypeNameAgent         = "agent"
	MessageTypeNamePolicyVerdict = "policy-verdict"
	MessageTypeNameRecCapture    = "recorder"
	MessageTypeNameTraceSock     = "trace-sock"
)

Therefore, in the above JSON output (last example), event type 4 is defined as trace, this particular event type also has a sub_typeas you can see here in the Hubble CLI, help output. You can see the definitions in the Go package here.

  -t, --type filter                         Filter by event types TYPE[:SUBTYPE]. Available types and subtypes:
                                            TYPE             SUBTYPE
                                            capture          n/a
                                            drop             n/a
                                            l7               n/a
                                            policy-verdict   n/a
                                            trace            from-endpoint
                                                             from-host
                                                             from-network
                                                             from-overlay
                                                             from-proxy
                                                             from-stack
                                                             to-endpoint
                                                             to-host
                                                             to-network
                                                             to-overlay
                                                             to-proxy
                                                             to-stack
                                            trace-sock       n/a

I hope this helps!

Regards

Dean Lewis

Kubernetes

KubeVirt – Creating Data Volume fails – DataVolume.storage spec is missing accessMode and volumeMode

The Issue

I had a freshly deployed Red Hat OpenShift cluster on which I had set up OpenShift Virtualization (KubeVirt). When I tried to create my first data volume so I could start creating VMs, nothing happened. The ISO file upload would fail.

Running oc describe dv {name} -n {namespace} showed the below event/error.

DataVolume.storage spec is missing accessMode and volumeMode, cannot get access mode from StorageProfile managed-nfs-storage
The Cause

This is caused by default created StorageProfile used by KubeVirt, which will be created from your default StorageClass, not setting the correct values, as per this git issue comment.

Previously virtctl (CLI for KubeVirt) used ReadWriteOnce if accessMode was not specified. But now, as we want to use possibilities provided by StorageProfiles, it does not set accessMode.
The Fix

Run the following command to update your StorageProfile with the correct settings, pay attention to whether you needReadWriteOnce or ReadWriteMany.

kubectl patch StorageProfile {StorageProfileName} \
  --type merge -p \
  '{"spec": {"claimPropertySets": [{"accessModes": ["ReadWriteOnce"]}]}}' 

Regards

Dean Lewis

Cilium Hubble CLI - Header Image

Cilium Hubble CLI – Configure Auto Completion

One of the little nits I have is when I use the terminal, and as I start typing a command if I press the Tab key, autocomplete doesn’t work. It feels like it should be the default out of the box.

You can configure the Hubble CLI for Cilium, but it’s not documented in the docs.cilium.io pages yet, so I thought I’d throw up a quick post adding it here!

This command pushes the auto-complete config into my zsh config on macOS.

hubble completion zsh > $(brew --prefix)/share/zsh/site-functions/_hubble

For other platforms, you can see the examples provided for Cilium Agent, and apply the logic to your own environment.

Cilium Hubble CLI - Autocompletion

Regards

Dean Lewis

Cilium Hubble CLI - Header Image

Cilium Hubble CLI – Using a local configuration file

Did you know that the Cilium Hubble CLI supports using a configuration file?

Below is an example command where Isovalent Enterprise for Cilium is deployed and Hubble RBAC is configured. Therefore, I must provide additional details such as the server location and certificates to authenticate using the CLI. The steps in this blog post also work with Cilium OSS, which is especially handy when setting allow and deny lists to prune the information returned.

This can become cumbersome for every command you want to run.

❯ hubble observe \
--server tls://localhost:4245 \
--tls-ca-cert-files ca-cert.pem \
--tls-server-name 'cli.hubble-relay.cilium.io' \
--namespace kube-system
Mar 20 13:09:38.061: tenant-jobs/resumes-58c6678bc8-5nkcg:36459 (ID:88195) -> kube-system/coredns-77fcb74c4c-wfw4f:53 (ID:102385) policy-verdict:L3-L4 EGRESS ALLOWED (UDP)
Mar 20 13:09:38.061: tenant-jobs/resumes-58c6678bc8-5nkcg:36459 (ID:88195) -> kube-system/coredns-77fcb74c4c-wfw4f:53 (ID:102385) to-proxy FORWARDED (UDP)
Mar 20 13:09:38.062: tenant-jobs/resumes-58c6678bc8-5nkcg (ID:88195) <> kube-system/coredns-77fcb74c4c-wfw4f:53 (ID:102385) post-xlate-fwd TRANSLATED (UDP)
Mar 20 13:09:38.062: tenant-jobs/resumes-58c6678bc8-5nkcg:36459 (ID:88195) -> kube-system/coredns-77fcb74c4c-wfw4f:53 (ID:102385) dns-request proxy FORWARDED (DNS Query coreapi.tenant-jobs.svc.cluster.local. A)

Below you can see the various configuration options that the Hubble CLI supports. The above example is using flags as part of the command.

hubble config -h
Config allows to modify or view the hubble configuration. Global hubble options
can be set via flags, environment variables or a configuration file. The
following precedence order is used:

1. Flag
2. Environment variable
3. Configuration file
4. Default value

The "config view" subcommand provides a merged view of the configuration. The
"config set" and "config reset" subcommand modify values in the configuration
file.

Environment variable names start with HUBBLE_ followed by the flag name
capitalized where eventual dashes ('-') are replaced by underscores ('_').
For example, the environment variable that corresponds to the "--server" flag
is HUBBLE_SERVER. The environment variable for "--tls-allow-insecure" is
HUBBLE_TLS_ALLOW_INSECURE and so on.

Usage:
  hubble config [flags]
  hubble config [command]

Available Commands:
  get         Get an individual value in the hubble config file
  reset       Reset all or an individual value in the hubble config file
  set         Set an individual value in the hubble config file
  view        Display merged configuration settings

Using the below commands, I can set the flags as values in the configuration file, for any CLI flag, the set value will be prepended with HUBBLE_+ the flag name.

❯ hubble config set HUBBLE_SERVER tls://localhost:4245
unknown key: HUBBLE_SERVER
❯ hubble config set server tls://localhost:4245
❯ hubble config set tls-ca-cert-files ca-cert.pem

❯ hubble config set tls-server-name 'cli.hubble-relay.cilium.io'

Now we can use the Hubble CLI without the additional flags.

❯ hubble observe -n tenant-jobs
Mar 20 13:13:30.004: tenant-jobs/coreapi-6748664db6-rmr2j:42935 (ID:111121) <- kube-system/coredns-77fcb74c4c-wfw4f:53 (ID:102385) to-endpoint FORWARDED (UDP)
Mar 20 13:13:30.496: tenant-jobs/crawler-6dbf4f8b5d-vr7gr:47804 (ID:71705) -> tenant-jobs/loader-68544b8b87-zrxwt:50051 (ID:115137) http-request FORWARDED (HTTP/2 POST http://loader:50051/loader.Loader/LoadCv)
Mar 20 13:13:30.505: tenant-jobs/crawler-6dbf4f8b5d-vr7gr:47804 (ID:71705) <- tenant-jobs/loader-68544b8b87-zrxwt:50051 (ID:115137) http-response FORWARDED (HTTP/2 200 9ms (POST http://loader:50051/loader.Loader/LoadCv))

We can validate the configuration in use by running the below command, which also confirms the location of the config file itself, which you can edit directly.

❯ hubble config view
allowlist: []
client-id: ""
client-secret: ""
config: /Users/veducate/Library/Application Support/hubble/config.yaml
debug: false
denylist: []
grant-type: auto
issuer: ""
issuer-ca: ""
refresh: false
scopes: []
server: tls://localhost:4245
timeout: 5s
tls: false
tls-allow-insecure: false
tls-ca-cert-files:
- ca-cert.pem
tls-client-cert-file: ""
tls-client-key-file: ""
tls-server-name: cli.hubble-relay.cilium.io
token-file

Regards

Dean Lewis