Category Archives: VMware

Kubernetes

vSphere CSI Driver Images unavailable from gcr.io – quick fix

The Issue

Someone has deleted the cloud-provider-vsphere project in the gcr.io container image registry. The default image pull policy for the vSphere CSI driver in VMware’s manifests is Always, meaning that if you reboot your cluster, the CSI pods cannot pull their images and will not come back online.

vSphere-CSI Driver image unable - project deleted

This is what my cluster looked like when I booted it up today:

❯ kubectl get pods -n vmware-system-csi
NAME                                      READY   STATUS             RESTARTS        AGE
vsphere-csi-controller-776fb75cd8-ptw4s   5/7     ErrImagePull       0               84m
vsphere-csi-controller-776fb75cd8-qt7kv   5/7     ImagePullBackOff   0               84m
vsphere-csi-controller-776fb75cd8-s7btf   5/7     ImagePullBackOff   0               84m
vsphere-csi-node-5qjjw                    1/3     CrashLoopBackOff   80 (111s ago)   142d
vsphere-csi-node-fmdkz                    2/3     ImagePullBackOff   84 (3m5s ago)   143d
vsphere-csi-node-gbt9w                    1/3     CrashLoopBackOff   6 (26s ago)     5m56s
vsphere-csi-node-jkj98                    1/3     CrashLoopBackOff   86 (24s ago)    143d
vsphere-csi-node-r69bl                    1/3     CrashLoopBackOff   85 (102s ago)   143d
vsphere-csi-node-ww2zx                    2/3     ImagePullBackOff   89 (3m5s ago)   143d

And this is what describing one of the pods shows:

❯ kubectl describe pod -n vmware-system-csi vsphere-csi-controller-776fb75cd8-ptw4s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 85m default-scheduler 0/6 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 4 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 84m default-scheduler 0/6 nodes are available: 6 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Warning FailedScheduling 6m54s default-scheduler 0/6 nodes are available: 6 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Normal Scheduled 6m27s default-scheduler Successfully assigned vmware-system-csi/vsphere-csi-controller-776fb75cd8-ptw4s to talos-2tp-6ld
Normal Created 6m26s kubelet Created container liveness-probe
Warning Failed 6m26s kubelet Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fsyncer%3Apull&service=gcr.io: 401 Unauthorized
Normal Started 6m26s kubelet Started container csi-attacher
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-resizer:v1.7.0" already present on machine
Normal Created 6m26s kubelet Created container csi-resizer
Normal Started 6m26s kubelet Started container csi-resizer
Normal Pulling 6m26s kubelet Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"
Warning Failed 6m26s kubelet Failed to pull image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to pull and unpack image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to resolve reference "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://gcr.io/v2/token?scope=repository%3Acloud-provider-vsphere%2Fcsi%2Frelease%2Fdriver%3Apull&service=gcr.io: 401 Unauthorized
Warning Failed 6m26s kubelet Error: ErrImagePull
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.9.0" already present on machine
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-attacher:v4.2.0" already present on machine
Normal Started 6m26s kubelet Started container liveness-probe
Normal Pulling 6m26s kubelet Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
Normal Created 6m26s kubelet Created container csi-attacher
Warning Failed 6m26s kubelet Error: ErrImagePull
Normal Pulled 6m26s kubelet Container image "k8s.gcr.io/sig-storage/csi-provisioner:v3.4.0" already present on machine
Normal Created 6m26s kubelet Created container csi-provisioner
Normal Started 6m25s kubelet Started container csi-provisioner
Normal Pulled 6m25s kubelet Container image "k8s.gcr.io/sig-storage/csi-snapshotter:v6.2.1" already present on machine
Normal Created 6m25s kubelet Created container csi-snapshotter
Normal Started 6m25s kubelet Started container csi-snapshotter
Warning Failed 6m24s kubelet Error: ImagePullBackOff
Normal BackOff 6m24s kubelet Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.0"
Warning Failed 6m24s kubelet Error: ImagePullBackOff
Normal BackOff 83s (x21 over 6m24s) kubelet Back-off pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.0"

The Cause

Who knows? Maybe it cost Broadcom too much to host the images in Google Cloud. Or maybe they are moving to a model where you can only access the files when you pay for VCF.

The Workaround

Luckily the images are mirrored by Rancher, so I took the vSphere CSI manifest from:

– https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.3.1/manifests/vanilla/vsphere-csi-driver.yaml

and updated the image locations. You can get the updated file from my GitHub Gist below.
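Swapping the image locations in the manifest can also be scripted. The following is a minimal sketch, not the exact change in the Gist. It assumes the mirror keeps the same image names and tags (verify this before applying anything), and the mirror prefix `registry.example.com/mirror` is a placeholder, not the real mirror path.

```shell
#!/bin/sh
# Rewrite image references that point at the deleted gcr.io project.
# OLD_PREFIX is taken from the pull errors above (dots escaped for sed).
# NEW_PREFIX is a PLACEHOLDER (assumption): substitute the real mirror path.
OLD_PREFIX='gcr\.io/cloud-provider-vsphere/csi/release'
NEW_PREFIX='registry.example.com/mirror'

rewrite_images() {
  # Reads a manifest on stdin and writes the rewritten manifest to stdout.
  sed "s|${OLD_PREFIX}/|${NEW_PREFIX}/|g"
}

# Example usage (file names are hypothetical):
# rewrite_images < vsphere-csi-driver.yaml > vsphere-csi-driver-mirrored.yaml
# kubectl apply -f vsphere-csi-driver-mirrored.yaml
```

Writing to a new file rather than editing in place makes it easy to diff the two manifests before applying the result.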


vSphere data loss bug returns – CBT issues in vSphere ESXi 8.0 Update 2

The Issue

I keep saying, there are no new ideas in technology, just re-hashes of old ones. That is also true for VMware and their data loss issues.

The vSphere-based change block tracking (CBT) bug is back! I think I wrote 5 articles on this back in 2014/2015 with explanations and fixes!

Veeam reported this at the start of week commencing 11th December 2023, with VMware confirming the issue by the end of the same week.

The Cause

Change block tracking is the feature used to see which blocks of data have changed since a known point in time, to enable backup software to capture only the incremental changes.

If this feature fails, you could lose data in your backups, as the backup software doesn’t know which blocks to protect.

As per VMware:

CBT's QueryChangedDiskAreas may lose some data changed on the disk after disk is hot-extended.
It only happens on ESXi 8.0u2.

The Fix/Workaround

The following comes directly from VMware’s newly published KB. It took them only a few days to confirm this behaviour after Veeam noticed it at the start of the week!

  • Resolution
    • Unfortunately, there is no fix available for this bug at this time. However, you can use the following workaround to work around the issue until a fix is released
  • Workaround
    1. Reset CBT after disk is hot-extended. Then, user need to take a full backup immediately.
      It does not fix existing backups, but it makes sure the new ones are good.
    2. Or, user extend disk in offline.

You cannot fix existing incremental backups that have been affected: if they missed data that should have been backed up, that data is missing from those backups. You can, however, run an Active Full backup to capture everything. This is certainly the case for Veeam; for other backup vendors, you’ll need to double check!

How do I reset Change Block Tracking?

If you are using Veeam, you can just perform an Active Full backup, and ensure the reset CBT option is configured. This is enabled by default.

If you aren’t using Veeam, then the following will be your next steps.

To reset Change Block Tracking, follow the steps below, taken from this older VMware KB article from the last time this was an issue. VMware may update this article or produce another one now that this recent bug has been found.

  • Find your VM in the vCenter Client
    • Power the VM off
    • Click the Options tab, select the Advanced section and then click Configuration Parameters.
  • Disable CBT for the virtual machine by setting the ctkEnabled value to false.
  • If you need to do this for specific virtual disks attached to your virtual machine
    • Disable CBT by configuring the scsix:x.ctkEnabled value for each attached virtual disk to false. (scsix:x is SCSI controller and SCSI device ID of your virtual disk.)
  • Ensure there are no snapshot files (.delta.vmdk) present in the virtual machine’s working directory. For more information, see Determining if there are leftover delta files or snapshots that VMware vSphere or Infrastructure Client cannot detect (1005049).
  • Delete any -CTK.VMDK files within the virtual machine’s working directory.
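Before deleting anything, it can help to list what is actually in the VM's working directory and confirm the CBT settings. The following is a minimal, read-only sketch. It assumes a shell whose `find` supports `-maxdepth` and `-iname`; the function names and example paths are hypothetical.

```shell
#!/bin/sh
# Read-only checks for a powered-off VM's working directory.
# Nothing here deletes or modifies files.

# Print leftover snapshot delta files and CBT tracking (-ctk) files.
list_cbt_leftovers() {
  vm_dir="$1"
  # -iname matches case-insensitively, so -ctk.vmdk and -CTK.VMDK both match.
  find "$vm_dir" -maxdepth 1 -type f \
    \( -iname '*delta.vmdk' -o -iname '*-ctk.vmdk' \) | sort
}

# Print every ctkEnabled line from a .vmx file (VM-level and per-disk).
show_ctk_settings() {
  vmx_file="$1"
  grep -i 'ctkEnabled' "$vmx_file"
}

# Example usage (paths are hypothetical):
# list_cbt_leftovers /vmfs/volumes/datastore1/my-vm
# show_ctk_settings /vmfs/volumes/datastore1/my-vm/my-vm.vmx
```

Keeping these checks separate from any deletion step means the output can be reviewed against the KB before files are removed by hand.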

Now power on your virtual machine.

Depending on your backup software vendor, you may need to manually re-enable Change Block Tracking. You can find a full list of steps and considerations in this VMware KB article. Essentially, you power down the VM and set the ctkEnabled value back to true in the configuration parameters.

Summary

Let’s hope VMware produces a fix for this quickly. I remember they had this issue in vSphere 5.5 and 6.0, and some fixes didn’t resolve the issue. It was a pain being a consultant having to install fixes at customer sites.

It’s good that VMware have only taken a short amount of time to validate this bug and publish something officially about it!

 

Regards

Dean Lewis


Script to uninstall and cleanup VMware Fusion

The VMware Fusion KB on removing the software references a number of areas you need to clean up manually, so below is a little script that closes the application, uninstalls the app and removes the files.

Note: Uses sudo to elevate permissions for running the command.

To run the script:

chmod +x vmware_fusion_uninstall_and_cleanup.sh

# Adding sudo to the start of this command will bypass the need to provide further passwords as the script runs
sudo ./vmware_fusion_uninstall_and_cleanup.sh

Script Summary from ChatGPT – Because why not!

  • The provided script is a convenient and efficient way to uninstall VMware Fusion, a virtualization software, on macOS. It also performs a cleanup to remove related files and directories.
  • The script is designed to be executed in the Terminal, and it ensures elevated privileges (sudo) where necessary to perform system-level tasks.
  • The script starts by force killing VMware Fusion if it is running, ensuring a smooth uninstallation process.
  • Next, it moves the VMware Fusion application bundle from the /Applications folder to the Trash, effectively uninstalling the software.
  • The script then proceeds with the removal of various files and directories associated with VMware Fusion, cleaning up the system and freeing disk space.
  • The targeted files and directories include configuration files, caches, and preferences related to VMware Fusion.
  • Using a script for cleanup ensures that no traces of VMware Fusion are left behind, avoiding potential conflicts with other software or future installations.
  • However, users are advised to exercise caution when running scripts with sudo privileges, as it grants significant control over the system and can cause unintended consequences if used incorrectly.
  • A backup of important data is recommended before proceeding with the uninstallation and cleanup.
  • This script is suitable for users who want a streamlined and automated way to uninstall VMware Fusion and remove associated files on macOS.

Regards

Dean Lewis


Collect VM Notes in (Aria) vRealize Operations: A Step-by-Step Guide

One of the most common questions I’ve come across in previous years is how do I get the VM notes held in vCenter into vRealize (Aria) Operations?

Great news: in vRealize Operations 8.10 and later, you can collect this property for virtual machines simply by enabling it in your Policy.

Enable the Notes property on your Policy

  • Click on Policies under Configure in the left-hand navigation pane
  • Select your active policy that you want to alter
    • You may need to change multiple policies due to inheritance settings

vROPs - VM Notes - Edit Policy

  • Select Edit Policy on the far right-hand side

vROPs - VM Notes - Edit Policy 2

  • Set the object type as “Virtual Machine”
  • Search “note” to filter the list to show just the property we are interested in
  • Expand Properties > System
  • Highlight Notes and click on “Deactivated” and change to “Activated”
  • Click Save

vROPs - VM Notes - Edit Policy - Metrics and Properties - Virtual Machine - Enable System Notes

vROPs - VM Notes - Edit Policy - Metrics and Properties - Virtual Machine - System Notes - Activated

Viewing the VM notes and adding them to a view and reports

Now it’s a case of waiting for the next collection cycle of your vSphere environment. Below you can see an example of a virtual machine which is configured with a note. Any note changes will also be captured.

vROPs - VM Notes - Virtual Machine Property - System - Notes

Now let’s look at adding this property to an existing report as well.

In the example below, I’m going to edit the view “Virtual Machine Inventory”, which is used to power the out-of-the-box report “Inventory Report – Virtual Machines”.

  • Under Visualize on the left-hand navigation, click on Views
  • Click Manage Views, find your view and click to edit
  • Go to Step 2 – Data
  • The Selected Subject will already be Virtual Machine (red box)
  • Search for Note (1)
  • Drag the note property to the data column (2)
  • Set a vanity name for the property (3)
  • Set a preview source (green box)
    • Ensure that the VM note displays as expected (4)
  • Click Update

vROPs - VM Notes - Edit View

Now let’s check this updated view is reflected in our report:

  • Under Visualize on the left-hand navigation, click on Reports
  • Click to edit your chosen report
  • Expand the Views and Dashboards section in the report
  • In the red boxes you can see the matching name of the view I edited in the above screenshots, and the VM Notes Column is present

vROPs - VM Notes - Edit Report

Finally, when I run this report, I can see the additional VM note data added to the report.

vROPs - VM Notes - Run Report

Hopefully this simple but much-asked-for new feature will help in the ongoing management of your environments.

Regards

Dean Lewis


How to Add vSphere Tags to vRealize Operations Alert Emails using a Custom Payload

Wondering how to add the vSphere Tag for a virtual machine to emails sent out for alerts? I recently came across this Reddit post, so I decided to try out the Custom Payload feature in vRealize (Aria) Operations, and I want to share the steps I took to achieve this.

Here’s how to configure a Payload Template and Notification to include the vSphere Tag:

Creating the custom payload template to include the vSphere Tag

To get started, within your vRealize Operations interface (SaaS or on-premises), go to:

  • Configure > Alerts
  • Click on the Payload Templates icon
  • Click Add to create a new template

vROPS - Custom Payload - Alerts - Payload Templates

  • Give your custom payload template a name,
  • a description,
  • and set which outbound method it’s tied to. For my example, it will be email.
  • Click Next
