vSphere data loss bug returns – CBT issues in vSphere ESXi 8.0 Update 2

The Issue

I keep saying, there are no new ideas in technology, just re-hashes of old ones. That is also true for VMware and their data loss issues.

The vSphere-based change block tracking (CBT) bug is back! I think I wrote 5 articles on this back in 2014/2015 with explanations and fixes!

Veeam reported this at the start of the week commencing 11th December 2023, with VMware confirming the issue by the end of the same week.

The Cause

Change block tracking is the feature used to see which blocks of data have changed since a known point in time, to enable backup software to capture only the incremental changes.

If this feature fails, you could lose data in your backups, as the backup software doesn’t know which blocks to protect.

As per VMware:

CBT's QueryChangedDiskAreas may lose some data changed on the disk after disk is hot-extended.
It only happens on ESXi 8.0u2.

The Fix/Workaround

Directly from VMware’s newly published KB; it took them only a few days to confirm this behaviour after Veeam noticed it at the start of the week!

  • Resolution
    • Unfortunately, there is no fix available for this bug at this time. However, you can use the following workaround to work around the issue until a fix is released
  • Workaround
    1. Reset CBT after disk is hot-extended. Then, user need to take a full backup immediately.
      It does not fix existing backups, but it makes sure the new ones are good.
    2. Or, user extend disk in offline.

You cannot fix existing incremental backups that have been affected; if they missed the data they should have backed up, it’s been missed! But you can run an Active Full backup to capture everything. This is certainly the case for Veeam; for other backup vendors, you’ll need to double-check!

How do I reset Change Block Tracking?

If you are using Veeam, you can just perform an Active Full backup, and ensure the reset CBT option is configured. This is enabled by default.

If you aren’t using Veeam, then the following will be your next steps.

To reset Change Block Tracking, follow the steps in this older VMware KB article from the last time this was an issue. VMware may update this article or produce another one now that this recent bug has been found.

  • Find your VM in the vCenter Client
    • Power the VM off
    • Click the Options tab, select the Advanced section and then click Configuration Parameters.
  • Disable CBT for the virtual machine by setting the ctkEnabled value to false.
  • If you need to do this for specific virtual disks attached to your virtual machine
    • Disable CBT by configuring the scsix:x.ctkEnabled value for each attached virtual disk to false. (scsix:x is SCSI controller and SCSI device ID of your virtual disk.)
  • Ensure there are no snapshot files (.delta.vmdk) present in the virtual machine’s working directory. For more information, see Determining if there are leftover delta files or snapshots that VMware vSphere or Infrastructure Client cannot detect (1005049).
  • Delete any -CTK.VMDK files within the virtual machine’s working directory.
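Before deleting anything, it can help to list what is actually sitting in the VM's working directory. The following is a minimal sketch, assuming you have shell access to the datastore path; the function name list_cbt_leftovers is hypothetical, and it only prints files rather than removing them.

```shell
#!/bin/sh
# Sketch only: list_cbt_leftovers is a hypothetical helper name.
# Prints CBT tracking files ("ctk") and snapshot delta files ("delta")
# found directly inside the given VM working directory. Nothing is deleted.
list_cbt_leftovers() {
    vm_dir="$1"
    for f in "$vm_dir"/*; do
        [ -f "$f" ] || continue
        # Compare in lower case, since the KB writes -CTK.VMDK in capitals
        name=$(basename "$f" | tr '[:upper:]' '[:lower:]')
        case "$name" in
            *-ctk.vmdk)  echo "ctk $f" ;;
            *delta.vmdk) echo "delta $f" ;;
        esac
    done
}
```

Calling list_cbt_leftovers with the VM's folder as its only argument prints one line per match. Any "delta" line means a snapshot is still present and should be dealt with before the CTK files are removed.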

Now power on your virtual machine.

Depending on your backup software vendor, you may need to manually re-enable Change Block Tracking; you can find a full list of steps and considerations in this VMware KB article. Essentially, you power down the VM and set the value to true again in the configuration parameters.

Summary

Let’s hope VMware produces a fix for this quickly. I remember they had this issue in vSphere 5.5 and 6.0, and some fixes didn’t resolve the issue; it was a pain being a consultant having to install fixes at customer sites.

It’s good that VMware have only taken a short amount of time to validate this bug and publish something officially about it!

 

Regards

Dean Lewis

Grafana – unable to login “User already exists”

The Issue

When trying to log into the Grafana Web UI using an OIDC provider (in my case, Dex), the login would fail after some time with the error “User already exists”. This happened for any user given access via OIDC.

The Cause

This looks to happen due to a CVE fix implemented in Grafana as documented in the two comments below:

The Fix

To resolve this issue, for Grafana 10.0.x and 9.5.6, the environment variable GF_AUTH_OAUTH_ALLOW_INSECURE_EMAIL_LOOKUP can be set to true, or the config key oauth_allow_insecure_email_lookup can be set to true under the auth section.

[auth]
oauth_allow_insecure_email_lookup=true
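
If you configure Grafana through the environment rather than the config file, the same setting can be sketched as a single export. How the variable reaches the Grafana process (container environment, systemd unit, Helm values) depends on your deployment and is not covered here.

```shell
# Environment-variable form of the [auth] setting shown above
export GF_AUTH_OAUTH_ALLOW_INSECURE_EMAIL_LOOKUP=true
```

Grafana needs to be restarted to pick up the change.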

Source + Source 2

Hope this helps anyone stuck out there!

Regards

Dean Lewis

Google Analytics GA4 – Fix Thresholding Applied and get granular referral source data

The Issue

Moving over from UA to the new Google Analytics GA4 has caused me a few issues. Mainly, I wasn’t able to get granular source data for my referrals, i.e. which websites users were hitting my site from.

Below is an example from my UA screen, where I could see the domains that users were hitting my site from.

google analytics UA - referral example

Using the traffic acquisition report in GA4, I couldn’t see the same information when selecting session source.

google analytics GA4 - referral example

The Cause

In the above screenshot, next to the report name you can see a little warning symbol. This is to tell me that thresholding has been applied, which stops me from identifying individual users.

This is caused by the Google Signals feature, which was turned on as part of the migration to GA4 and is aimed at helping to identify more data about my visitors. But it’s something I don’t care about.

The Fix

Continue reading Google Analytics GA4 – Fix Thresholding Applied and get granular referral source data

Script to uninstall and cleanup VMware Fusion

The VMware Fusion KB to remove the software makes reference to a number of areas you need to manually clean up, so below is a little script which closes the application, uninstalls the app and removes the files.

Note: Uses sudo to elevate permissions for running the command.

To run the script:

chmod +x vmware_fusion_uninstall_and_cleanup.sh

# Adding sudo to the start of this command will bypass the need to provide further passwords as the script runs
sudo ./vmware_fusion_uninstall_and_cleanup.sh

Script Summary from ChatGPT – Because why not!

  • The provided script is a convenient and efficient way to uninstall VMware Fusion, a virtualization software, on macOS. It also performs a cleanup to remove related files and directories.
  • The script is designed to be executed in the Terminal, and it ensures elevated privileges (sudo) where necessary to perform system-level tasks.
  • The script starts by force killing VMware Fusion if it is running, ensuring a smooth uninstallation process.
  • Next, it moves the VMware Fusion application bundle from the /Applications folder to the Trash, effectively uninstalling the software.
  • The script then proceeds with the removal of various files and directories associated with VMware Fusion, cleaning up the system and freeing disk space.
  • The targeted files and directories include configuration files, caches, and preferences related to VMware Fusion.
  • Using a script for cleanup ensures that no traces of VMware Fusion are left behind, avoiding potential conflicts with other software or future installations.
  • However, users are advised to exercise caution when running scripts with sudo privileges, as it grants significant control over the system and can cause unintended consequences if used incorrectly.
  • A backup of important data is recommended before proceeding with the uninstallation and cleanup.
  • This script is suitable for users who want a streamlined and automated way to uninstall VMware Fusion and remove associated files on macOS.
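
The script itself is not reproduced on this page. As a rough illustration of the cleanup step described above (and not the author's actual script), here is a minimal sketch of a removal helper that defaults to a dry run. The function name remove_paths and the DRY_RUN variable are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only, not the author's script. remove_paths is a hypothetical helper.
# With DRY_RUN=1 (the default) it only prints what it would delete.
remove_paths() {
    for p in "$@"; do
        # Skip anything that does not exist, so reruns are harmless
        if [ -e "$p" ]; then
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "would remove: $p"
            else
                rm -rf -- "$p" && echo "removed: $p"
            fi
        fi
    done
}
```

A call such as remove_paths "$HOME/Library/Caches/com.vmware.fusion" "$HOME/Library/Preferences/VMware Fusion" would preview two locations; these paths are illustrative and should be checked against the VMware KB. Setting DRY_RUN=0 performs the deletion, so review the dry-run output first.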

Regards

Dean Lewis

Kubernetes Metrics Server – cannot validate certificate because it doesn’t contain any IP SANs

The Issue

Whilst trying to install the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

so I could use kubectl top node for its metrics on node resource usage, I found the pods were not loading, and upon inspection found the following:

> kubectl logs -n kube-system metrics-server-6f6cdbf67d-v6sbf 

I0717 12:19:32.132722 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0717 12:19:39.159422 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.49.2:10250/metrics/resource\": x509: cannot validate certificate for 192.168.49.2 because it doesn't contain any IP SANs" node="minikube"

The Cause

The issue here was due to the installation of Cert-Manager and the setup of some TLS configurations within the CNI, along with self-signed certificates. As a result, the Metrics Server wasn’t able to validate the certificate presented when it scraped the node (the kubelet endpoint on port 10250 in the log above).

The Fix

As this is communication within the cluster, I could simply fix this by telling the Metrics Server container to skip verification of the kubelet’s certificate, using the kubectl patch command below:

kubectl patch deployment metrics-server -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
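
To confirm the patch took effect, something like the following sketch can be used. The function name verify_metrics_server and the 120 second timeout are assumptions for this example; the deployment name and namespace are taken from the patch command above.

```shell
# Sketch only: verify_metrics_server is a hypothetical helper name.
verify_metrics_server() {
    # Wait for the patched deployment to finish rolling out
    kubectl rollout status deployment/metrics-server -n kube-system --timeout=120s || return 1
    # Metrics can take a short while to appear after the pod becomes ready
    kubectl top node
}
```

Run verify_metrics_server once the patch has been applied. If kubectl top node still reports that metrics are not available, waiting a minute and retrying is usually worth trying before digging further into the logs.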

 

Regards

Dean Lewis