Tag Archives: data loss

vSphere data loss bug returns – CBT issues in vSphere ESXi 8.0 Update 2

December 15, 2023 | General, Veeam, VMware | 8.0, Bug, CBT, Change Block Tracking, data loss, ESXi, Reset CBT, update 2, Veeam, VMware, vSphere | Dean

The Issue

I keep saying there are no new ideas in technology, just rehashes of old ones. That is also true for VMware and its data loss issues.

The vSphere-based change block tracking (CBT) bug is back! I think I wrote 5 articles on this back in 2014/2015 with explanations and fixes!

Veeam reported this at the start of week commencing 11th December 2023, with VMware confirming the issue by the end of the same week.

The Cause

Change block tracking is the feature used to see which blocks of data have changed since a known point in time, to enable backup software to capture only the incremental changes.

If this feature fails, you could lose data in your backups, as the backup software doesn’t know which blocks to protect.
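
To make the failure mode concrete, here is a minimal sketch of the idea in Python. It is a toy model only, not VMware's implementation: the `TrackedDisk` class, the `tracked` flag, and the block contents are all hypothetical names invented for illustration. It shows an incremental backup copying only the blocks the tracker reports, and what happens when the tracker misses a write.

```python
# Toy model of change block tracking (CBT); not VMware's implementation.

class TrackedDisk:
    """A disk modelled as a list of blocks plus a set of changed block indexes."""

    def __init__(self, num_blocks):
        self.blocks = ["empty"] * num_blocks
        self.changed = set()  # block indexes written since the last backup

    def write(self, index, data, tracked=True):
        self.blocks[index] = data
        # tracked=False simulates the bug: the write lands on disk,
        # but the tracker never records it.
        if tracked:
            self.changed.add(index)

    def query_changed_blocks(self):
        """Return the changed block indexes and reset the tracker."""
        changed = sorted(self.changed)
        self.changed.clear()
        return changed


def full_backup(disk):
    disk.query_changed_blocks()  # start tracking from this point in time
    return list(disk.blocks)


def incremental_backup(disk, backup):
    # Copy only the blocks the tracker reports as changed.
    for index in disk.query_changed_blocks():
        backup[index] = disk.blocks[index]
    return backup


disk = TrackedDisk(4)
disk.write(0, "A")
backup = full_backup(disk)

disk.write(1, "B")                 # tracked correctly
disk.write(2, "C", tracked=False)  # change the tracker misses
backup = incremental_backup(disk, backup)

print(disk.blocks)  # ['A', 'B', 'C', 'empty']
print(backup)       # ['A', 'B', 'empty', 'empty']
```

The backup job finishes without error in this model, yet block 2 never makes it into the backup, which is why this class of bug is so easy to miss.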

As per VMware:

CBT's QueryChangedDiskAreas may lose some data changed on the disk after disk is hot-extended.
It only happens on ESXi 8.0u2.

The Fix/Workaround

The following comes directly from VMware’s newly published KB. It took them only a few days to confirm this behaviour after Veeam noticed it at the start of the week!

Resolution

- Unfortunately, there is no fix available for this bug at this time. However, you can use the following workaround to work around the issue until a fix is released

Workaround

1. Reset CBT after disk is hot-extended. Then, user need to take a full backup immediately.
  It does not fix existing backups, but it makes sure the new ones are good.
2. Or, user extend disk in offline.

You cannot fix your existing incremental backups if they have been affected. If they missed data they should have backed up, it has been missed! You can, however, run an Active Full backup to capture everything. This is certainly the case for Veeam; for other backup vendors, you’ll need to double check!

How do I reset Change Block Tracking?

If you are using Veeam, you can just perform an Active Full backup, and ensure the reset CBT option is configured. This is enabled by default.

If you aren’t using Veeam, then the following will be your next steps.

To reset Change Block Tracking, follow the steps below, which are taken from this older VMware KB article from the last time this was an issue. VMware may update this article or produce another one now that this recent bug has been found.

- Find your VM in the vCenter Client.
- Power the VM off.
- Click the Options tab, select the Advanced section and then click Configuration Parameters.
- Disable CBT for the virtual machine by setting the ctkEnabled value to false.
- If you need to do this for specific virtual disks attached to your virtual machine, disable CBT by configuring the scsix:x.ctkEnabled value for each attached virtual disk to false. (scsix:x is the SCSI controller and SCSI device ID of your virtual disk.)
- Ensure there are no snapshot files (.delta.vmdk) present in the virtual machine’s working directory. For more information, see Determining if there are leftover delta files or snapshots that VMware vSphere or Infrastructure Client cannot detect (1005049).
- Delete any -CTK.VMDK files within the virtual machine’s working directory.

Now power on your virtual machine.
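
For anyone who wants to see what those configuration parameters look like as text, here is a rough Python sketch. It is not the method the KB describes (the KB uses the client UI), and it only works on a string and a directory listing rather than on a live VM. The function names `disable_cbt` and `find_ctk_files` and the sample configuration are my own illustrative assumptions, and the pattern only covers the scsix:x style keys mentioned above.

```python
import re
from pathlib import Path

# Matches the VM-level and per-disk CBT keys, for example:
#   ctkEnabled = "TRUE"
#   scsi0:0.ctkEnabled = "TRUE"
CTK_LINE = re.compile(
    r'^(\s*(?:scsi\d+:\d+\.)?ctkEnabled\s*=\s*)"[^"]*"', re.IGNORECASE
)


def disable_cbt(config_text):
    """Return the configuration text with every ctkEnabled value set to FALSE."""
    lines = [CTK_LINE.sub(r'\1"FALSE"', line) for line in config_text.splitlines()]
    return "\n".join(lines)


def find_ctk_files(vm_dir):
    """List the -ctk.vmdk files in a VM working directory (case-insensitive)."""
    return sorted(
        p.name for p in Path(vm_dir).iterdir()
        if p.name.lower().endswith("-ctk.vmdk")
    )


# Hypothetical configuration text, for illustration only.
sample = "\n".join([
    'displayName = "demo"',
    'ctkEnabled = "TRUE"',
    'scsi0:0.ctkEnabled = "TRUE"',
    'scsi0:0.fileName = "demo.vmdk"',
])

print(disable_cbt(sample))
```

`find_ctk_files` only lists the tracking files so you can review them before deleting anything. As in the steps above, the VM should be powered off and free of snapshots first.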

Depending on your backup software vendor, you may need to manually re-enable Change Block Tracking. You can find a full list of steps and considerations in this VMware KB article. Essentially, you power down the VM and enable the value again in Configuration Parameters.

Summary

Let’s hope VMware produces a fix for this quickly. I remember they had this issue in vSphere 5.5 and 6.0, and some fixes didn’t resolve the issue. It was a pain being a consultant having to install fixes at customer sites.

It’s good that VMware have only taken a short amount of time to validate this bug and publish something officially about it!

Regards

Dean Lewis

MongoDB Container data loss issue – A Journey

August 30, 2021 | Kubernetes | Bitnami, data loss, Kubernetes, MongoDB, Persistent, Volume, Write | Dean

Over the past month or so I noticed an issue with my Pac-Man Kubernetes application, which I use for demonstrations as a basic app front-end that writes to a database back end, running in Kubernetes.

When I restored my instances using Kasten, my Pac-Man high scores were missing.
This issue happened when I made some changes to my deployment files to configure authentication to the MongoDB using environment variables in my deployment file.

This blog post is a detailed walk-through of the steps I took to troubleshoot the issue, and then rectify it!

Summary if you don’t want to read the post

If you are not looking to read through this blog post, here is the summary:

I changed MongoDB images, so I needed to configure a new mount point location to match the new image’s MongoDB configuration.
The new MongoDB image is non-root, so I had to use an Init container to configure the permissions on the PV first.
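
As a rough illustration of that second point, the sketch below builds a pod spec in Python and prints it as JSON. Everything specific in it is an assumption for illustration rather than the configuration from my deployment files: the UID 1001, the /bitnami/mongodb mount path, the busybox image, and the volume and claim names are all hypothetical values.

```python
import json

# Assumptions (not taken from the deployment files): the UID, the mount path,
# the images, and the volume and claim names are illustrative values only.
MONGO_UID = 1001
DATA_PATH = "/bitnami/mongodb"
VOLUME_MOUNT = {"name": "mongo-db", "mountPath": DATA_PATH}

pod_spec = {
    "initContainers": [
        {
            "name": "fix-pv-permissions",
            "image": "busybox",
            # Runs once as root, before MongoDB starts, to hand ownership
            # of the volume to the non-root user.
            "command": ["sh", "-c", f"chown -R {MONGO_UID}:{MONGO_UID} {DATA_PATH}"],
            "securityContext": {"runAsUser": 0},
            "volumeMounts": [VOLUME_MOUNT],
        }
    ],
    "containers": [
        {
            "name": "mongo",
            "image": "bitnami/mongodb",
            "securityContext": {"runAsUser": MONGO_UID},
            # Same volume, mounted where the image expects its data.
            "volumeMounts": [VOLUME_MOUNT],
        }
    ],
    "volumes": [
        {
            "name": VOLUME_MOUNT["name"],
            "persistentVolumeClaim": {"claimName": "mongo-storage"},
        }
    ],
}

print(json.dumps(pod_spec, indent=2))
```

The key design point is that both containers mount the same volume at the same path, so the ownership change made by the Init container is what the MongoDB container sees when it starts.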

Overview of the application

The application is made up of the following components:

Namespace
Deployment
- MongoDB Pod
  - DB Authentication configured
  - Attached to a PVC
- Pac-Man Pod
  - Node.js web front end that connects back to the MongoDB Pod by looking for the Pod DNS address internally.
RBAC Configuration for Pod Security and Service Account
Secret which holds the data for the MongoDB Usernames and Passwords to be configured
Service
- Type: LoadBalancer
  - Used to balance traffic to the Pac-Man Pods

Confirming the behaviour

The behaviour I was seeing when my application was deployed:

Pac-Man web page – I could save a high score, and it would show in the high scores list
- This showed the connectivity to the database was working, as the app would hang if it could not write to the database.
I would protect my application using Kasten. When I deleted the namespace and restored everything, my application would be running, but there were no high scores to show.
This was apparent when deploying the branch versions v0.5.0 and v0.5.1 from my GitHub.
Deploying the branch v0.2.0 would not produce the same behaviour.
- This configuration did not have any database authentication set up, meaning MongoDB was open to anyone who could connect, with no username or password required.

Testing the Behaviour

Continue reading MongoDB Container data loss issue – A Journey →

Postman – Logging in results in losing my offline work

May 10, 2021 | General | data loss, Postman, recover, scratch pad | Dean

The Issue

When working with Postman in offline mode or while not signed in, and then choosing to sign in, you lose access to the Collections and Environments you have worked on previously.

The Cause

In later versions, Postman introduced the Scratchpad. This is an offline area where your data is saved.

When you create a new account in the app, you should be presented with an option to move your data from your scratchpad.

If you already have an account to log into, you do not seem to get this option.

The Fix

Within Postman application > Click the Settings Cog > Select “Scratch Pad”

You should now be able to see your offline data. If you can, you need to manually export your data, then change back to your workspace and import the data.

If you are still unable to find your data, I recommend you follow this article from the Postman support site on “how to recover my data”. I did not personally have much success with this method.

Regards

Dean Lewis

vEducate.co.uk

Fixing issues and blogging
