Tag Archives: 8.0

vSphere data loss bug returns – CBT issues in vSphere ESXi 8.0 update 2

The Issue

I keep saying, there are no new ideas in technology, just re-hashes of old ones. That is also true for VMware and their data loss issues.

The vSphere-based change block tracking (CBT) bug is back! I think I wrote 5 articles on this back in 2014/2015 with explanations and fixes!

Veeam reported this at the start of the week commencing 11th December 2023, and VMware confirmed the issue by the end of the same week.

The Cause

Change block tracking is the feature used to see which blocks of data have changed since a known point in time, to enable backup software to capture only the incremental changes.

If this feature fails, you could lose data in your backups, as the backup software doesn’t know which blocks to protect.

As per VMware:

CBT's QueryChangedDiskAreas may lose some data changed on the disk after disk is hot-extended.
It only happens on ESXi 8.0u2.
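
To make the failure mode concrete, here is a toy model in Python. It is an illustration under simplifying assumptions, not how vSphere implements CBT: a disk is a list of blocks, the changed-block report is reduced to a set of block indexes, and the "faulty tracker" is a hypothetical stand-in that simply omits a block written after a hot-extend. The function names are made up for this example.

```python
# Toy model of CBT-based incremental backups (a simplified illustration,
# not how vSphere stores or queries change tracking data).


def incremental_backup(previous_backup, disk, changed_blocks):
    """Apply only the blocks reported as changed on top of the last backup."""
    backup = list(previous_backup)
    # A hot-extended disk is larger than the last backup, so pad the copy.
    backup.extend([None] * (len(disk) - len(backup)))
    for index in changed_blocks:
        backup[index] = disk[index]
    return backup


def missing_blocks(backup, disk):
    """Return the block indexes where the backup no longer matches the disk."""
    return [i for i, (saved, live) in enumerate(zip(backup, disk)) if saved != live]


last_full = ["a", "b", "c", "d"]
# Block 1 changed; block 4 only exists after the hot-extend and was then written.
disk_now = ["a", "B", "c", "d", "e"]

good = incremental_backup(last_full, disk_now, {1, 4})
# Hypothetical faulty tracker that fails to report the block written after the extend.
bad = incremental_backup(last_full, disk_now, {1})
active_full = list(disk_now)  # a full backup copies every block

print(missing_blocks(good, disk_now))         # []
print(missing_blocks(bad, disk_now))          # [4]
print(missing_blocks(active_full, disk_now))  # []
```

In this model, running another incremental on top of bad with an empty change set still leaves block 4 missing, which mirrors why the workaround ends with a full backup rather than another incremental.
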
The Fix/Workaround

The following comes directly from VMware’s newly published KB; it took them only a few days to confirm this behaviour after Veeam reported it at the start of the week!

  • Resolution
    • Unfortunately, there is no fix available for this bug at this time. However, you can use the following workaround to work around the issue until a fix is released
  • Workaround
    1. Reset CBT after disk is hot-extended. Then, user need to take a full backup immediately.
      It does not fix existing backups, but it makes sure the new ones are good.
    2. Or, user extend disk in offline.

You cannot fix existing incremental backups that have been affected: if they missed the changed data, it has been missed! You can, however, run an Active Full backup to capture everything. This is certainly the case for Veeam; for other backup vendors, you’ll need to double-check.

How do I reset Change Block Tracking?

If you are using Veeam, you can simply perform an Active Full backup, making sure the reset CBT option is configured (it is enabled by default).

If you aren’t using Veeam, then the following will be your next steps.

To reset Change Block Tracking, follow this older VMware KB article from the last time this was an issue. VMware may update this article, or produce a new one, now that this recent bug has been found.

  • Find your VM in the vCenter Client and power it off.
  • Click the Options tab, select the Advanced section and then click Configuration Parameters.
  • Disable CBT for the virtual machine by setting the ctkEnabled value to false.
  • If you need to do this for specific virtual disks attached to your virtual machine:
    • Disable CBT by setting the scsix:x.ctkEnabled value for each attached virtual disk to false (scsix:x is the SCSI controller and SCSI device ID of your virtual disk).
  • Ensure there are no snapshot files (.delta.vmdk) present in the virtual machine’s working directory. For more information, see Determining if there are leftover delta files or snapshots that VMware vSphere or Infrastructure Client cannot detect (1005049).
  • Delete any -CTK.VMDK files within the virtual machine’s working directory.
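The manual reset steps above boil down to two pieces of bookkeeping: the configuration parameters to set, and the files in the VM’s working directory to check or delete. The Python sketch below models only that bookkeeping. The helper names are hypothetical, it does not connect to vCenter or edit a real .vmx file, and it only matches the delta.vmdk and -ctk.vmdk suffixes mentioned in the steps (other snapshot formats may be named differently). Calling cbt_parameters with enabled=True gives the values for re-enabling CBT later, if your backup vendor requires it.

```python
# Hypothetical helpers that model the manual CBT reset steps as plain data;
# nothing here connects to vCenter or edits a real .vmx file.


def cbt_parameters(disks, enabled=False):
    """Build ctkEnabled configuration parameters for the VM and each disk.

    disks is a list of (controller, unit) pairs, e.g. (0, 1) -> scsi0:1.
    """
    value = "true" if enabled else "false"
    params = {"ctkEnabled": value}
    for controller, unit in disks:
        params[f"scsi{controller}:{unit}.ctkEnabled"] = value
    return params


def review_working_directory(filenames):
    """Split a VM directory listing into snapshot deltas and CTK files."""
    deltas = [name for name in filenames if name.lower().endswith("delta.vmdk")]
    ctk_files = [name for name in filenames if name.lower().endswith("-ctk.vmdk")]
    return deltas, ctk_files


print(cbt_parameters([(0, 0), (0, 1)]))

# Example listing only; real file names depend on your VM and snapshots.
listing = ["vm.vmx", "vm.vmdk", "vm-flat.vmdk", "vm-ctk.vmdk", "vm-000001-delta.vmdk"]
deltas, ctk_files = review_working_directory(listing)
print(deltas)     # ['vm-000001-delta.vmdk']
print(ctk_files)  # ['vm-ctk.vmdk']
```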

Now power on your virtual machine.

Depending on your backup software vendor, you may need to re-enable Change Block Tracking manually; you can find a full list of steps and considerations in this VMware KB article. In essence, you power down the VM and set the value back to true in the configuration parameters.

Summary

Let’s hope VMware produces a fix for this quickly. I remember they had this issue in vSphere 5.5 and 6.0, and some of the fixes didn’t resolve it; as a consultant, having to install fixes at customers’ sites was a pain.

It’s good that VMware have only taken a short amount of time to validate this bug and publish something officially about it!

 

Regards

Dean Lewis

How to backup vRealize Automation 8.x using Veeam

In this blog post I am going to dissect backing up vRealize Automation 8.x using Veeam Backup and Replication.

- Understanding the backup methods
- Performing an online backup
- Performing an offline backup

Understanding the Backup Methods

Reading the VMware documentation around this subject can be somewhat confusing at times, and if you pay attention, there are subtle changes between the documents as well. Let’s break this down.

  • vRealize Automation 8.0
    • As part of the backup job, you need to run a script to stop the services
    • This is known as an offline backup
    • Depending on your backup software, you can do this either by running a script located on the vRealize Automation appliance or by triggering it using the pre-freeze/post-thaw scripts when a snapshot is taken of the VM.
    • The snapshot must not include the virtual machine’s memory.
    • If your environment is a cluster, you only need to run the script on a single node.
    • All nodes in the cluster must be backed up at the same time.
  • vRealize Automation 8.0.1 and 8.1 (and higher)
    • It is supported to run an online backup
      • No script is needed to shut down the services
    • The snapshot taken as part of the backup must quiesce the virtual machine.
    • The snapshot must not include the virtual machine’s memory.
    • It is recommended to run the script to stop all services and perform an offline backup.
      • You may also find your backup runs faster, as the virtual machine will become less busy.
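The version rules above can be summarised as a small Python function. This is only a reading aid under stated assumptions: the function names are hypothetical (not a VMware or Veeam API), the version is parsed naively from a dotted string, and applying the "back up all cluster nodes at the same time" rule to every version is an assumption, as the list above only states it for 8.0.

```python
# A hypothetical summary of the vRA 8.x backup rules; not a VMware or Veeam API.


def parse_version(version):
    """Turn '8.0.1' into (8, 0, 1), padding short versions such as '8.1'."""
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (3 - len(parts))


def backup_plan(version, online=False):
    """Return the backup steps for a vRA 8.x version and backup method."""
    is_801_or_later = parse_version(version) >= (8, 0, 1)
    if online and not is_801_or_later:
        raise ValueError(f"vRA {version} only supports an offline backup")
    steps = []
    if not online:
        steps.append("stop the services with the script (one node is enough in a cluster)")
    if is_801_or_later:
        steps.append("quiesce the virtual machine when taking the snapshot")
    steps.append("do not include the virtual machine's memory in the snapshot")
    # Assumption: the 8.0 cluster rule is applied to every version here.
    steps.append("back up all cluster nodes at the same time")
    return steps


for step in backup_plan("8.0.1", online=True):
    print("-", step)
```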

Performing an Online Backup

Let’s start with the easier of the two options. Again, this is only supported for vRealize Automation 8.0.1 and higher.

vRSLCM – vRA fails to update from 8.0 to 8.0.1 – LCMVRAVACONFIG90030

When updating my vRealize Automation instance from 8.0 to 8.0.1, I ran into the following issue:

Error Code: LCMVRAVACONFIG90030

vRA VA Upgrade Status Check failed.

Upgrade prepare on vRA VA sc-dc1-vra001.simon.local failed with state error. To know more about the failure, run command "vracli upgrade status --details" on the vRA VA sc-dc1-vra001.simon.local. If the prepare upgrade issue is fixed outside vRSLCM, the vRSLCM request can be proceeded to next step by clicking RETRY with proceedNext property set to true. Optionally, the whole upgrade can be cancelled and started afresh by clicking RETRY with cancelAndStartAfresh property set to true. If both the retry properties are set to true,cancelAndStartAfresh property will take precedence and will be honoured
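The error message also describes how the two RETRY properties interact. Below is a minimal Python sketch of that precedence rule. The property names proceedNext and cancelAndStartAfresh come from the message, but the function is hypothetical, and the behaviour when neither property is set is an assumption, not something the message states.

```python
# Hypothetical model of how the two RETRY properties in the message interact;
# the property names come from the error text, the function is illustrative.


def retry_outcome(proceed_next=False, cancel_and_start_afresh=False):
    """Describe what a RETRY does for a given combination of properties."""
    if cancel_and_start_afresh:
        # Per the message, cancelAndStartAfresh wins when both are true.
        return "cancel the upgrade and start afresh"
    if proceed_next:
        # Only appropriate once the prepare issue is fixed outside vRSLCM.
        return "proceed to the next step"
    # Assumption: with neither property set, RETRY repeats the failed step.
    return "repeat the failed step"


print(retry_outcome(proceed_next=True, cancel_and_start_afresh=True))
```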

I logged into my vRA node and ran the recommended command “vracli upgrade status --details”. This told me there were no running application servers, which was odd, as my vRA installation was working.

So I ran “vracli status” and immediately saw that I had an issue with the database on the vRA node. I’m unsure whether this was a pre-upgrade issue or something that happened during the upgrade.

[ERROR] Exception while getting DB nodes.
...
Error getting database node status

I decided to run “deploy.sh”, which re-runs all the Kubernetes configuration, killing and restarting all the services. This seemed to resolve my issue, as running the upgrade again worked as expected.
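If you want to spot the same database errors programmatically, a minimal Python sketch could scan the “vracli status” output for the two error strings shown above. It assumes you run it somewhere vracli is available (the vRA appliance) with Python 3.7 or later; the helper names are hypothetical, and it only detects these specific messages. It does not diagnose the cause or fix anything.

```python
import subprocess

# Error strings taken from the "vracli status" output shown in this post.
DB_ERROR_MARKERS = (
    "Exception while getting DB nodes",
    "Error getting database node status",
)


def database_errors(status_output):
    """Return the lines of vracli status output that report database problems."""
    return [
        line.strip()
        for line in status_output.splitlines()
        if any(marker in line for marker in DB_ERROR_MARKERS)
    ]


def run_vracli_status():
    """Run vracli status; only works on the vRA appliance where vracli exists."""
    result = subprocess.run(
        ["vracli", "status"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


sample = "[ERROR] Exception while getting DB nodes.\n...\nError getting database node status"
print(database_errors(sample))
# On the appliance itself: print(database_errors(run_vracli_status()))
```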

If you encounter this situation, I would recommend contacting VMware Support for guidance and for information on why your services stopped. As this is my lab environment, I do not have the same considerations as those running production.

vRSLCM – Replacing vRA key fails with “Failed to apply License key – LCMVRAVACONFIG590007”

The vRA evaluation license in my homelab had expired, and when trying to log in, I was hitting a 402 error.

When replacing the license using vRealize Suite Lifecycle Manager (vRSLCM), I received the errors below. This happens because the license key has already expired.

Error Code: LCMVRAVACONFIG590007
Failed to apply License key. Please check whether the license provided is correct and retry.
Failed to get vRA License Key.

Screenshot: LCMVRAVACONFIG590007 Failed to apply License key

The Fix

The fix is to re-apply the license using the vRA CLI directly on your vRA node, with the commands below, then re-inventory your vRA deployment in vRSLCM and finally Retrust with Identity Manager.

###### To check the current license ######

vracli license

###### To remove the license ######

vracli license remove {license key}

###### To add a new license ###### 

vracli license add {license key}

Below are the options to finalise the configuration in vRSLCM.

Screenshot: vRSLCM options, including Retrust with Identity Manager

The Logs

For those of you who are interested in the log output, and for search engines to index:

Error log from the vRSLCM UI, as in the screenshot above:

com.vmware.vrealize.lcm.common.exception.EngineException: Failed to get vRA License Key.
	at com.vmware.vrealize.lcm.plugin.core.vra80.task.VraVaReplaceLicenseTask.execute(VraVaReplaceLicenseTask.java:134)
	at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

From the vRSLCM log bundle:

INFO  [pool-2-thread-5] c.v.v.l.d.v.h.VraPreludeInstallHelper -  -- Command to be run : vracli -j license
INFO  [pool-2-thread-5] c.v.v.l.d.v.h.VraPreludeInstallHelper -  -- PRELUDE ENDPOINT HOST :: sc-dc1-vra001.simon.local
INFO  [pool-2-thread-5] c.v.v.l.d.v.h.VraPreludeInstallHelper -  -- COMMAND :: vracli -j license
INFO  [pool-2-thread-5] c.v.v.l.u.SshUtils -  -- Executing command --> vracli -j license
INFO  [pool-2-thread-5] c.v.v.l.u.SshUtils -  -- exit-status: 0
INFO  [pool-2-thread-5] c.v.v.l.u.SshUtils -  -- Command executed sucessfully
INFO  [pool-2-thread-5] c.v.v.l.d.v.h.VraPreludeInstallHelper -  -- Command Status code :: 0 , Output :: {"status_code": 0, "output_data": [{"key": "XXXX-XXXX-XXXX-XXXX", "productName": null, "valid": false, "expirationDate": null, "error": "License expired"}], "error": "", "logs": {"asctime": "2020-01-28T12:55:43Z+0000", "name": "vracli", "processName": "MainProcess", "filename": "license.py", "funcName": "__get_license_result", "levelname": "INFO", "lineno": 325, "module": "license", "threadName": "MainThread", "message": "Running license command: check-serial --serial-number \"XXXX-XXXX-XXXX-XXXX\"", "timestamp": "2020-01-28T12:55:43Z+0000"}}

INFO  [pool-2-thread-5] c.v.v.l.p.c.v.t.VraVaReplaceLicenseTask -  -- Result of fetching License : null
ERROR [pool-2-thread-5] c.v.v.l.p.c.v.t.VraVaReplaceLicenseTask -  -- Failed to get vRA License Key.
INFO  [pool-2-thread-5] c.v.v.l.p.a.s.Task -  -- Injecting task failure event. Error Code : 'LCMVRAVACONFIG590007', Retry : 'true', Causing Properties : '{ CAUSE ::  }' 
com.vmware.vrealize.lcm.common.exception.EngineException: Failed to get vRA License Key.
	at com.vmware.vrealize.lcm.plugin.core.vra80.task.VraVaReplaceLicenseTask.execute(VraVaReplaceLicenseTask.java:134) [vmlcm-vrapreludeplugin-core-2.1.0-SNAPSHOT.jar!/:?]
	at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:45) [vmlcm-engineservice-core-2.1.0-SNAPSHOT.jar!/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
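The vRSLCM log above shows why the task fails even though the command itself succeeds: “vracli -j license” exits with status 0 and returns status_code 0, but the license entry has "valid": false and "error": "License expired". Below is a minimal Python sketch for checking this yourself on the vRA node. The function names are hypothetical, the JSON field names are taken from the log output above, and treating a healthy key as "valid": true is an assumption.

```python
import json
import subprocess


def fetch_license_json():
    """Run the same command vRSLCM runs over SSH in the log above."""
    result = subprocess.run(
        ["vracli", "-j", "license"], capture_output=True, text=True, check=True
    )
    return result.stdout


def license_problems(vracli_json):
    """Return (key, error) pairs for licenses vracli reports as not valid."""
    result = json.loads(vracli_json)
    return [
        (entry.get("key"), entry.get("error"))
        for entry in result.get("output_data", [])
        if not entry.get("valid", False)
    ]


# On the vRA node: print(license_problems(fetch_license_json()))
```

Checking the valid field, rather than the exit code or status_code, is the point here: both report success for an expired key.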

Regards

Dean

vRealize Automation 8.0 – Wildcard SSL certificate support and deployment issues – LCMVRAVACONFIG590003

OK, so I’m just going to call it out straight away: when using wildcard SSL certificates with vRealize Automation 8.0, read the release notes.

I did not, and caused myself quite a few headaches with the deployment, which you can read about further in this post.

Cannot set wildcard certs for certain domain names, specifically those not using a Public Suffix.

vRealize Automation 8.0 supports setting a wildcard certificate only for DNS names that match the content of the Public Suffix List (https://publicsuffix.org/).

For example, a valid wildcard certificate: you can use a wildcard certificate with DNS name like "*.myorg.com". This is supported because "com" is part of the Public Suffix List.

An invalid wildcard certificate example: you cannot use a wildcard certificate with DNS name like "*.myorg.local". This is not supported because "local" is not part of Public Suffix List.

Workaround: Only use domain names in the Public Suffix List.
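To sanity-check a wildcard name against this release-note rule before deploying, here is a minimal Python sketch. It is not how vRA performs its validation, and PUBLIC_SUFFIXES is a deliberately tiny, hypothetical excerpt; the real Public Suffix List also contains wildcard and exception rules that this sketch ignores, so check real names against the full list.

```python
# Tiny, hypothetical excerpt of the Public Suffix List for illustration only;
# the full list (with wildcard and exception rules) is at https://publicsuffix.org/
PUBLIC_SUFFIXES = {"com", "org", "net", "uk", "co.uk"}


def wildcard_supported(dns_name, suffixes=PUBLIC_SUFFIXES):
    """Check whether a '*.' certificate name ends in a known public suffix."""
    if not dns_name.startswith("*."):
        raise ValueError("expected a wildcard name such as *.myorg.com")
    labels = dns_name[2:].lower().rstrip(".").split(".")
    # Every proper suffix of the domain, e.g. myorg.co.uk -> {"co.uk", "uk"}.
    candidates = {".".join(labels[i:]) for i in range(1, len(labels))}
    return not candidates.isdisjoint(suffixes)


print(wildcard_supported("*.myorg.com"))    # True
print(wildcard_supported("*.myorg.local"))  # False
```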

The issues caused by using an unsupported wildcard SSL

When deploying vRA 8.0 via vRSLCM, either as part of the easy installer or in an existing vRSLCM setup, you will be asked to provide an SSL certificate.

This step does not validate that your certificate is supported for use with the vRA 8.0 deployment. vRSLCM does some checking on the selected SSL certificate, but only to ensure it is not about to expire; you will see a green tick and “healthy” status, as below.

Screenshot: vRSLCM certificate check for the wildcard certificate (LCMVRAVACONFIG590003)

Once you hit deploy, your vRA appliance will eventually be stood up; however, the initialization tasks will stall.

Error Code: LCMVRAVACONFIG590003
Cluster Initialization failed on VRA.

vRA Initialize Cluster failed on vRA VA - ***Hostname***. Please login to the vRA and check /var/log/deploy.log file for more information on failure.
