For the past few weeks I’ve been working with a customer to resolve an issue with a repurposed host that crashed with a PSOD (Purple Screen of Death) every week.
Issue: We installed a new HP Gen8 server as a production host at their main site, running VMware ESXi 5.5 U1, then took the HP G7 ESXi 5 box to their second site and repurposed it as a DR host. Veeam would send replicas to this host, and it would also run a domain controller and one production VM belonging to the site where it was located.
What didn’t work: Everything ran fine for around two weeks, then one weekend, while Veeam was replicating a VM to this HP G7 host, the host crashed with a PSOD. The customer rebooted the host, and after some investigation we decided it was caused by CoalesceWrites (VMware KB).
Unfortunately, a few days later the host crashed with a PSOD again. We reported this to both VMware and HP. VMware came back to us first, stating that they believed the BIOS was at fault and that we should install the latest update (Link from VMware Tech Support).
This resolved the issue for around two weeks, before it happened again.
What seemed to have worked: Once again I reported it to VMware and to HP, and again VMware Tech Support got back to me first. Interestingly, this time they dug deeper and found that the hpsa driver was causing the issue. HP did not find anything, so I passed on VMware’s findings and asked for a link to the most up-to-date driver for the device. HP provided one, but unfortunately it was the wrong driver, so the host crashed with a PSOD again the following week. After a third and final round of reporting to VMware (who again diagnosed the driver as being at fault) and HP, I was given the correct driver and the host was up and running.
Here is part of the response from VMware Tech Support from the third and final time of contact:
We are getting multiple memory related error messages with respect to the hpsa driver
2014-07-13T02:17:47.659Z cpu3:2619674)WARNING: Heap: 4089: Heap_Align(vmklnx_hpsa, 16384/16384 bytes, 8 align) failed. caller: 0x41801168631c
2014-07-13T02:17:47.659Z cpu3:2619674)<3>hpsa 0000:0b:00.0: out of memory
2014-07-13T02:17:47.664Z cpu13:2619673)<4>hpsa 0000:05:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562
2014-07-13T02:18:17.662Z cpu4:2619738)<3>hpsa 0000:0b:00.0: out of memory
2014-07-13T02:18:17.667Z cpu22:2619737)<4>hpsa 0000:05:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562
2014-07-13T02:18:42.051Z cpu9:2603471)<4>hpsa 0000:0b:00.0: cp 0x410a13e97000 has status 0x2 Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2
2014-07-13T02:18:42.051Z cpu11:32779)<4>hpsa 0000:0b:00.0: cp 0x410a13e97280 has status 0x2 Sense: 0x5, ASC: 0x24, ASCQ: 0x0, Returning result: 0x2
2014-07-13T02:18:42.052Z cpu5:32773)<4>hpsa 0000:05:00.0: cp 0x410a13e75500 has status 0x2 Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2
2014-07-13T02:18:42.052Z cpu5:32773)<4>hpsa 0000:05:00.0: cp 0x410a13e75000 has status 0x2 Sense: 0x5, ASC: 0x24, ASCQ: 0x0, Returning result: 0x2
2014-07-13T02:18:42.052Z cpu5:32773)<4>hpsa 0000:05:00.0: cp 0x410a13e75000 has status 0x2 Sense: 0x5, ASC: 0x24, ASCQ: 0x0, Returning result: 0x2
2014-07-13T02:18:47.665Z cpu16:2619826)<3>hpsa 0000:0b:00.0: out of memory
2014-07-13T02:18:47.670Z cpu23:2619825)<4>hpsa 0000:05:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562
2014-07-13T02:19:17.668Z cpu19:2619892)<3>hpsa 0000:0b:00.0: out of memory
This issue is resolved in the HPSA driver version 18.104.22.168-1, currently the host is running on lower driver version
The version on the host which is affected now is scsi-hpsa 22.214.171.124-1OEM.5126.96.36.1991820 Hewlett-Packard VMwareCertified 2014-06-03
And the controllers are :
0000:05:00.0 Mass storage controller: Hewlett-Packard Company Smart Array P410i [vmhba0]
0000:0b:00.0 Mass storage controller: Hewlett-Packard Company Smart Array P410 [vmhba1]
I would recommend you to update the hpsa driver to the recommended version to avoid PSOD caused by hpsa driver. Please find below the HP link which talks about this issue http://h20564.www2.hp.com/portal/site/hpsc/public/kb/docDisplay/?docId=c04302261
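If you want to check your own host for the same symptoms before it falls over, the messages VMware quoted are easy to search for. The sketch below is my own illustration, not something VMware or HP supplied: the function names are made up, and it assumes the vmkernel log is at /var/log/vmkernel.log (or has been copied off the host) with one entry per line.

```shell
# Hypothetical helpers: count hpsa memory errors in a vmkernel log file.
# Pass the log path as the first argument, e.g. /var/log/vmkernel.log
# (an assumed location; adjust to wherever your log lives).

# Lines where the hpsa driver reports "out of memory".
count_hpsa_oom() {
    grep -c 'hpsa .*out of memory' "$1"
}

# Lines where a heap allocation for the vmklnx_hpsa heap failed.
count_hpsa_heap_fail() {
    grep -cF 'Heap_Align(vmklnx_hpsa' "$1"
}
```

Both functions print a count of matching lines. Anything above zero on a host running the affected driver is probably worth raising with VMware or HP before the next PSOD.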
The final note to be made is that the syntax given on HP’s website for installing the vib file is wrong. The correct syntax should be:
esxcli software vib install -v "/vmfs/volumes/<datastore>/filename.vib"
Then run the command below to confirm that the correct driver version is showing as installed:
esxcli software vib list
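The full vib list is long, so it can help to filter it down to just the hpsa entry. This is a small sketch of my own (hpsa_vib_line is a made-up helper name) that reads the list output on stdin and keeps only the lines mentioning hpsa:

```shell
# Hypothetical helper: keep only the hpsa driver line(s) from
# `esxcli software vib list` output supplied on stdin.
hpsa_vib_line() {
    grep -i 'hpsa'
}

# Usage on the host (shown as a comment so the snippet has no side effects):
#   esxcli software vib list | hpsa_vib_line
```

The second column of the line that comes back is the installed version, which you can compare against the version HP and VMware recommend.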
It must also be noted that Nick Furnell tweeted a link to this blog post, which details the same fix, at around the same time HP gave me the wrong driver to fix the issue. Maybe I’d have written this post sooner had luck gone my way.
Over and Out