Wednesday, October 14, 2015

Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem : HP ProLiant Gen9 Server

The other day, when I logged in to vCenter, I noticed one of the hosts showing a warning icon, and its Summary tab showed this warning: "Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem".

This warning means that the embedded flash/SD card (where ESXi is installed, i.e. an embedded install) is no longer available to the ESXi host. As this is an HP ProLiant server, I logged in to iLO and checked Diagnostics, which showed the SD card status as OK. The iLO logs, however, showed that the SD card had been restarted recently.
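
To see from the ESXi shell how often the host lost the boot device, you can search the host logs for the same message text. The sketch below is a hypothetical helper, not something from the original troubleshooting; it assumes the event is written to /var/log/vobd.log (where ESXi observation events usually land) and that the log line contains the warning text shown in vCenter.

```shell
#!/bin/sh
# Hypothetical helper: count log lines reporting that a device backing the
# boot filesystem was lost. Assumptions: the message text matches the vCenter
# warning and the events are logged to /var/log/vobd.log (adjust if not).
count_boot_device_losses() {
  device="${1:-mpx.vmhba32:C0:T0:L0}"
  logfile="${2:-/var/log/vobd.log}"
  if [ ! -r "$logfile" ]; then
    echo "cannot read $logfile" >&2
    return 2
  fi
  # -F: match a fixed string (the device name contains ':' and '.')
  # -c: print only the number of matching lines (prints 0 when none match)
  grep -c -F "Lost connectivity to the device $device backing the boot filesystem" "$logfile"
}
```

On the host, running `count_boot_device_losses` with no arguments checks the default device and log path; pass a different device name or log file as the first and second arguments.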

The good news is that ESXi loads entirely into memory, so there was no outage for the VMs, and once connectivity is restored the host can access the device again. The bad news is that the warning did not clear automatically, and since no one likes to see errors or warnings in a production environment, I needed to find a fix.

The simplest fix is to put the host in maintenance mode and restart the management agents. There are two ways to do this. Either connect to the host over SSH and run the following commands:

/etc/init.d/hostd restart
/etc/init.d/vpxa restart
Or open the host's remote console through iLO, log in to the DCUI, and restart the management agents from there.
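
The SSH route can be wrapped in a small script. This is a sketch under assumptions: it uses `esxcli system maintenanceMode set` to enter maintenance mode, it assumes running VMs have already been migrated off (for example by DRS from vCenter), and the `run` wrapper with its DRY_RUN switch is a hypothetical convenience for previewing the commands.

```shell
#!/bin/sh
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Enter maintenance mode, then restart hostd and vpxa (the commands from the post).
# Assumes VMs were already migrated; otherwise entering maintenance mode waits
# until they are powered off or moved.
restart_mgmt_agents() {
  run esxcli system maintenanceMode set --enable true || return 1
  run /etc/init.d/hostd restart || return 1
  run /etc/init.d/vpxa restart
}
```

`DRY_RUN=1 restart_mgmt_agents` prints the three commands without running them. The script deliberately does not exit maintenance mode: esxcli depends on hostd, which is still starting right after the restart, so it is simpler to exit maintenance mode from vCenter once the warning has cleared.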
                                       
Once the management agents have restarted, vCenter shows the host back in a normal state.

Note: In some cases the SD card problems are caused by buggy firmware, and fixing them may require upgrading or downgrading the firmware.
We are on iLO firmware version 2.20. According to various forums, this version has an SD-card-related bug that was reportedly fixed in version 2.22. Since version 2.30 is also available, you can upgrade to either of these versions.
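
If you manage several hosts, a quick check against the versions named in this post can help flag the affected ones. This is a hypothetical sketch: you supply the iLO firmware version yourself (for example as read from the iLO web interface), it assumes "MAJOR.MINOR" version strings with numeric parts, and it only encodes what this post states about 2.20 and 2.22.

```shell
#!/bin/sh
# Return 0 if version $1 >= version $2. Both are assumed to be "MAJOR.MINOR"
# with plain decimal parts, e.g. 2.20, 2.22, 2.30.
version_ge() {
  a_major=${1%%.*}; a_minor=${1#*.}
  b_major=${2%%.*}; b_minor=${2#*.}
  if [ "$a_major" -ne "$b_major" ]; then
    [ "$a_major" -gt "$b_major" ]
  else
    [ "$a_minor" -ge "$b_minor" ]
  fi
}

# Classify an iLO firmware version against the two releases discussed here:
# 2.20 (pulled by HP) and 2.22 (reported to fix the SD card disconnects).
check_ilo_fw() {
  ver="$1"
  if [ "$ver" = "2.20" ]; then
    echo "affected: upgrade to 2.22 or later"
  elif version_ge "$ver" "2.22"; then
    echo "ok: $ver includes the 2.22 fix"
  else
    echo "older than 2.22: not covered here, check the HP release notes"
  fi
}
```

For example, `check_ilo_fw 2.20` prints `affected: upgrade to 2.22 or later`. Being at 2.22 or later is not a guarantee: some readers in the comments still saw the warning on later releases.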

Other scenario: what if the SD card has failed? You can try removing and reseating the SD card, but if it still doesn't come online, contact the server vendor for a replacement.
If the SD card is indeed bad, migrate all VMs to other hosts, put the host in maintenance mode, and back up the host configuration. Then shut down the host, replace the flash device, and reinstall ESXi (the host will not boot after the replacement otherwise). Once the host is up, configure the management network and VLANs, then restore the host configuration.
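
The backup and restore steps can be done from the ESXi shell with `vim-cmd`. This is a minimal sketch, assuming the host is reinstalled with the same ESXi build (the restore expects a matching build), that the bundle is copied back to /tmp/configBundle.tgz (an assumed path), and that the `run`/DRY_RUN wrapper is only a hypothetical preview aid.

```shell
#!/bin/sh
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Before shutting down: flush pending configuration changes, then create a bundle.
# sync_config writes to the boot device, so it may fail if that device is the one
# that died; we continue with the backup anyway (assumption).
# backup_config prints a download URL; replace '*' in it with the host name or IP
# and save the bundle somewhere off the host.
backup_host_config() {
  run vim-cmd hostsvc/firmware/sync_config ||
    echo "warning: sync_config failed, continuing with backup" >&2
  run vim-cmd hostsvc/firmware/backup_config
}

# After reinstalling and configuring the management network: enter maintenance
# mode and restore the bundle. The host reboots as part of the restore.
restore_host_config() {
  bundle="${1:-/tmp/configBundle.tgz}"
  run vim-cmd hostsvc/maintenance_mode_enter || return 1
  run vim-cmd hostsvc/firmware/restore_config "$bundle"
}
```

Keeping backup and restore as separate functions matches the fact that they run on two different installs of the host, with the hardware swap in between.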

Reference: Daniel's blog and discussions on other forums.

Update, 05/11/2015: This week we faced the same issue again, so instead of fixing it myself I contacted HP support, and they confirmed that the issue is caused by iLO firmware version 2.20, which we have on these G9 servers.


Response from HP support: "That version 2.20 has been removed from our site due to it causing issues with server components, including the embedded flash cards. The new iLO firmware 2.22 addresses/fixes issues with the embedded cards disconnecting."
  

That’s it… :)


8 comments:

  1. Hey Noor, I am checking to see if you still have this problem. We have a few G9 servers that have been upgraded from 2.40 to 2.44, but this issue still persists.

    1. After upgrading the firmware to ver 2.30, we didn't see this issue again.

  2. We are running 4x BL460c G9 and one of them has this issue. We are running firmware 2.30.

  3. We have this issue only on some of our BL465c Gen8 servers - with iLO version 2.40 or 2.44

    1. Did you try removing and then reconnecting the SD card? By the way, I would suggest checking with HP Support about this.

  4. I have a BL460c G9 with spinning disks on v2.40 that had this issue. I upgraded iLO to v2.50, but I am reading elsewhere that the issue is still not resolved with v2.50.

    1. Not sure about magnetic disks, but with flash drives/SD cards I didn't see this issue after upgrading iLO. In your case there might be some other issue; if you have active support, please check with HP.

  5. HP Gen 8/9 lose access to device backing boot filesystem
    https://kb.vmware.com/kb/2144283
