Sunday, August 2, 2015

Powered-off VMs are greyed out and showing as not accessible

A few days ago I faced an issue where some powered-off VMs were showing as not accessible in the inventory.

When I checked the vCenter alarm emails from around that time, I found the lines below, which meant this had happened because of lost storage connectivity.
Alarm Definition:
([Event alarm expression: Lost Storage Connectivity] OR [Event alarm expression: Lost Storage Path Redundancy] OR [Event alarm expression: Degraded Storage Path Redundancy])
Path redundancy to storage device naa.6006019004cf12d00e0445e759755e598 degraded. Path vmhba2:C0:T0:L127 is down. Affected datastores: Unknown

Only the VMs that were powered off at the time show up greyed out; the VMs that were running during the storage failure were hanging, but they resumed properly after the LUN came back online. This issue is mainly seen on ESXi 5.0 and 5.1 hosts and is usually related to NFS datastores, but, as in my case, it can occur with Fibre Channel storage too.

Fix: As with the orphaned VM issue, we can fix this inaccessible VM issue simply by removing the affected VMs from the inventory and then re-adding them (go to the datastore, browse the folder of the affected VM, right-click the VM configuration file (vm_name.vmx) and register it).

Caution: Before performing these steps, make a note of the datastore that the virtual machine resides on.

One might have an issue with this procedure because vCenter will treat the re-registered machine as a new VM, so your backup software will most probably need the backup job configured again.
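
If you prefer to script the remove/re-register step, a rough PowerCLI sketch is below. It assumes an existing Connect-VIServer session; the VM name "AffectedVM", the host "esx01.lab.local" and the datastore path are placeholders, so substitute the values you noted down earlier.

#Remove the inaccessible VM from the inventory only (Remove-VM without -DeletePermanently leaves the files on the datastore untouched)
Get-VM -Name "AffectedVM" | Remove-VM -Confirm:$false

#Register the same .vmx file back on the host it was running on
New-VM -VMFilePath "[Datastore1] AffectedVM/AffectedVM.vmx" -VMHost (Get-VMHost -Name "esx01.lab.local")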

If you don’t want to remove/re-register the VM from/to the inventory, you can achieve the same result using the vim-cmd command.

Note the name of the affected VM and connect to the host it resides on over SSH (using PuTTY) or via the DCUI, then list all the VMs on that host:

vim-cmd vmsvc/getallvms

Or, to list only the inaccessible VMs on the host:

vim-cmd vmsvc/getallvms | grep -i skip

Note down the Vmids of the affected VMs and reload each one:

vim-cmd vmsvc/reload <vmid>

This reloads the VM into the inventory and fixes the issue without removing and re-adding it. Simply repeat the reload command for all of the “skipped” VMs and your problem is solved.

BUT you will have to do this one by one... what if you have multiple VMs with this issue? There is another way: you can list and reload all inaccessible VMs at once using a simple one-liner PowerShell (PowerCLI) script.

#Get Inaccessible Virtual Machines
$VMs = Get-View -ViewType VirtualMachine | Where-Object {$_.Runtime.ConnectionState -eq "invalid" -or $_.Runtime.ConnectionState -eq "inaccessible"} | Select-Object Name, @{Name="GuestConnectionState";E={$_.Runtime.ConnectionState}}
Write-Host "---------------------------"
Write-Host "Inaccessible VMs"
Write-Host "---------------------------"
$VMs

#Reload VMs into inventory

(Get-View -ViewType VirtualMachine) | Where-Object {$_.Runtime.ConnectionState -eq "invalid" -or $_.Runtime.ConnectionState -eq "inaccessible"} | ForEach-Object {$_.Reload()}

Note: Restarting the management agents or disconnecting/reconnecting the ESXi host might also fix the issue.
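
If you want to script that last option as well, here is a minimal PowerCLI sketch for disconnecting and reconnecting a host; "esx01.lab.local" is a placeholder host name and an existing Connect-VIServer session to vCenter is assumed.

#Disconnect the affected host from vCenter, then connect it again
Get-VMHost -Name "esx01.lab.local" | Set-VMHost -State Disconnected -Confirm:$false
Get-VMHost -Name "esx01.lab.local" | Set-VMHost -State Connected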

Reference: Matt Vogt and Erik Zandboer's blogs.  

That's it.... :)

