Friday, January 5, 2018

Intel / AMD processor vulnerability: Meltdown-Spectre and VMware ESXi

Most of us are probably aware of this by now. If not: serious security flaws named Meltdown and Spectre have been discovered in processors designed by Intel, AMD and ARM, and these flaws could let attackers steal your sensitive data.

These flaws were discovered by security researchers at Google’s Project Zero in conjunction with academic and industry researchers from several countries. Combined, they affect virtually every modern computer, including smartphones, tablets and PCs from all vendors, running almost any operating system, such as Windows, macOS and Linux.


The two ‘bugs’ stem from design flaws of microprocessors that have the potential to allow applications, malware, and JavaScript running in web browsers, to obtain information from the operating system kernel’s private memory areas.

So you may wonder how this affects the VMware ESXi platform and the VMs running on it.

VMware has issued a Security Advisory (VMSA-2018-0002) for this. According to it, CPU data cache timing can be abused to efficiently leak information out of mis-speculated CPU execution, leading to (at worst) arbitrary virtual memory read vulnerabilities across local security boundaries in various contexts. (Speculative execution is an automatic and inherent CPU performance optimization used in all modern processors.) ESXi, Workstation and Fusion are vulnerable to the Bounds Check Bypass and Branch Target Injection issues resulting from this vulnerability.

Successful exploitation may allow information disclosure from one virtual machine to another virtual machine running on the same host.

To remediate the known variants of the Bounds Check Bypass and Branch Target Injection issues, install the patch from the list below that corresponds to your ESXi release.

VMware patches for different ESXi versions:

  • ESXi 6.5 – ESXi650-201712101-SG
  • ESXi 6.0 – ESXi600-201711101-SG
  • ESXi 5.5 – ESXi550-201709101-SG *
* This patch has remediation against CVE-2017-5715 but not against CVE-2017-5753.

Update: There are new patches available; refer to Advisory VMSA-2018-0004.
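If you have to check many hosts, the version-to-patch mapping above can be kept as a small lookup table. The following is only a minimal Python sketch: the patch IDs are copied from the list above, while the names ESXI_PATCHES, NOT_COVERED, patch_for and not_covered are made up for illustration and are not part of any VMware tool.

```python
# Patch IDs copied from the list in this post. The table is illustrative only
# and does not include the newer patches mentioned in VMSA-2018-0004.
ESXI_PATCHES = {
    "6.5": "ESXi650-201712101-SG",
    "6.0": "ESXi600-201711101-SG",
    "5.5": "ESXi550-201709101-SG",
}

# Per the footnote, the 5.5 patch does not remediate CVE-2017-5753.
NOT_COVERED = {"5.5": ("CVE-2017-5753",)}


def _major_minor(version):
    """Reduce a version string such as '6.5.0' to '6.5'."""
    return ".".join(version.strip().split(".")[:2])


def patch_for(version):
    """Return the patch ID listed for an ESXi version, or raise ValueError."""
    key = _major_minor(version)
    if key not in ESXI_PATCHES:
        raise ValueError("no patch listed for ESXi " + version)
    return ESXI_PATCHES[key]


def not_covered(version):
    """Return the CVEs that the listed patch does not remediate."""
    return NOT_COVERED.get(_major_minor(version), ())


if __name__ == "__main__":
    for v in ("6.5.0", "6.0", "5.5"):
        print(v, patch_for(v), not_covered(v))
```

A version that is not in the list raises a ValueError instead of silently returning nothing, so an unlisted host is not mistaken for a patched one.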

Downloads: https://my.vmware.com/group/vmware/patch (search by patch name).

While this addresses the risk of data leakage between virtual machines, it does not mitigate the risk of data leakage within an individual virtual machine. To protect against that threat, operating-system-specific security updates must be installed.
Microsoft released a patch on Jan 3rd, 2018 to fix this issue on systems running Windows.

Also apply the applicable firmware update provided by your server/device manufacturer (Useful Link).

Note: It has been speculated that patching these flaws will cause a performance hit. At this time, the degree of the performance hit is still unclear; the figures available vary with the source of information.

Related Read: https://www.pcworld.com/article/3245606/security/intel-x86-cpu-kernel-bug-faq-how-it-affects-pc-mac.html
https://www.theverge.com/2018/1/4/16848976/how-to-protect-windows-pc-meltdown-security-flaw

That’s it… 😊


Thursday, December 21, 2017

How to capture memory dump of a VM from snapshot or suspended state file

This is something an application vendor may request for debugging purposes, to investigate an application-related issue. If you get such a request for a VM memory dump, you might wonder how to capture a memory dump from a VMware virtual machine without stopping its execution.

If this is a production VM, you might not want to force a crash, or change the Windows dump parameters and reboot the machine. So how can we capture a memory dump of a VM without interrupting it?

There is a VMware fling called vmss2core, which converts checkpoint state files into formats that third-party debugger tools understand. It can handle both suspend (.vmss) and snapshot (.vmsn) checkpoint state files, as well as both monolithic and non-monolithic (separate .vmem file) encapsulation of checkpoint state data.

The vmss2core tool can produce core dump files for the Windows debugger (WinDbg) as well as for other operating systems. Please refer to given screenshot for more info.

For more info about the usage of the vmss2core tool, please refer to Debugging Virtual Machines with the Checkpoint to Core Tool.

We need to take a snapshot of the affected VM when it hangs, crashes, or otherwise displays the symptoms you are troubleshooting, and then download the snapshot state file (.vmsn) and the VM paging file (.vmem), which can later be converted to a Windows memory dump file (.dmp) using the vmss2core utility.

Steps:
1. Copy the vmss2core.exe utility to the same location where you downloaded the VM snapshot or suspended state files (.vmsn for a snapshot, .vmss for a suspended state).

2. Open a command line, navigate to the location of the snapshot / suspended state files and run the tool with the OS-specific option. For example, the following commands generate a memory.dmp file for the Windows debugger, WinDbg.

For Snapshot:
C:\folder>vmss2core.exe -W snapshot.vmsn [snapshot.vmem]

For Suspended state:
C:\folder>vmss2core.exe -W snapshot.vmss 

If the snapshot file is from a Windows 8 or Windows Server 2012 VM, use

C:\folder>vmss2core.exe -W8 snapshot.vmsn [snapshot.vmem]

On success, the output is a "memory.dmp" file suitable for use with WinDbg.

Please note: the VM paging file (.vmem file) may not be present, depending on the state of the VM.
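If you have to repeat this conversion often, the choice between the commands above can be sketched in a few lines of Python. This is only an illustration: the function name vmss2core_args and its parameters are hypothetical, and the only vmss2core options used are the -W and -W8 switches shown above.

```python
from pathlib import Path


def vmss2core_args(state_file, vmem_file=None, windows8_or_later=False):
    """Build the vmss2core command line for a .vmsn or .vmss file.

    vmem_file is optional because the .vmem file may not be present.
    windows8_or_later selects -W8 (Windows 8 / Server 2012) instead of -W.
    """
    suffix = Path(state_file).suffix.lower()
    if suffix not in (".vmsn", ".vmss"):
        raise ValueError("expected a .vmsn or .vmss file, got " + str(state_file))
    flag = "-W8" if windows8_or_later else "-W"
    args = ["vmss2core.exe", flag, str(state_file)]
    if vmem_file is not None:
        args.append(str(vmem_file))
    return args


if __name__ == "__main__":
    # Only prints the command lines; nothing is executed here.
    print(" ".join(vmss2core_args("snapshot.vmsn", "snapshot.vmem")))
    print(" ".join(vmss2core_args("snapshot.vmss")))
    print(" ".join(vmss2core_args("snapshot.vmsn", "snapshot.vmem", True)))
```

The returned list can be handed to subprocess.run(args, check=True) from the folder that holds vmss2core.exe and the state files.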

Related VMware KB: 2003941

That's it... :)


Wednesday, December 20, 2017

Fixing the error "vMotion is not enabled on the host of the Virtual Machine"

This is the first time I have seen this error on a production host that I know has everything properly configured (I mean, an identical vmkernel port with vMotion enabled); moreover, vMotion was working on this host before.


When I first saw this error, I thought someone had probably disabled vMotion inadvertently; however, on cross-checking I found everything correct.


So this is probably caused by some kind of technical glitch. To fix it, we either need to reboot the host or simply disable and re-enable vMotion in the vmkernel port properties.

You can disable and re-enable vMotion by editing the respective vmkernel port, but make sure you give it a few minutes before re-enabling.

That’s it…. 😊


Thursday, November 30, 2017

"The disk is write protected" error on a Windows Failover Cluster Node VM

Lately, I came across an issue where the database team was unable to start the SQL service on SQL cluster nodes. When they checked the SQL logs, they found that the drive where the temp database is stored was no longer writable. (I have seen this issue a few times in the past and, if I remember correctly, every time it happened on Windows Server 2008 R2 failover cluster nodes.)

When I tried to create a new object on this drive, there was no option to do so, and checking the disk for errors from disk properties => Tools ended with an error like,


I am not sure what caused this issue; however, I found the following Volume Shadow Copy service error entry in Event Viewer: “A critical dynamic disk is a virtual hard disk. This is an unsupported configuration.” So I suspect this has something to do with Volume Shadow Copy and the Microsoft server cluster.


We can fix this write-protected disk issue by clearing the read-only attribute at the volume level (in my case, clearing the read-only attribute on the disk didn’t work, so I suggest you clear the attribute at the lowest level).

To do so follow these steps:

First open Disk Management and note down the disk and volume name/number of the affected drive/volume.

  1. Open a command prompt, type in Diskpart and then press Enter.
  2. Run the command “List Volume” and press Enter.
  3. Now identify the volume name and drive letter of the affected read-only volume (noted earlier from Disk Management).
  4. Select the affected volume by using the “Select volume x” command, where x is the volume number.
  5. Once the affected volume is selected, clear the “Read-only” attribute by running the command “attributes volume clear readonly”.
                       
And with this you are done; your disk/volume should now be writable.

You can verify this by running the “detail volume” or “attributes volume” command.
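If the same fix is needed on several nodes, the select and clear commands from the steps above can be put into a script file and run with diskpart /s. Below is a minimal Python sketch that only generates the script text; the function name and the file name clear_readonly.txt are made up for illustration, and the volume number must still be looked up with “List Volume” first.

```python
def diskpart_clear_readonly_script(volume_number):
    """Return diskpart script text that clears the read-only attribute
    on one volume and then prints the volume details for verification."""
    if isinstance(volume_number, bool) or not isinstance(volume_number, int):
        raise ValueError("volume number must be an integer")
    if volume_number < 0:
        raise ValueError("volume number must not be negative")
    lines = [
        "select volume {}".format(volume_number),
        "attributes volume clear readonly",
        "detail volume",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Volume 3 is only an example value.
    with open("clear_readonly.txt", "w") as handle:
        handle.write(diskpart_clear_readonly_script(3))
    print("Run from an elevated prompt: diskpart /s clear_readonly.txt")
```

The sketch deliberately does not call diskpart itself, so the generated script can be reviewed before it is run against a cluster node.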

Note: One may ask why I didn’t verify the disk/volume read-only attribute earlier. The short answer is that I did, and interestingly the readonly attribute was set to No, but I still had to clear it to make the volume writable again 😉.

If the folders inside the drive are still not writable, run the following command to remove the Read-only attribute and set the System attribute.

C:\>attrib -r +s drive:\folder_name

Hope this would be helpful to others.........That's it :)


Sunday, October 1, 2017

HP SPP upgrade failing with unexpected errors

This week, while upgrading server firmware/drivers on HP ProLiant G9 servers using the latest HP SPP, the SPP online deployment on one server returned the following,


When we checked the install logs, we found that all updates had failed. I had recently faced an issue where HPSUM inventory was failing due to missing HP management tools, and from past experience I knew that VMware update installations or individual VIB installs may fail if the server has no ESXi image profile attached or the image profile is corrupted, so I decided to cross-check that.

On checking, I found that this server had no host image profile defined,


When I checked the esxupdate.log file, I found entries with the name of the HPE custom ESXi 6.0 U2 install file, which means the install was done using the correct ISO image and the image profile somehow got corrupted later.

I then checked for the available image profiles by searching for the imgdb.tgz file (this is the image profile backup, and its size gives an idea of whether the image profile is corrupted). The search lists the two imgdb.tgz files as follows,


Looking at the sizes, one imgdb.tgz file is much smaller than the other and appears to be corrupted.
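The size comparison can be expressed as a small check. This is only a Python sketch of the idea: the 50% threshold, the function name and the example paths and sizes are assumptions for illustration, not values taken from the affected host, and a small file is only a hint that it is worth a closer look.

```python
from typing import Dict, List


def suspicious_imgdb(sizes: Dict[str, int], ratio: float = 0.5) -> List[str]:
    """Return the paths whose size is below `ratio` times the largest file.

    `sizes` maps each imgdb.tgz path to its size in bytes.
    The default ratio of 0.5 is an arbitrary assumption.
    """
    if not sizes:
        return []
    largest = max(sizes.values())
    return sorted(path for path, size in sizes.items() if size < largest * ratio)


if __name__ == "__main__":
    # Hypothetical paths and sizes, for illustration only.
    example = {
        "/bootbank/imgdb.tgz": 300,
        "/altbootbank/imgdb.tgz": 26000,
    }
    print(suspicious_imgdb(example))
```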

We can fix this broken image profile issue by doing either an upgrade or a fresh install. There is one more way to fix it: replace the host image profile of this host with an image profile from a working host.

We can do that as follows,

Use WinSCP or any other SCP/SFTP client to connect to a healthy ESXi host, browse to the /bootbank directory and copy the imgdb.tgz file to your system. Then connect to the host that is missing the image profile and paste the copied imgdb.tgz file into the /tmp directory.

Now remove the corrupted imgdb.tgz file,

rm /bootbank/imgdb.tgz

Extract the imgdb.tgz archive copied from the healthy host into /tmp,

tar -xzf /tmp/imgdb.tgz -C /tmp

then copy the working image profile and related vibs to /var/db/esximg/profiles and /var/db/esximg/vibs directories,

cp /tmp/var/db/esximg/profiles/* /var/db/esximg/profiles/
cp /tmp/var/db/esximg/vibs/* /var/db/esximg/vibs/

After this, run the following command to create a backup of the image,

/sbin/auto-backup.sh

Now when you check for the installed profile, it will show one. The host summary will not reflect the image profile name until you restart the host management agents or reboot the host.

After doing the above, I re-initiated the SPP online update, and this time the host firmware/drivers were updated without any further issue.


That’s it… 😊