How to set up ESXi with Fusion IO

Today I installed an ESXi host with a Fusion I/O Accelerator card and I would like to share my experience with you.

Background
This server will be used by VMware Horizon to run VMs.

Hardware
HP ProLiant DL380p Gen8
HP 1.6TB HH/HL Value Endurance (VE) PCIe Workload Accelerator

Documentation, firmware and drivers
http://h20565.www2.hpe.com/hpsc/swd/public/readIndex?sp4ts.oid=7169465&swLangOid=8&swEnvOid=4166

Installation of ESXi
The first step when installing ESXi on a new HP server is to run the HP ProLiant Support Pack to update the server with new firmware versions (if applicable).

After updating the server I installed and configured ESXi. Now it was time to add a datastore, but when I tried to add one there was nothing to add. I had hoped to find the I/O card and create a datastore on it.

RTFM
Because I was unable to add a datastore on the I/O card I started looking for information and found documentation, firmware and drivers. The “USER GUIDE FOR VMWARE ESXI” was very helpful and explained what to do. This guide explains how to install the drivers manually, but because I expect to install many more of these servers I hoped to automate the driver installation.

VUM
I was hoping to add the drivers to VUM, create a baseline and apply the drivers to this host (and in the future to others). I added the drivers below to the Patch Repository in VUM by using the Import Patches feature:
– libvsl-1.0.0-550-offline-bundle.4.1.2.428.zip
– scsi-iomemory-vsl4-55L-4.1.2.428-offline_bundle-2368400.zip

I used the above drivers because I use ESXi 5.5 U2. If you are using ESXi 5.x (not 5.5) then use these drivers:
– libvsl-1.0.0-550-offline-bundle.4.1.2.428.zip
– scsi-iomemory-vsl4-5X-4.1.2.428-offline_bundle-2366406.zip

Now it is time to create a baseline. I called mine: HP I/O Driver. The baseline type is Host Extension and I added both drivers to this baseline.

This baseline can now be attached to hosts containing I/O cards and you can start remediating the host(s) to install these drivers.

Firmware
Now that the drivers were installed I hoped (again) to add a datastore. Still no luck. So I went back to the user guide and noticed that the command fio-status shows the status of the I/O card. I ran it and saw: Attach status is “Status unknown: Driver is in MINIMAL MODE:”. The user guide explained that this could have something to do with the firmware version. So the next step: update the firmware.

On datastore1 (ESXi is installed on a local SSD disk) I created a folder called Bundles and put the firmware file in that folder (ioaccelerator_4.1.2-20141212.fff).

Now I can update the firmware by running the command below from the console:
fio-update-iodrive -d /dev/fct0 /vmfs/volumes/datastore1/Bundles/ioaccelerator_4.1.2-20141212.fff

After 5 or 10 minutes the firmware was installed and I rebooted the server.

After the reboot I was finally able to configure a datastore on the I/O card.

Bandwidth and PCI slot
When running the fio-status command again I now see the I/O card to be online and Attached. I do get a warning message however about the PCI slot. I think this can be solved by moving the I/O card to a different (faster) PCI slot. Maybe I’ll update this post once this is done.

Other settings
The user guide also covers power settings, how to format the I/O card correctly, how to address possible memory issues, disabling CPU frequency scaling and limiting ACPI C-states. I recommend you read this guide carefully and consult your manufacturer about these kinds of settings. The user guide is from SanDisk (I think HP rebranded it) and not from HP, so contact your manufacturer before adjusting any setting. Maybe I’ll update this post once the settings become clear.

~~~UPDATE~~~

HP investigated this issue and gave the following recommendations:

  • Install the IO accelerator in the PCI Slot 5.
  • Adjust power settings in BIOS
    • HP Power Profile – Max Performance
    • HP Power Regulator – HP Static High Performance Mode
    • Advanced Power Management Options – Minimum Processor idle state – No “C” states

After changing the BIOS and moving the card to a different PCI slot the warning disappeared.

Disable Internet Explorer Enhanced Security: the Citrix Windows8and2012VDIBaseline.vbs script is missing the curly bracket }

The closing curly bracket } after each GUID is missing in the following section (each key name should end in 4F3A74704073}):

' Disable Internet Explorer Enhanced Security Enhanced
oShell.RegWrite "HKLM\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073\IsInstalled", 0, "REG_DWORD"
oShell.RegWrite "HKLM\SOFTWARE\Microsoft\Active Setup\Installed Components\{A509B1A8-37EF-4b3f-8CFC-4F3A74704073\IsInstalled", 0, "REG_DWORD"
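A typo like this is easy to miss by eye. As a minimal sketch (not part of the Citrix script, and the function name is just something I made up for illustration), a few lines of Python can flag registry paths where a { is opened but no complete {GUID} follows:

```python
import re

# A complete GUID key name: {8-4-4-4-12 hex digits}
GUID_KEY = re.compile(
    r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}"
)


def has_unclosed_guid(path: str) -> bool:
    """Return True if the path contains a '{' that is not part of a complete {GUID}."""
    return path.count("{") != len(GUID_KEY.findall(path))


# Path as it appears in the script (closing bracket missing)
broken = (r"HKLM\SOFTWARE\Microsoft\Active Setup\Installed Components"
          r"\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073\IsInstalled")
# Path with the closing bracket added
fixed = (r"HKLM\SOFTWARE\Microsoft\Active Setup\Installed Components"
         r"\{A509B1A7-37EF-4b3f-8CFC-4F3A74704073}\IsInstalled")

print(has_unclosed_guid(broken))
print(has_unclosed_guid(fixed))
```

The second path is how the key should look in the script once the bracket is added.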

I found this typo because I was working on an issue where Vimeo video playback gives a black screen. IsInstalled needs to be 0 or else Vimeo won’t play videos.

After setting the registry key run the following commands in CMD.

Rundll32 iesetup.dll, IEHardenLMSettings

Rundll32 iesetup.dll, IEHardenUser

Rundll32 iesetup.dll, IEHardenAdmin

IE ESC is now disabled for admins and users.

How to set default keyboard settings for a user in a Windows remote desktop session.

This problem kept me busy for a few hours: setting the right keyboard, language settings and Internet Explorer 11 spell check for a user when he logs on.

Normal behavior is that the RDS (Citrix) session inherits the keyboard settings from the client, but what if we don’t (or can’t) control the endpoint device? Or what if users fall asleep on their keyboard and hit ALT + SHIFT :-)?

This is what we wanted:

  1. Ignore the client keyboard settings at all times.
  2. Set the right keyboard layout, default language, characters etc.
  3. Set the correct auto-correct language in Internet Explorer 11.

Note:

This can’t be fixed after the user has logged on, because Windows only checks these settings during logon.

This is how we did it:

  1. Set the value IgnoreRemoteKeyboardLayout (REG_DWORD) to 1 under HKLM\System\CurrentControlSet\Control\Keyboard Layout\
  2. Create a mandatory profile (there are many how-tos on the web) and adjust the following keys after mounting the MANPROF:

HU\MANPROF\Control Panel\International

I replaced the whole key and its subkeys by exporting it from a freshly adjusted user. Watch carefully when importing it back into the mandatory profile: the key path in the export has to point to the mounted profile, not to the current user!

Export : [HKEY_CURRENT_USER\Control Panel\International]

Key1

Key2

[HKEY_USERS\MANDATORY\Control Panel\International]

Key1

Key2

[HKEY_CURRENT_USER\Control Panel\International\User Profile\nl-NL]

[HKEY_USERS\MANDATORY\Control Panel\International\User Profile\en-US]

Also adjust the following keys and set a list of the keyboards you want to add. In my case it was Dutch with a US International keyboard.

(For a list of the available keyboards check your RDS server: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Keyboard Layouts\)

HU\MANPROF\Keyboard Layout\Preload\

Type: REG_SZ

Name: 1

Value: 00000413

HU\MANPROF\Keyboard Layout\Substitutes\

Type: REG_SZ

Name: 00000413

Value: 00020409

  3. The spell check language should be taken care of by steps 1 and 2. When my users log on they get the Dutch language with a US International keyboard and the spell check is set to Dutch in IE11.

And maybe you want to delete the keys in HU\MANPROF\Keyboard Layout\Toggle\ for the users who fall asleep : )
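The Preload and Substitutes values from step 2 can also be written to a .reg file and imported into the mounted profile hive. Below is a minimal sketch in Python that only builds the file content. The function name and the mount name HKEY_USERS\MANPROF are my own assumptions, so adjust them to wherever you loaded the mandatory profile:

```python
def keyboard_reg(hive_root: str, preload: str, substitute: str) -> str:
    """Build .reg file content for the Keyboard Layout Preload/Substitutes values."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{hive_root}\\Keyboard Layout\\Preload]",
        f'"1"="{preload}"',              # REG_SZ, name 1
        "",
        f"[{hive_root}\\Keyboard Layout\\Substitutes]",
        f'"{preload}"="{substitute}"',   # REG_SZ, name = preload ID
        "",
    ]
    return "\r\n".join(lines)


# Dutch (00000413) with the US International layout (00020409), as in this post.
# "HKEY_USERS\MANPROF" is an assumed mount point for the mandatory profile hive.
reg_text = keyboard_reg("HKEY_USERS\\MANPROF", "00000413", "00020409")
print(reg_text)
```

Save the output as a .reg file and double-check the key paths before importing, for the same reason as with the International key: the import lands wherever the path in the file points.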

Last note: GPP did not do the trick…

SRM 5.8 – Site Recovery Manager plugin does not display in the vSphere Web Client

Because I promised my colleagues to demonstrate SRM later today I wanted to make sure everything was working properly (as it should be and was for many weeks).

So I logged in to the Web Client and clicked on the Site Recovery Manager Extension.

I wanted to perform a recovery and failback of some test VMs, but noticed a message:
“getAttribute: Session already invalidated”

I also noticed that there was no connection to the SRM server from my recovery site. I searched Google first (of course) and found exactly one blog mentioning that this issue is resolved by rebooting the vCenter server.

I don’t like solutions like this, and because I have a vCenter environment where vCenter Server, Web Client, SSO, Inventory Service and SRM are installed on different nodes, I thought it would be wise to just restart the VMware vSphere Web Client Service.

After I restarted the service I logged back into the web client and now the Site Recovery Manager Extension was not loaded at all!!! WTF?

After searching the web I found out this could be related to the browser cache. So I cleared the cache and tried again. Result: still no Site Recovery Manager Extension. Then I checked the release notes and noticed the following:

Site Recovery Manager plugin does not display in the vSphere Web Client if Site Recovery Manager service stops.
After you install Site Recovery Manager and the service stops for any reason, the vSphere Web Client does not display the Site Recovery Manager plugin. Workaround: Restart the vSphere Web Client.

First thing I checked was the status of the SRM service. It was running. Just to be sure I restarted the service, restarted the VMware vSphere Web Client Service again and logged back into the web client. Unfortunately still no Site Recovery Manager Extension.

Now I started to sweat. In a couple of hours I needed to demonstrate SRM to my colleagues.

I restarted the vCenter Server service and VMware vSphere Web Client Service. Still no luck.

I decided to reboot the SRM server and restart the VMware vSphere Web Client Service once more. I logged in to the web client and HAPPINESS, the Site Recovery Manager Extension was there again. And the message about getAttribute was gone!!!

I performed a failover and failback and now I’m ready to give the demonstration to my colleagues.

VMware Site Recovery Manager 5.5 – recovery fails with Unable to copy error

I just ran into a problem with SRM version 5.5.1.4 and I would like to tell you more about it.

I recently was involved in building a VMware environment with SRM as Disaster Recovery solution. Because this was a first for this organization we did a proof of concept (POC) before building the production environment.

Background
The POC environment was built on HP Gen8 Blade servers
The SAN we used was IBM SAN Volume Controller
vSphere 5.5 Update 1 is used (ESXi and vCenter)
SRM 5.5.0 or 5.5.1.2 (not sure anymore)

After building the POC environment it was time for testing SRM. After some errors related to unmounting datastores (maybe I’ll write a blog about that later) the recovery went fine. We executed the recovery plan many times (50+) without receiving any errors. So after all functional tests had been done it was time to recreate the environment, so we deleted everything and started building from scratch.

Problems arise
I rebuilt the entire environment using the latest available (minor) versions.
vSphere 5.5 Update 2 is used (ESXi and vCenter)
SRM 5.5.1.4

Although SRM 5.8 had come out we did not use this version, because SRM 5.8 uses the web client instead of the .NET client, and our POC and all documentation were based on SRM 5.5.

After the rebuild it was time to test SRM again. The first couple of recoveries went fine, but after that I received errors. The strange thing was that these errors were not consistent, meaning they did not show up on every recovery. The error causes some virtual machines to lose their network mapping and prevents them from starting.

Errors:
Error – Unable to copy the configuration file ‘[DatastoreName] SRM-03/SRM-03.vmx’ from the host to ‘C:\Users\serviceaccount\AppData\Local\Temp\vmware-serviceaccount\SRM-03.vmx72-101’ Failed to copy file ‘[DatastoreName] SRM-03/SRM-03.vmx’ to ‘C:\Users\serviceaccount\AppData\Local\Temp\vmware-serviceaccount\SRM-03.vmx72-101’: 11 – The session does not have the required permissions.

After searching the web I found this knowledgebase article from VMware and thought YES this is a known error and the solution is nearby.

VMware Knowledgebase Article

Sadly the solution provided by VMware did not resolve the problem. I opened a case with VMware and at first they focused on increasing the settings mentioned in the knowledgebase article. They thought it had something to do with the SAN not being ready after the LUNs were promoted. This resulted in an increased recovery time. In our POC environment a recovery took 15 minutes, now with all the timeouts it took 35 minutes. This was not acceptable, and luckily also not the solution.

VMware Engineering
After the case was escalated/transferred to the engineering team they discovered a timing problem between ESXi, vCenter and the storage array. I don’t mean a timing error like clocks being out of sync or in a different timezone, I mean a timing issue in the SRM runbook.

VMware believed these errors were solved in SRM 5.5.1.3, but then realized the fix was not implemented there, only in SRM 5.8.

Solution
So after this the solution was simple: Upgrade to SRM 5.8. I did just that and after testing the recovery (50+ times) I’m glad this issue is resolved.

Conclusion
Test recovery plans multiple times (not 10 times, but really 30 or 40 times) to be sure they are working.
After upgrading components test SRM again.

Thanks to VMware support for helping out!

Deleting the registry value ConnectionCenter (Citrix Receiver) to prevent autostart results in a re-install of icawebwrapper.msi

In a managed environment, like SBC, you typically don’t want Citrix Receiver to start for every user. To prevent this behavior you would normally delete the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\ConnectionCenter value in the registry.

My result with Citrix Receiver 4.2 was a broken Citrix Receiver installation which starts a reinstall of the icawebwrapper msi. Citrix Receiver checks for the existence of the value ConnectionCenter in the registry. To prevent this from happening you shouldn’t delete the ConnectionCenter value but make it empty!
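As an illustration, the empty value can be described in a .reg file. This is a minimal sketch in Python that only builds the file content. The function name is hypothetical and the key path is the one mentioned above, so verify the path on your own system before importing anything:

```python
RUN_KEY = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run"


def empty_connectioncenter_reg(run_key: str = RUN_KEY) -> str:
    """Build .reg content that keeps the ConnectionCenter value but makes it empty."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{run_key}]",
        '"ConnectionCenter"=""',  # value still exists, but holds an empty string
        "",
    ]
    return "\r\n".join(lines)


reg_text = empty_connectioncenter_reg()
print(reg_text)
```

The point of the sketch is the difference between an empty value (`"ConnectionCenter"=""`) and a deleted value: only the former keeps the Receiver's existence check satisfied.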

How to avoid duplicate MAC addresses when migrating to a new vCenter server

When moving VMs from one vCenter server to another or moving ESXi hosts (with VMs) to another vCenter there’s an aspect which is easily overlooked: MAC address usage!!!

Background
First a little bit of background information. When you create a VM the MAC address assigned to it is generated by vCenter. The MAC address of a VM created on a VMware 5.5 platform will always start with 00:50:56. But how are the other octets determined?

Each vCenter server has an installation ID. This is generated randomly during installation of vCenter (ranges from 0 to 63). Based on this installation ID the fourth octet is determined. To calculate the fourth octet we take the installation ID number, add 128 to it and then convert it from decimal to hexadecimal.

You can view the installation ID by going to the Web Client, selecting the vCenter, clicking on the Manage tab and choosing Settings. Now click Edit and go to Runtime Settings. The Installation ID is now displayed.

Example
So suppose the installation ID is 34. We take 34 (installation ID) and add 128 to it. This gives us a decimal value of 162. If we convert this to hexadecimal we get A2.

The MAC address of a VM created on a vCenter with installation ID 34 now starts with 00:50:56:A2. The last two octets are generated by vCenter automatically.
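The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in this post. The function name is my own and it is not a VMware API:

```python
VMWARE_OUI = "00:50:56"  # first three octets, as mentioned above


def vcenter_mac_prefix(installation_id: int) -> str:
    """Return the first four octets of MAC addresses issued by a vCenter."""
    if not 0 <= installation_id <= 63:
        raise ValueError("installation ID must be between 0 and 63")
    fourth_octet = installation_id + 128       # e.g. 34 + 128 = 162
    return f"{VMWARE_OUI}:{fourth_octet:02X}"  # 162 decimal = A2 hexadecimal


print(vcenter_mac_prefix(34))  # 00:50:56:A2
```

Because the installation ID ranges from 0 to 63, the fourth octet always ends up between 80 and BF.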

Problem
Normally you never have to worry about MAC addresses because vCenter handles this for you. But let’s assume we are building a new vCenter environment and want to migrate the VMs (or ESXi hosts with the VMs) to this new vCenter environment, or you are consolidating multiple environments into one vCenter.

If you don’t check the MAC addresses of the VMs you are moving to the new vCenter, a rare problem can occur when you create a new VM. This new virtual machine could get the same MAC address as one of the VMs you migrated to this vCenter.

Solution
Before you migrate VMs to your vCenter, check whether the MAC address of each VM is in the same range as the one used by the vCenter you are migrating to. Looking back at the above example: to a vCenter with installation ID 34 I should not migrate VMs which have MAC addresses starting with 00:50:56:A2.
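A check like this is easy to script once you have a list of VM names and MAC addresses (for example from an export of your inventory). The sketch below assumes such a list is already available as a Python dictionary. The VM names and addresses are made up and the function names are my own:

```python
from typing import Dict, List


def mac_prefix_for(installation_id: int) -> str:
    """First four octets used by a vCenter with this installation ID."""
    return "00:50:56:{:02X}".format(installation_id + 128)


def conflicting_vms(vm_macs: Dict[str, str], target_installation_id: int) -> List[str]:
    """Return the names of VMs whose MAC falls in the target vCenter's range."""
    prefix = mac_prefix_for(target_installation_id)
    return sorted(
        name
        for name, mac in vm_macs.items()
        # normalize case and separator before comparing
        if mac.upper().replace("-", ":").startswith(prefix)
    )


# Hypothetical inventory of VMs that are about to be migrated
vms = {
    "web01": "00:50:56:a2:01:02",
    "db01": "00:50:56:9f:10:20",
}

print(conflicting_vms(vms, 34))  # target vCenter has installation ID 34
```

Any VM returned by this check needs attention before or right after the migration.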

You could migrate this VM and then delete the network adapter and create a new one (with a new MAC of course). If this is problematic another solution could be to change the Installation ID and restart the vCenter service. This way you force vCenter to issue MAC addresses with a different fourth octet.

There are more solutions possible, bottom line of this blog is: check the MAC addresses and MAC address assignment to avoid duplicate MAC addresses!!!