Friday, December 16, 2011

ESXi 4.1 host’s inventory no longer shows up and presents the error: “Configuration Issues” “The virtual machine inventory file on host hostName.domain.com is damaged or unreadable.”

Problem

You have just rebooted your ESXi 4.1 host and logged in, only to notice that the inventory no longer shows up and the following error is presented:

Configuration Issues

The virtual machine inventory file on host hostName.domain.com is damaged or unreadable.

image

Solution

I searched around on the internet and was only able to find one post that referenced this error. Since I had to get the environment up in a hurry, I ended up opening a case with VMware. The engineer said that this error was rare and the only reason he knew about it was that his colleague, who happens to sit beside him, ran into it a week earlier.

This usually happens if vmInventory.xml, which is located in the directory /etc/vmware/hostd, is either corrupted or locked by another process, which in turn prevents it from being read properly.
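The checks described next (can the file be read, is it empty, is it still valid XML) can also be written as a small script. The following Python sketch is only an illustration, meant to be run against a copy of the file on a workstation. The function name classify_inventory_file and the status strings are my own, and it assumes nothing about the file's schema beyond it being XML.

```python
import os
import xml.etree.ElementTree as ET


def classify_inventory_file(path):
    """Return 'missing', 'empty', 'unparseable', or 'ok' for an XML file."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # Same symptom as in this post: ls -l reported a size of 0 bytes.
        return "empty"
    try:
        # Only checks that the content is well-formed XML, nothing more.
        ET.parse(path)
    except ET.ParseError:
        return "unparseable"
    return "ok"
```

Calling classify_inventory_file("vmInventory.xml") on a copied file returns one of the four strings. Note that a file containing only an opening <ConfigRoot> tag is reported as unparseable by this sketch, since that is not well-formed XML on its own; whether hostd applies the same strictness is something I can't confirm.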

We proceeded to perform the obvious checks by trying to cat the file to see if we could read the content:

image

As shown in the screenshot above, no content was returned, so we double-checked that it wasn’t a disk space issue by executing df -h:

image

As shown in the screenshot above, we had not run out of space, so we issued an ls -l to see what the file size was:

image

As shown in the screenshot above, the file size was listed as 0 bytes. This was when the engineer asked his colleague what he had done to correct the problem. He was told that the colleague had used an editor such as vi to add the line:

<ConfigRoot>

into the vmInventory.xml file, saved it, restarted the services, and the inventory list came back. I proceeded to try the same:

image

image

Just before I restarted the services, I tried to cat the file again but only saw the line I put in:

image

I proceeded to restart the services by executing services.sh restart:

image

… but the inventory didn’t come back.  At this point, I was running out of time so I told the engineer that I was going to proceed with manually adding all the VMs back via the GUI since this was a standalone server with only 10 to 15 VMs:

image

After I added the first VM, I tried to cat the vmInventory.xml file once more and this time saw more content in it:

image

This was when I concluded that the file was simply corrupted, so I proceeded to add the rest of the virtual machines manually.

I spoke to the engineer a bit later and he said he wasn’t sure what the root cause was, but if it was a lock on the file by some other process, we could have used the lsof (list open files) command to find out exactly which processes may have had a lock on the file.

I hope this helps anyone out there who may encounter the same problem.

Thursday, December 15, 2011

Unable to find a user to enable for Microsoft Lync Server 2010 even though they’re not enabled

Problem

You’re trying to enable a user for Microsoft Lync Server 2010 but cannot find their account listed when you search for them in the Microsoft Lync Server 2010 Control Panel:

image

You double check to ensure that their account is listed in Active Directory Users and Computers:

image

Solution

What I didn’t initially notice about the account shown in the screenshot above and below:

image

… was that it actually had an Office Communications Server SIP address. I didn’t notice this until I started writing this blog post, so I opened ADSI Edit to have a look at the msRTCSIP- attributes of the account because I suspected that LCS or OCS may have been deployed in this environment at one point:

image

I made a slight mistake with the screenshot above so take my word for it when I say the account on the right, which was an account that was not enabled for Lync, did not have any msRTCSIP- attributes. 

The first attempt I made was to simply set the msRTCSIP-UserEnabled attribute to False:

image

… but this didn’t fix the issue and setting the attribute to Not Set didn’t either. 

The next attempt I made was to delete the value in the msRTCSIP-PrimaryUserAddress attribute, but when I tried to hit Apply, I got the error:

Operation failed. Error code: 0x57
The parameter is incorrect.

00000057: LdapErr: DSID-0C090A85, comment: error in attribute conversion operation, data 0, vece

image

image

The reason for the error above was that I simply deleted the value in the attribute’s field. The proper way to remove the value is to hit the Clear button, which automatically puts the <not set> value into the field as such:

image

The problem still persisted even after setting msRTCSIP-PrimaryUserAddress to <not set>, so I proceeded to do the same for the attributes:

  1. msRTCSIP-ArchivingEnabled
  2. msRTCSIP-OptionFlags
  3. msRTCSIP-PrimaryHomeServer

Once I cleared all of these msRTCSIP- attributes to <not set>, the account showed up in the list of accounts that could be enabled for Lync Server 2010:
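If you have an account's attributes exported somewhere (for example copied out of ADSI Edit), spotting leftover msRTCSIP- values can be automated. The Python sketch below is only an illustration: the function name and the sample dictionary, including its values, are made up, and it treats any value other than None, an empty string or an empty list as "set".

```python
def leftover_sip_attributes(attributes):
    """Return the names of msRTCSIP- attributes that still hold a value."""
    return sorted(
        name
        for name, value in attributes.items()
        # False counts as a value here: in this post, setting
        # msRTCSIP-UserEnabled to False did not fix the problem.
        if name.lower().startswith("msrtcsip-") and value not in (None, "", [])
    )


# Hypothetical export of one account's attributes.
sample_account = {
    "sAMAccountName": "JohnB",
    "msRTCSIP-UserEnabled": False,
    "msRTCSIP-PrimaryUserAddress": "sip:johnb@domain.com",
    "msRTCSIP-OptionFlags": None,
}

print(leftover_sip_attributes(sample_account))
```

An empty list as the result would correspond to the state in which the account finally showed up in the Lync Server Control Panel search.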

image

Wednesday, December 14, 2011

VMware vCenter Converter fails with the error: “FAILED: Unable to find the system volume, reconfiguration is not possible.”

Problem

You use VMware vCenter Converter to convert a virtual machine and notice that it fails at 98% with the error message:

FAILED: Unable to find the system volume, reconfiguration is not possible.

image

Solution

This error is caused by a problem with the boot.ini file.  In the situation I ran into recently, the server I tried to convert didn’t have a boot.ini file at all, but through some searches on the internet, it looks like most people simply have certain lines in their boot.ini file that are not properly formatted.  The following VMware Communities post is an example: http://communities.vmware.com/thread/218684

I realized that the server I had problems with was actually missing the file (see the following screenshots):

image

The C:\boot.ini file can not be opened. Operating System and Timeout settings can not be changed.

image

image

image

VMware has a KB article that mentions this type of error:

Best practices for using and troubleshooting Migration Assistant

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017993

I was also able to find a Microsoft KB article that talks about the boot.ini file for a Windows 2003 server:

How to manually edit the Boot.ini file in a Windows Server 2003 environment

http://support.microsoft.com/kb/323427

… but wasn’t able to find another one that explained each parameter in detail.  After doing a bit more searching, I was able to find the following webpage that contained more in depth details about the parameters:

Additional information and help with the boot.ini.

http://www.computerhope.com/issues/ch000492.htm

The parameter I was interested in was partition, because it defines which partition, if there is more than one, holds the operating system.  In the case of the server I had issues with, the partition was 1.

image

So I took a boot.ini from another server with multiple partitions. That server had the OS installed on the second partition, so I changed the partition parameter from 2 to 1 as such:

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Standard" /noexecute=optout /fastdetect
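As a sanity check before rebooting, it can help to confirm that the default entry points at the partition you expect and that it also appears under [operating systems]. The Python sketch below is a rough illustration only: the parser handles just the simple multi(...)disk(...)rdisk(...)partition(...) form shown above, and the function names are my own.

```python
import re

# Matches ARC paths of the form multi(0)disk(0)rdisk(0)partition(1)\WINDOWS.
ARC_PATH = re.compile(
    r"^multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)(\\.*)$"
)

SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Standard" /noexecute=optout /fastdetect
"""


def parse_boot_ini(text):
    """Split boot.ini text into {section name: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].lower()
            sections[current] = {}
        elif current is not None and "=" in line:
            # Split on the first '=' only; switches such as
            # /noexecute=optout stay inside the value.
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def default_partition(text):
    """Return the partition number of the default entry, or None if the
    default is missing, malformed, or not listed under [operating systems]."""
    sections = parse_boot_ini(text)
    default = sections.get("boot loader", {}).get("default")
    if default is None:
        return None
    match = ARC_PATH.match(default)
    if match is None or default not in sections.get("operating systems", {}):
        return None
    return int(match.group(4))


print(default_partition(SAMPLE_BOOT_INI))
```

For the file above this reports partition 1, which is the value the server in this post needed.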

image

Once I created the boot.ini file, the Default operating system: field was no longer blank:

image

I proceeded to reboot the server once to ensure that the server boots fine (and it did) then continued to convert the server and the operation completed successfully.

Unable to vMotion after upgrading ESXi 4.x host to 5.0 with the error message: “The vMotion interface is not configured (or is misconfigured) on the Destination host ‘172.x.x.x’”.

Problem

You’ve just upgraded the majority of your ESXi 4.x hosts to ESXi 5.0 and attempt to vMotion the rest of your virtual machines from the ESXi 4.0 host to the ESXi 5.0 host but receive the following error:

The vMotion interface is not configured (or is misconfigured) on the Destination host ‘172.x.x.x’.

image

Solution

The solution is extremely simple: when I opened up the properties of the vMotion port group, the vMotion checkbox was no longer selected after the upgrade. I’m not sure if this is a bug in the upgrade installer or by design (I’m using build 469512).

image

Simply checking it got vMotion working again (no surprise).

Tuesday, December 13, 2011

Migrating / vMotion virtual machine from ESXi 4.x to 5.0 throws the error: “Virtual machine has 2 virtual CPUs, but the host only supports 1. The number of virtual CPUs may be limited by the guest OS selected for the virtual machine or by the licensing for the host.”

Problem

You’ve just completed the upgrade of 3 out of 4 of your blades from ESXi 4.1 to 5.0 and attempt to migrate / vMotion the virtual machines from the ESXi 4.1 blade to the other 3 upgraded ESXi 5.0 blades but receive the following error:

Virtual machine has 2 virtual CPUs, but the host only supports 1. The number of virtual CPUs may be limited by the guest OS selected for the virtual machine or by the licensing for the host.

image

The value for the virtual CPUs may vary but you are unable to vMotion the virtual machine.

Solution

The cause of this error is similar to the one described in the following KB, which applies to older versions of ESX and ESXi: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003741

Your upgraded ESXi 5.0 hosts were previously licensed under version 4.x and now that they’re upgraded, they’re in a state where they fail the licensing check.  If you navigate into the Manage vSphere Licenses menu, you’ll actually see that your new ESXi 5.0 hosts still reference the old vSphere 4 license:

image

I’m not sure if there’s another workaround because I did not get the option of returning the host to evaluation mode, since the evaluation period had already expired. I just went ahead and upgraded the licenses to version 5, assigned them, and got vMotion working again.

Tool used for burning an ESXi 5.0 ISO to USB for installation

I know I’m bound to forget this one day, as I did today when I had to upgrade a few ESXi 4.1 blades to 5.0.  The blades don’t have CD/DVD-ROM drives so I had to burn an ISO to my USB stick.  I did this a few weeks ago with a tool whose name I couldn’t remember, so I got on Google to try to find it.  Having no clue what the tool was named, I came across this one as it was the first Google search result:  http://www.isotousb.com/  I went ahead and burned the VMware-VMvisor-Installer-5.0.0-469512.x86_64.ISO file with it but noticed that the installer didn’t boot properly.

After searching around to try and retrace my previous steps, I noticed the following icon on my desktop:

unetbootin-windows-563.exe

image

This was when I remembered that this was the tool I used.  I went ahead and burned the ESXi 5.0 installer ISO onto my USB stick and it now boots properly.

Friday, December 9, 2011

Nice change in vSphere 5.0 for changing MTU size

As most of us who worked with vSphere 4 probably know, changing the MTU size of a vSwitch had to be done via the command line with:

esxcfg-vswitch -m #### vSwitch#

… which looked something like this:

login as: root
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools. Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
~ # esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         128         4           128               1500    vmnic2,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network            0        0           vmnic2,vmnic3
  Management Network    1116     1           vmnic2,vmnic3

~ # esxcfg-vswitch -m 9000 vSwitch0
~ #
~ #
~ #
~ # esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         128         4           128               9000    vmnic2,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network            0        0           vmnic2,vmnic3
  Management Network    1116     1           vmnic2,vmnic3

~ #
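If you capture the output of esxcfg-vswitch -l before and after the change (for example from a saved SSH session), the MTU column can be pulled out with a few lines of Python to confirm the new value. This is only a sketch based on the column layout seen in the transcript above; the function name is my own, and it assumes vSwitch names contain no spaces.

```python
def parse_vswitch_mtus(listing):
    """Map each vSwitch name to its MTU from `esxcfg-vswitch -l` output."""
    mtus = {}
    in_switch_table = False
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Switch Name"):
            # Header row of the vSwitch table; data rows follow.
            in_switch_table = True
            continue
        if line.startswith("PortGroup Name") or line.startswith("~"):
            # Port group table or a shell prompt ends the vSwitch table.
            in_switch_table = False
            continue
        if in_switch_table:
            fields = line.split()
            # Columns: name, num ports, used, configured, MTU, [uplinks]
            if len(fields) >= 5 and fields[4].isdigit():
                mtus[fields[0]] = int(fields[4])
    return mtus


before = """Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch0 128 4 128 1500 vmnic2,vmnic3
PortGroup Name VLAN ID Used Ports Uplinks
VM Network 0 0 vmnic2,vmnic3
Management Network 1116 1 vmnic2,vmnic3
"""
after = before.replace("128 1500", "128 9000")

print(parse_vswitch_mtus(before), parse_vswitch_mtus(after))
```

Comparing the two dictionaries shows vSwitch0 going from 1500 to 9000, matching the transcript.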

image

What’s nice about vSphere 5.0 is that you can now configure the MTU size in the GUI.  See the following screenshots with vSphere 4.x on the left and 5.0 on the right:

image

image

vSphere 5’s vCenter 5.0 now automatically configures the VirtualCenter Server service with “Delayed Start” and SQL dependency

Those who have come across the problems mentioned in 2 of my previous blog posts:

vCenter / Virtual Center Service fails to start with event ID: 1000, 7024, 7001, 18456
http://terenceluk.blogspot.com/2010/07/vcenter-virtual-center-service-fails-to.html
Addressing the VirtualCenter service not starting because SQL service hasn’t started yet
http://terenceluk.blogspot.com/2010/07/addressing-virtualcenter-service-not.html

… would be familiar with the issue of collocating SQL Server and VMware vCenter on the same server, where the vCenter service fails to start because the SQL services haven’t started yet.  While deploying a few small nodes with vSphere 5.0, I noticed that the vCenter installation process actually sets the SQL service as a dependency of the vCenter service and also sets the vCenter service to Delayed Start.  This is definitely a nice touch.

image

image

Thursday, December 8, 2011

Publishing a new Lync Server 2010 topology throws the error: “Error accessing share \\yourLyncFileServerName.domain.com\LyncShare - The object does not contain a security descriptor..”

Ran into an interesting problem a few months ago when deploying a new Lync Server 2010 standard pool for a power plant.  The administrator I was working with gave me a Windows 2000 Server to use for the Lync File Store.  I went ahead and created the file share, configured the UNC in the Lync Topology and went ahead to publish it only to receive the following error:

Error accessing share \\yourLyncFileServerName.domain.com\LyncShare - The object does not contain a security descriptor..

image

What’s interesting is that the subfolders in the share do get created. However, since this pool will eventually go into production and serve over 400 users, I didn’t want to take the risk. While I can’t conclusively say that this error is caused by the version of Windows Server, I asked for a Windows Server 2003 or newer server to host the Lync file store and then used the following TechNet article to move the location:

Move Lync File Store Data to New File Store
http://technet.microsoft.com/en-us/library/gg195742.aspx

Once I moved the Lync File Store location to a Windows Server 2003 server, the error went away.

Opening the Lync Server 2010 Control Panel throws the following prompt: “Cannot find appropriate URL, please input a URL to connect Lync Server Control Panel”

Just wanted to make a quick note about the following prompt that you may receive when you launch the Lync Server 2010 Control Panel:

Cannot find appropriate URL, please input a URL to connect Lync Server Control Panel

image

If you are prompted with this message, you can either use the simple URL such as:  https://lyncadmin.domain.local/cscp

… or the server's FQDN such as: https://lyncSTDSRV.domain.local/cscp

I’m not sure about the cause of this prompt but I seem to only experience it whenever I use IE 9 but not IE8.

Certificates issued by root Certificate Authority are missing CRL distribution URL in “CRL Distribution Points” field value

Problem

You’ve just deployed a new enterprise root Certificate Authority in your Active Directory environment to replace an old CA that will be decommissioned.  As you browse through the properties in the Detail tab of a certificate issued by the new CA, you notice that the CRL Distribution Points field’s value appears to be missing the URL to allow clients to download the CRL (Certificate Revocation List).  See the following screenshot that demonstrates this:

image

You proceed to try and manually navigate to the standard URL to download the CRL file and receive the following HTTP Error 404.0 – Not Found page:

image

You then try the URL for the old CA and you’re able to download the CRL:

image

Solution

The solution is actually quite simple.  Open the Certificate Authority administration console and open up the properties of the Certificate Authority:

image

… and navigate to the Extensions tab.  Notice that the following 2 checkboxes are unchecked:

Include in CRLs. Clients use this to find Delta CRL locations.

Include in the CDP extension of issued certificates

image

Simply check the 2 checkboxes and click Apply and then OK:

image

Here is a side by side comparison of the difference after the settings have been applied:

image

image

Hope this helps anyone who may come across this problem.

Tuesday, December 6, 2011

Enabling a user for Microsoft Lync Server 2010 throws the error: “ConstraintViolationStringDoesNotMatchRegularExpression(Pattern that specifies a valid UserPrincipalName, someUserName)”

I’ve been meaning to blog about this since I encountered it a few months ago but never got the chance, so when I ran into it again today, I made sure to take some screenshots so I could write a post as soon as I got home (and before I have to go out and work again tonight).

Problem

You’re enabling a user for Microsoft Lync Server 2010 but notice that the control panel throws the following error message:

ConstraintViolationStringDoesNotMatchRegularExpression(Pattern that specifies a valid UserPrincipalName, someUserName)

image

Solution

I recall not being able to find anything via searching Google and since the most obvious hint from the error message was the reference to the UserPrincipalName attribute, I went ahead and opened up the problematic account’s object and another object that did not have this problem in Active Directory Users and Computers which showed the following:

image

The above screenshot shows the problematic account’s domain field as blank so I then went ahead and opened ADSIEdit to have a look at the user’s value for that attribute and this was what I saw:

userPrincipalName   JohnB

image

This immediately told me that Lync probably didn’t like the invalid UPN format because, as most of us know, a UPN is formatted like this:

username@someDomain.com

Opening up another user account’s object and comparing it side by side shows exactly what’s wrong:

image

To rectify this problem, all we need to do is open up the attribute and correct the format. In this case for JohnB, changing it to JohnB@domain.com allowed us to enable the account for Lync.
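In an old domain it may be worth checking other accounts for the same problem before enabling them. The Python sketch below illustrates the idea on a made-up mapping of account names to userPrincipalName values. The pattern is deliberately loose (something, an @, then something, with no spaces) and is my own approximation, not the regular expression Lync actually uses.

```python
import re

# Loose shape check only: user@suffix with no whitespace and a single '@'.
# This is an assumption for illustration, not Lync's internal pattern.
UPN_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+$")


def looks_like_upn(value):
    """Return True if value has the basic user@suffix shape."""
    return bool(UPN_SHAPE.match(value))


def accounts_with_bad_upn(accounts):
    """accounts maps an account name to its userPrincipalName (or None)."""
    return sorted(
        name
        for name, upn in accounts.items()
        if not upn or not looks_like_upn(upn)
    )


# Hypothetical data; only the JohnB value comes from this post.
sample_accounts = {
    "JohnB": "JohnB",
    "JaneD": "JaneD@domain.com",
    "OldSvc": None,
}

print(accounts_with_bad_upn(sample_accounts))
```

Any account the function returns would be a candidate for the same manual correction described above.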

So what causes this?  The environment I encountered this issue in was a domain that had been around since the NT days, and this user account belongs to an employee who has been around since those days.  My guess is that the upgrades and applications run against accounts in this domain changed the UPN to this format at some point.