Strange vCenter 4.0 U1 and ESXi 4.0 U1 SSL Issue

Last week I came across a problem that really stumped me; it even stumped the Tier-1 and Tier-2 support at VMware.  I’m posting the symptoms here in the hope that someone else has experienced this issue and can shed some light.

First, a little background on the environment: vCenter Server 4.0 U1 and multiple ESX(i) hosts (3.5, 4.0, 4.0 U1).  The vCenter Server and a number of ESXi 4.0 hosts were upgraded to U1 a couple of days after it was released; this problem, however, appeared ~8 days after the upgrade.

Symptom 1: All ESX(i) hosts disconnect from vCenter Server; however, they are still online and no VMs go down.  Within 15 minutes all hosts appear to be reconnected.

Symptom 2: After the hosts reconnect, the ESX hosts appear to be functioning normally. However, the ESXi hosts display an error on the Overview tab as well as in the Events tab: “Unable to Synchronize with host that is unavailable.”

Symptom 3: VMotions start at random, for no apparent reason (DRS is engaged, yet there are no constraints that would cause DRS to be invoked).  However, these VMotions fail at 10% because the source and destination hosts are not available.

Symptom 4: The /var/log/messages file displays errors with these keywords: [VpxdVmomi] Error getting vpxa info: SSL Exception: Unexpected EOF From hosts, with blacklisting messages also showing up.  (I apologize for paraphrasing.)
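To get a feel for how often these errors show up, a small script can count matching lines in a copy of the log. This is only a rough sketch: the keywords come from my paraphrase above, so the exact wording in a real log may differ, and the function names (count_keyword_hits, scan_log) are made up for illustration.

```python
import re
from collections import Counter

# Keywords taken from the paraphrased symptom; real log wording may differ.
PATTERNS = {
    "ssl_exception": re.compile(r"SSL Exception", re.IGNORECASE),
    "unexpected_eof": re.compile(r"Unexpected EOF", re.IGNORECASE),
    "blacklist": re.compile(r"blacklist", re.IGNORECASE),
}


def count_keyword_hits(lines):
    """Count how many lines contain each keyword (one hit per line per keyword)."""
    hits = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a log file (or a copy pulled off the host) for the keywords."""
    with open(path, errors="replace") as handle:
        return count_keyword_hits(handle)
```

Running scan_log() against logs captured during and after the incident would at least show whether the SSL errors and the blacklisting entries rise and fall together.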

So all this starts happening and I start investigating: pulling logs, restarting vCenter, and then just sitting there stumped.  I did notice that the rui.crt on the vCenter server had expired, but back in 2008.  I went ahead and renewed the certificate and even restarted the entire vCenter server.  No luck.  I engaged VMware Support and their Tier-1 and Tier-2 support were stumped; nothing even showed up in their internal database on this issue.
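For anyone who wants to check their own rui.crt, the expiry date can be read with openssl x509 -noout -enddate -in rui.crt, which prints a line starting with notAfter=. Below is a minimal sketch that takes that line and says whether the date has passed. The helper name is_expired is hypothetical, and the date in the usage note is just a sample, not the date from my certificate.

```python
import ssl
import time


def is_expired(not_after, now=None):
    """Return True if a certificate end date has already passed.

    not_after: text such as 'notAfter=Jan  5 09:34:43 2018 GMT'
               (the 'notAfter=' prefix is optional).
    now:       epoch seconds to compare against; defaults to the current time.
    """
    # Drop the 'notAfter=' prefix if present.
    date_text = not_after.split("=", 1)[-1].strip()
    # The standard library converts this date format to epoch seconds.
    expires = ssl.cert_time_to_seconds(date_text)
    current = time.time() if now is None else now
    return expires < current
```

For example, is_expired("notAfter=Jan  5 09:34:43 2018 GMT") returns True on any date after that one. An expired certificate does not by itself explain the symptoms here, since renewing mine made no immediate difference.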

Then it all disappeared.  Roughly 90 minutes after it started, the problem just went away and everything was good.

Have you seen this issue?  What were your troubleshooting steps?  Did you resolve it or figure out the resolution?


Created on December 9, 2009 by Rick Scherer

Posted under vSphere.

3 Comments so far

  1. RonTom
    7:11 am on December 10th, 2009

    Do you have View Composer installed on your vCenter? Sometimes it seems that Composer ends up in a race with other connections to vCenter, and the web service starts talking gibberish. Did you try reconnecting your vSphere client during the 90 minutes? (That should also have failed with some strange error if it was the Composer issue.)

  2. Rick Scherer
    8:18 am on December 10th, 2009

    Composer isn’t installed or even used in this environment. The vSphere client was also able to connect at any time within the 90 minute ‘outage’. I was also able to directly connect to the ESX(i) hosts with no problem.

    The only issue appeared to be between VC and ESX, the SSL errors would appear (as shown above) and the ESX host would blacklist the VC IP address for 3 secs. This would happen over and over again until the entire problem went away.

    I did replace the SSL cert with a new one, but that was done within 15 minutes of the problem appearing.

  3. Garilan
    4:11 pm on January 8th, 2010

    Hi,
    Recently we made the update on a client and had very similar problems.
    Does anyone know the solution to the problem?
