NetXen HP NC522SFP Network Flooding

I had a very fun weekend. It started at 4am Saturday with a migration of ~125 virtual machines from an old AMD-based environment to a new Intel Nehalem-based environment. Who could have known that within a few hours all hell would break loose.

Enter the problem: network flooding from the NetXen-based, HP-branded NC522SFP. Because all of the 10GbE ports from the nine new ESXi servers were generating thousands of pause frames on the Cisco Nexus 5020 switches, I originally thought it was an issue on the switch. Talks with Cisco revealed nothing. We tried disconnecting one of the connected ports (each ESXi host is dual-connected to a pair of N5Ks using vPC) to rule out a potential spanning tree loop… no dice.

A reboot of the host resolved the problem. Things appeared to be running normally, so we decided to leave it alone and wait until Monday.

Ten hours go by; it is now Sunday morning and the problem returns. First one host loses storage (we’re doing NFS over 10GbE here), then two more… until all 9 hosts in this cluster are pretty much toast. I decide to open a ticket with VMware. Wouldn’t you know it, there is a potential known bug and resolution.

Bug 496013

Description: Some NetXen-based 10GbE cards using the unm_nic and nx_nic drivers sometimes flood the network with pause frames, causing the port to become disabled.

Resolution: NetXen believes upgrading the firmware to version 4.0.516 will resolve the problem.

I’ve gone ahead and patched 4 of the hosts with this new firmware; so far they have been stable (knock on wood). I’ll let you know if anything happens.

Checking which firmware version you’re running is simple. From a command line (ESX, or the hidden CLI on ESXi), type ethtool -i <vmnic#>, replacing <vmnic#> with the name of the vmnic you’d like to check. You should see output similar to:

driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0
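If you have more than a handful of hosts, a small script can flag which NICs are still below the fixed release. The sketch below is an illustrative Python example, not an HP or VMware tool: it assumes you have already captured the ethtool -i text shown above from each host, and the function names are hypothetical. It only compares the firmware-version field against 4.0.516, the version named in the bug resolution.

```python
import re

# Minimum firmware version named in the VMware bug 496013 resolution.
FIXED_FIRMWARE = (4, 0, 516)


def parse_ethtool_info(output: str) -> dict:
    """Turn `ethtool -i` style "key: value" lines into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so "bus-info: 0000:07:00.0" stays intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> tuple:
    """Convert a dotted version such as "4.0.406" into (4, 0, 406)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def needs_firmware_update(output: str) -> bool:
    """True if the reported firmware is older than FIXED_FIRMWARE."""
    info = parse_ethtool_info(output)
    firmware = info.get("firmware-version")
    if firmware is None:
        raise ValueError("no firmware-version line found")
    return version_tuple(firmware) < FIXED_FIRMWARE


sample = """driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0"""

print(needs_firmware_update(sample))  # True: 4.0.406 is older than 4.0.516
```

Comparing integer tuples rather than raw strings avoids mis-ordering versions such as 4.0.99 and 4.0.516.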

Update - Utility CD with firmware patch now included…

As you can see above, the firmware is out of date. To update the firmware, you need to boot from a Linux utility CD that has the appropriate driver and then run the firmware update utility provided by HP. To make this process easy, I have created a bootable SLAX utility CD with the drivers pre-loaded. You can download the ISO from here (file temporarily removed). Once booted, run the installer located in the root filesystem (i.e., ./CP011471.scexe).

Let me know if you have any questions.

Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Networking, Storage, vSphere

This post was written by Rick Scherer on January 11, 2010


ESX 4.0 Update 1A

A new patch has been made available for ESX 4.0 Update 1, called Update 1A. It affects only ESX, not ESXi. Here is an excerpt of the alert put out by VMware:

ESX 4.0, Update 1, Alert: Upgrading ESX 4.0 to 4.0 U1 can fail or time out and leave the host in an unusable state if using HP Systems Insight Management Agent. ESX 4.0 Update 1a (a re-release of ESX 4.0 Update1) that addresses this issue is available. Please read KB article (ID 1016070) before proceeding with the upgrade.

As I said above, this patch is listed as ESX 4.0 Update 1A and can be found on the VMware Downloads website, or from within VMware Update Manager.

Posted under vSphere

This post was written by Rick Scherer on December 11, 2009


Strange vCenter 4.0 U1 and ESXi 4.0 U1 SSL Issue

Last week I came across a problem that really stumped me; it even stumped Tier-1 and Tier-2 support at VMware. I’m posting the symptoms here in the hope that someone else has experienced this issue and can shed some light.

First, a little background on the environment: vCenter Server 4.0 U1 and multiple ESX(i) hosts (3.5, 4.0, 4.0 U1). The vCenter Server, as well as a number of ESXi 4.0 hosts, was upgraded to U1 a couple of days after it was released; this problem, however, happened ~8 days after the upgrade.

Symptom 1: All ESX(i) hosts disconnect from vCenter Server; however, they are still online and no VMs went down. Within 15 minutes all hosts appear to be reconnected.

Symptom 2: After the hosts reconnect, the ESX hosts appear to be functioning normally. However, the ESXi hosts display an error on the Overview tab as well as in the Events tab: “Unable to Synchronize with host that is unavailable.”

Symptom 3: Random VMotions start for no apparent reason (DRS is enabled, yet there are no constraints that should cause DRS to act). These VMotions fail at 10% because the source and destination hosts are not available.

Symptom 4: The /var/log/messages file displays errors with keywords such as [VpxdVmomi], Error getting vpxa info, SSL Exception: Unexpected EOF From hosts, and blacklisting (I apologize for paraphrasing).
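When pulling logs for a problem like this, it can help to pull out just the relevant lines so you can see when the errors start and stop. The Python sketch below is a hypothetical helper, not a VMware tool: because the log text above is paraphrased, the keyword list is an assumption drawn from it, and the match is a simple case-insensitive substring search rather than a parse of any real log format.

```python
# Keywords paraphrased in Symptom 4; the exact log wording is an assumption.
KEYWORDS = ("VpxdVmomi", "SSL Exception", "Unexpected EOF", "blacklist")


def find_ssl_errors(lines):
    """Return (line_number, line) pairs that mention any of the keywords.

    `lines` can be any iterable of strings, such as an open log file.
    Matching is a case-insensitive substring test.
    """
    lowered_keywords = [keyword.lower() for keyword in KEYWORDS]
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in lowered_keywords):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it against a real log, pass an open file, e.g. `with open("/var/log/messages", errors="replace") as log: print(find_ssl_errors(log))`. The line numbers make it easy to jump back to the surrounding context in the full log.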

So, all this starts happening and I start investigating: pulling logs, restarting vCenter, and just sitting there stumped. I did notice that the rui.crt on the vCenter server had expired, but back in 2008. I went ahead and renewed the certificate and even restarted the entire vCenter server. No luck. I engaged VMware Support, and their Tier-1 and Tier-2 support were stumped; nothing even showed up in their internal database on this issue.

Then it all disappeared.  Roughly 90 minutes after it started, the problem just went away and everything was good.

Have you seen this issue?  What were your troubleshooting steps?  Did you resolve it or figure out the resolution?

Posted under vSphere

This post was written by Rick Scherer on December 9, 2009


Pre-Order VMware vSphere 4 Administration Instant Reference

VMware vSphere 4 Administration Instant Reference, written by Scott Lowe, Jase McCarty and Matthew Johnson, is now available for pre-order on Amazon.com. I was fortunate enough to work on this book as a technical editor, and I must say that it is the perfect vSphere quick reference book for both the beginner vSphere admin and the seasoned veteran. Be one of the first to get your own copy: order it now for under $20. It makes the perfect holiday gift!

Posted under Good Reading, Training, vSphere

This post was written by Rick Scherer on December 9, 2009


VMware vSphere 4.0 Update 1 Released

This is a couple of days past due, but VMware has released Update 1 of its flagship vSphere product line.

One of the key drivers of this update is support for VMware View 4, which was just released for download as well.

It also adds support for Windows 2008 R2 and Windows 7, both as guest operating systems and as the base OS for the vSphere Client. This resolves the freeze issue on both of those operating systems discussed in my previous article.

Another hot update is full support for using the pvSCSI adapter for your boot disk on Windows 2003 and Windows 2008.

Here are a few of the other items from the What’s New section of the Release Notes:

VMware vSphere 4.0 Update 1 ESX Release Notes
What’s New

Enhanced Clustering Support for Microsoft Windows – Microsoft Cluster Server (MSCS) for Windows 2000 and 2003 and Windows Server 2008 Failover Clustering is now supported on a VMware High Availability (HA) and Distributed Resource Scheduler (DRS) cluster in a limited configuration. HA and DRS functionality can be effectively disabled for individual MSCS virtual machines as opposed to disabling HA and DRS on the entire ESX/ESXi host. Refer to the Setup for Failover Clustering and Microsoft Cluster Service guide for additional configuration guidelines.

Improved vNetwork Distributed Switch Performance – Several performance and usability issues have been resolved, resulting in the following:

  • Improved performance when making configuration changes to a vNetwork Distributed Switch (vDS) instance when the ESX/ESXi host is under a heavy load
  • Improved performance when adding or removing an ESX/ESXi host to or from a vDS instance

Increase in vCPU per Core Limit – The limit on vCPUs per core has been increased from 20 to 25. This change raises the supported limit only. It does not include any additional performance optimizations. Raising the limit allows users more flexibility to configure systems based on specific workloads and to get the most advantage from increasingly faster processors. The achievable number of vCPUs per core depends on the workload and specifics of the hardware. For more information see the Performance Best Practices for VMware vSphere 4.0 guide.
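To put the new limit in concrete terms, the supported ceiling for a host is simply its physical core count multiplied by the per-core limit. The Python sketch below is a worked example under stated assumptions: it treats the limit as applying to physical cores (as the release note wording suggests), the two-socket quad-core host is hypothetical, and the result is the supported maximum, not what a given workload can actually achieve.

```python
# Supported vCPU-per-core ceilings before and after 4.0 Update 1 (release notes).
OLD_LIMIT = 20
NEW_LIMIT = 25


def max_supported_vcpus(sockets: int, cores_per_socket: int,
                        per_core_limit: int = NEW_LIMIT) -> int:
    """Supported upper bound on vCPUs: physical cores times the per-core limit."""
    return sockets * cores_per_socket * per_core_limit


# Hypothetical host: two quad-core sockets, i.e. 8 physical cores.
print(max_supported_vcpus(2, 4, OLD_LIMIT))  # 160 under the old limit
print(max_supported_vcpus(2, 4))             # 200 under the new limit
```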

Enablement of Intel Xeon Processor 3400 Series – Support for the Xeon processor 3400 series has been added. For a complete list of supported third party hardware and devices, see the VMware Compatibility Guide.

Resolved Issues – In addition, this release delivers a number of bug fixes that have been documented in the Resolved Issues section.

Updating your environment to U1 isn’t that difficult. For vCenter Server, simply download the installation files from VMware and install as you would any previous version. Be sure to choose the option to keep your existing database, or you’ll lose all of your data.

To update your ESX hosts, simply use Update Manager or the Host Update Utility.

More information on Update 1 can be found here.

Posted under vSphere

This post was written by Rick Scherer on November 25, 2009


Windows 2008 R2 and Windows 7 Freeze on VMware vSphere 4

Although this issue has been resolved in vSphere 4 U1, those of you who haven’t upgraded should be aware of a known issue with Windows 2008 R2 and Windows 7 where the guest operating system can freeze for a long period of time. The issue is noted in VMware KB 1011709.

The resolution is to either upgrade to U1 or to use the standard SVGA driver and not the one provided in the VMware Tools package.

VMware KB 1011709 Excerpt…

To deselect the SVGA drivers installed with VMware Tools:

When you install VMware Tools, select VMware Tools Custom Install and deselect the SVGA driver.

Alternatively, remove the SVGA driver from the Device Manager after installing VMware Tools.

Posted under vSphere

This post was written by Rick Scherer on November 25, 2009


VMware ESXi 4 and HP Servers

I’m proud to say that HP fully supports ESXi 4 installed on SD or USB Flash memory. They even offer the installable ISO pre-built with their CIM providers, available here. There is a catch, though: they only support their own approved SD or USB Flash memory… sorry, but you cannot BYOF (Bring Your Own Flash). This is reflected on their download page:

HP VMware ESXi 4.0 solution requires the following:

  • VMware ESXi 4.0 free downloadable product. To upgrade your license to Enterprise Plus, purchase HP VMware ESXi 4 product 571979-B21.
  • HP ESXi 4 CIM Providers.
  • Registration on VMware ESXi hypervisor web page to obtain permanent license serial number.
  • Acquire your choice of HP supported and qualified media - any HP supported hard drive, USB or SD Flash devices listed below.
  • Please note that all devices will need to be imaged for ESXi 4.0.

    Information about HP supported SD card and USB Flash drive:

    • Supported SD card**:
      HP 4GB SD Flash Media
      HP Part Number 580387-B21
      [spare kit part number 583306-001]
    • Supported USB Flash Drive**:
      HP 4GB USB Flash Media Drive Key
      HP Part Number 580385-B21
      [Spare kit part number 583307-001]
      *Must be purchased separately.
      **HP VMware ESXi 4.0 does not support any other USB or SD flash devices

Seems simple, doesn’t it? It actually works great: I have put a few dozen servers into production with this method, and the best part is no more local HDDs, which saves even more power!

One question I often get is: why do you need a 4GB drive, I thought ESXi was only 32MB? Well, even though ESXi itself is only 32MB, you still need adequate space for the VI Client, VMware Tools (for all operating systems) and upgrade space (for future ESXi patches and releases). Using a 4GB drive ensures that you’ll have enough space for everything.

There is one problem I am currently facing: HP has both the SD card and the USB Flash drive on back-order, with no expected ETA for either of them! This has delayed a major project I’m working on, and I’ve had to resort to using temporary “junk” USB drives to get the customer by in the meantime.

If someone from HP is reading this, PLEASE get your OEM to produce some new ones ASAP!

Posted under Storage, vSphere

This post was written by Rick Scherer on October 27, 2009


Before Host Profiles there was vicfg-cfgbackup.pl

This last weekend I was reminded of an RCLI command that creates a backup of your ESXi host configuration. A client had an ESXi host whose USB drive failed. The system didn’t entirely crash: it still answered pings, so HA didn’t kick in, but hostd and vpxa weren’t responding, so managing it from vCenter was impossible. Rebooting the host confirmed the USB failure when it failed to boot.

HA finally kicked in once the host was powered off; however, the remaining hosts in the cluster did not have adequate resources to power everything back on. We needed to get this failed ESXi host back online, and quick!

We brought in a new USB stick with a fresh ESXi 4.0 install, plugged it in and powered up. In the TUI we configured the management port IP address, gateway, DNS, etc. But now what? All of the (NFS) datastores, vSwitch configuration and advanced settings were gone… and they didn’t have Enterprise Plus, so Host Profiles weren’t an option.

Say hello to vicfg-cfgbackup.pl, an RCLI command that can be found in the vMA. This utility allows you to back up and restore a full ESXi configuration. Luckily we had the backup command running on a nightly basis, so we knew the configuration was complete. One simple command and the host was back online.

Seeing that ESX is going to be phased out, this RCLI command is a valid alternative to Host Profiles for ESXi users who do not have Enterprise Plus licensing.

The vicfg-cfgbackup.pl command (also available as esxcfg-cfgbackup.pl) allows you to back up and restore the configuration of your ESXi host.

To back up the host, run:

vicfg-cfgbackup.pl --server <server_name> -s <backup_file_name>

To restore your backup configuration to your host, run the command below. This will cause the host to reboot once the process is complete. NOTE: The host must be in Maintenance Mode for this to work. The backup configuration must also match the patch level of the ESXi install; you can add -f to force the restore if needed.

vicfg-cfgbackup.pl --server <server_name> -l <backup_file_name>
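Since the nightly backup is what saved the day here, the Python sketch below shows one way to wrap the save (-s) command for several hosts so each night produces a dated file per host. It is a hypothetical helper, not part of the vMA: the host names, the /backups directory and the .bak naming are placeholders, and it assumes credentials are already handled non-interactively (for example through vMA fastpass); otherwise you would need to add your own connection options. It prints the commands by default and only runs them when called with --run.

```python
import datetime
import subprocess
import sys

# Hypothetical host names; replace with your own ESXi hosts.
HOSTS = ["esxi01.example.com", "esxi02.example.com"]


def backup_command(server: str, backup_dir: str, day: datetime.date) -> list:
    """Build the vicfg-cfgbackup.pl save (-s) command for one host.

    The file name (host name plus ISO date) is an illustrative convention.
    """
    filename = f"{backup_dir}/{server}-{day.isoformat()}.bak"
    return ["vicfg-cfgbackup.pl", "--server", server, "-s", filename]


def run_nightly(hosts, backup_dir="/backups", dry_run=True):
    """Back up every host; with dry_run=True only print the commands."""
    today = datetime.date.today()
    for server in hosts:
        cmd = backup_command(server, backup_dir, today)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a backup fails,
            # so a scheduler such as cron can report the error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_nightly(HOSTS, dry_run="--run" not in sys.argv)
```

Passing the command as a list (rather than one shell string) keeps host names and paths from being interpreted by a shell. Scheduled from cron on the vMA with --run, this gives you a dated configuration file for each host to feed into the restore (-l) command above.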

Posted under Backup & Recovery, ESXi 3.5 Tips, vSphere

This post was written by Rick Scherer on October 12, 2009


Cisco Nexus 1000V Beta (Upgrade Spoiler!)

I was just invited by Cisco to be part of the BETA for the next release of the Nexus 1000V. I’ll provide more details about the BETA after I install and test it out, but here are some teasers:

  • Virtual Service Domains: Supporting Layer 4-7 services such as VMware vShield
  • Security features for virtual desktop: IP Source Guard, Dynamic ARP Inspection, DHCP Snooping
  • VSM VMotion on its own VEM
  • Automated VSM Installer:  Configures VSM, vCenter Server Extension, System Port Groups
  • L3 Connectivity Between VSM and VEM: More flexible deployment
  • XML API

And guess what…?! You can be part of the BETA too! Check out this link and sign up to be part of the distributed virtual switch revolution!

Posted under Networking, vSphere

This post was written by Rick Scherer on September 21, 2009


vCenter Chargeback Uninstallation - Rogue Plug-in

I was doing a proof of concept of VMware vCenter Chargeback; installation and usage went great, and the product does everything you’d expect it to do. I would have liked to see tighter integration with the vSphere Client, though; hopefully we’ll see this in the 1.5 or 2.0 release!

Well, the POC was finished and it was time to uninstall. That should be pretty easy: Delete from Disk the virtual appliance you installed and delete the database. However, if you forget one crucial step, you’ll end up with a rogue plug-in in your vSphere Client!

Oh No! The vCenter Chargeback plug-in is still there! What do we do?!

The step you need to take before removing the virtual appliance is to uncheck the Register As VI Client Plugin box in the vCenter Server settings screen (Settings->vCenter Servers->Edit). Once you do this, the plug-in will be removed from the vCenter Server and you can continue with the removal of the virtual appliance and back-end database.


Posted under vCenter, vSphere

This post was written by Rick Scherer on August 21, 2009
