I had a very fun weekend. It started at 4am Saturday with a migration of ~125 virtual machines from an old AMD-based environment to a new Intel Nehalem-based environment. Who could've known that within a few hours all hell would break loose.
Enter the problem: network flooding from the NetXen-based, HP-branded NC522SFP. Because all of the 10GbE ports on the nine new ESXi servers were sending thousands of pause frames to the Cisco Nexus 5020 switches, I originally thought it was an issue on the switch. Talks with Cisco revealed nothing. We tried disconnecting one of each host's two uplinks (each ESXi host is dual-connected to a pair of N5Ks using vPC) to rule out a potential spanning tree loop… no dice.
A reboot of the host resolved the problem. Things appeared to be running normally, so we decided to leave it alone and wait until Monday.
Ten hours go by, it is now Sunday morning, and the problem returns. First one host loses storage (we're doing NFS over 10GbE here), then two more… until all 9 hosts in this cluster are pretty much toast. I decide to open a ticket with VMware. Wouldn't you know it, there is a potential known bug and a resolution.
Description: Some NetXen-based 10GbE cards using the unm_nic and nx_nic drivers sometimes flood the network with pause frames, causing the port to become disabled.
Resolution: NetXen believes upgrading the firmware to version 4.0.516 will resolve the problem.
I've gone ahead and patched 4 of the hosts with this new firmware, and so far they have been stable (knock on wood). I'll let you know if something happens.
Checking which version of the firmware you're running is simple. From a command line (the ESX service console or the hidden ESXi CLI), type ethtool -i <vmnic#>, replacing vmnic# with the name of the vmnic you'd like to check (for example, vmnic0). You should see output similar to:
Update – Utility CD with firmware patch now included…
As you can see above, the firmware is out of date. To update the firmware, you need to boot from a Linux utility CD that has the appropriate driver and then run the firmware update utility provided by HP. To make this process easier, I have created a bootable SLAX utility CD with the drivers preloaded. You can download the ISO from here (file temporarily removed). Once the CD has booted, run the installer located in the root filesystem (i.e., ./CP011471.scexe).
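If you have a whole cluster of hosts to check, comparing version strings by eye gets tedious. Below is a minimal Python sketch that does the comparison for you. It assumes you have captured the text printed by ethtool -i for each vmnic (for example, by copying it from the console) and that the firmware-version field is a plain dotted number. The helper names (parse_ethtool_info, version_tuple, needs_update) and the SAMPLE text are hypothetical and invented for illustration; they do not come from the hosts in this post. The only value taken from the resolution above is the 4.0.516 firmware version.

```python
from __future__ import annotations

# Firmware version that NetXen believes resolves the pause-frame flood.
FIXED_FIRMWARE = (4, 0, 516)

# Hypothetical `ethtool -i` output, invented for illustration only.
SAMPLE = """driver: nx_nic
firmware-version: 4.0.406
bus-info: 0000:04:00.0
"""


def parse_ethtool_info(text: str) -> dict[str, str]:
    """Turn 'key: value' lines into a dict, splitting on the first colon only."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> tuple[int, ...]:
    """Convert '4.0.516' to (4, 0, 516); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.split("."))


def needs_update(ethtool_output: str, fixed: tuple[int, ...] = FIXED_FIRMWARE) -> bool:
    """Return True if the reported firmware is older than the fixed release."""
    info = parse_ethtool_info(ethtool_output)
    firmware = info.get("firmware-version")
    if firmware is None:
        raise ValueError("no firmware-version line found in ethtool output")
    # Tuples compare element by element, so (4, 0, 406) < (4, 0, 516).
    return version_tuple(firmware) < fixed


if __name__ == "__main__":
    print(needs_update(SAMPLE))  # True: 4.0.406 is older than 4.0.516
```

Comparing integer tuples instead of raw strings avoids the classic trap where "4.0.52" would sort after "4.0.516" as text.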
Let me know if you have any questions.
Created on January 11, 2010 by Rick Scherer
Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Networking, Storage, vSphere.
Tags: 4.0.516, Bug, Bug 496013, esx, esxi, firmware, NC522SFP, network flood, NetXen, spanning tree