NetXen HP NC522SFP Network Flooding

I had a very fun weekend. It started at 4am Saturday with a migration of ~125 virtual machines from an old AMD-based environment to a new Intel Nehalem-based environment. Who could’ve known that within a few hours all hell would break loose?

Enter the problem of network flooding from the NetXen-based, HP-branded NC522SFP. Because all of the 10GbE ports on the nine new ESXi servers were generating thousands of pause frames on the Cisco Nexus 5020 switches, I originally thought the issue was on the switch. Talks with Cisco revealed nothing. We tried disconnecting one of the two connected ports (each ESXi host is dual-connected to a pair of N5Ks using vPC) to rule out a potential spanning tree loop… no dice.

A reboot of the host resolved the problem. Things appeared to be running normally, so we decided to let it be and wait until Monday.

Ten hours go by. It is now Sunday morning and the problem returns. The first host loses storage (we’re doing NFS over 10GbE here), then two more, until all nine in this cluster are pretty much toast. I decide to open a ticket with VMware. Wouldn’t you know, there is a potential known bug and a resolution.

Bug 496013

Description: Some NetXen-based 10GbE cards using the unm_nic and nx_nic drivers sometimes flood the network with pause frames, causing the port to become disabled.

Resolution: NetXen believes upgrading the firmware to version 4.0.516 will resolve the problem.

I’ve gone ahead and patched four of the hosts with this new firmware, and so far they have been stable (knock on wood). I’ll let you know if something happens.

Checking which version of the firmware you’re running is simple. From a command line (ESX or the ESXi hidden CLI), type ethtool -i <vmnic#>, replacing <vmnic#> with the name of the vmnic you’d like to check. You should see output similar to:

driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0
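
If you have several hosts to check, comparing these fields by eye gets tedious. Below is a minimal Python sketch of how the ethtool -i output could be parsed and the firmware version compared against 4.0.516 (the version named in the bug resolution). The function names are hypothetical, the sample text is simply the output shown above, and the comparison assumes purely numeric dotted versions. It works on pasted or captured output, so it does not need to run on the ESX host itself.

```python
def parse_ethtool_info(text):
    """Turn `ethtool -i` output into a dict of field name -> value."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the
        # bus-info PCI address (0000:07:00.0) stay intact.
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Convert a dotted version such as '4.0.406' into (4, 0, 406)."""
    return tuple(int(part) for part in version.split("."))


def firmware_needs_update(info, minimum="4.0.516"):
    """Return True when the reported firmware is older than `minimum`."""
    return version_tuple(info["firmware-version"]) < version_tuple(minimum)


# Sample input: the output shown in this post.
sample = """\
driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0
"""

info = parse_ethtool_info(sample)
print(info["driver"], info["firmware-version"], firmware_needs_update(info))
```

Comparing the versions as tuples of integers rather than as strings avoids surprises such as "4.0.1000" sorting before "4.0.516".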

Update - Utility CD with firmware patch now included…

As you can see above, the firmware (4.0.406) is older than the recommended 4.0.516. To update it you will need to boot from a Linux utility CD that has the appropriate driver, then run a firmware update utility provided by HP. To make this process easy I have created a bootable SLAX utility CD with the drivers pre-loaded. You can download the ISO from here (file temporarily removed). Once booted, run the installer located in the root filesystem (i.e., ./CP011471.scexe).

Let me know if you have any questions.


Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Networking, Storage, vSphere, this blog has 2,003 views and 13 responses.


13 Comments so far

  1. Dimitri Rakviashvili
    3:35 am on February 12th, 2010

    Rick, it seems that yours is the only advice on the net that is 100% right. Today is the 12th day after the firmware upgrade, and everything works fine.

    Thank you very much !

    Here is my config and the method that worked for me (for everybody who experiences the same problem):

    HP Proliant ML 370 G6
    HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter (NetXen chip, four 1Gb ports)
    ESX (vSphere) 4 Update Pack 1
    nx_nic driver version: 4.0.301
    NIC firmware version: 4.0.406

    Problem: exactly the same as Rick described.

    Solution :

    NIC firmware upgrade to version 4.0.516.
    nx_nic driver version 4.0.301 works fine with this firmware (not tested with nx_nic 4.0.407, which is also available).

    Method: I could not boot this server from SLAX, so I used Windows instead. We moved the NIC to another machine with Windows Server 2008 x64 installed and applied the sp45328 update, available from HP.

    ESX boots fine after this, and ethtool shows the updated firmware version (4.0.516).

    One more thing. This adapter is fully programmable, and its firmware can be loaded from the OS during driver/module init.

    But this is not used in ESX! ESX uses the NIC’s own firmware (flashed in the EEPROM), exactly as Rick wrote.

    By the way, Windows uses software image loading, so flashing under Windows does not change what Windows itself runs: the OS still uses the firmware version that is bundled with the driver. A driver update is the only way to use a new firmware version under Windows. As far as I know, SUSE’s version of nx_nic also uses this feature.

    Hope this helps somebody :)

    p.s

    Sorry everybody, English is not my native language :)

  2. mole
    8:32 am on February 9th, 2010

    Hi,

    We have the same issue…

    I am struggling to update the firmware on these cards. I have ESX 3.5 and have tried running the slax ISO you provided…

    I get as far as “Freeing unused kernel memory: 616k freed” “scsi 2:0:0:0: CD-ROM HP Virtual DVD-ROM….” but no further in booting. Any ideas?

  3. Michael Hutchesson
    3:44 am on February 9th, 2010

    I’m using ESX 3.5 Update 5 and it doesn’t seem to see the NC522SFP+ at all. I have installed the driver from the VMware site, but when I try to update the firmware I get the same error: Please install nx_nic rpm.Need nx_xport driver to continue.

    Does anyone know if ESX should see the card to start with? (I thought the driver was inbox, as they say.)

    Also, the SLAX CD-ROM doesn’t boot in an HP DL380 G6 either. Thanks.

  4. Tim Valentine
    6:21 pm on February 3rd, 2010

    Rick, I am googling: nexus pause frames incrementing
    trying to learn more about the issue we have, but the only links of any value all seem to be coming back to your postings….:)
    anyway thanks for posting this and gluck

    tv

  5. Rick Scherer
    2:54 pm on January 27th, 2010

    Dimitri,

    ethtool shows the loaded/running version of the driver and firmware. Booting the SLAX tool and updating the firmware will flash the EEPROM on the device itself, and that will be the firmware used at next boot. The driver (4.0.301, provided in ESX 4+) is the default. You can upgrade this to 4.0.407 with the driver available on vmware.com; however, there is a chance of a PSOD when mixing firmware 4.0.516 with driver 4.0.404 or 4.0.407. VMware is aware of this and the HW vendors are working on the issue.

    My suggestion, which I’ve tested and which works for me, is the combination of firmware 4.0.516 and driver 4.0.301. Do not update the driver to 4.0.40x from the VMware website.

    Still no guarantees, lots of people have been pulling out their NetXen based cards and putting in Intel.

  6. Dimitri Rakviashvili
    8:06 am on January 26th, 2010

    Rick, are you sure that updating the firmware physically on the NIC board makes the nx_nic driver use this firmware? According to the following explanation, the card uses host-based firmware load, so the firmware included in the driver package is used:

    Important Information
    ======================
    The nx_nic driver uses host-based firmware load. When the user loads the
    driver the host-based firmware image in the driver will be the version that
    is displayed when checking the firmware version using ethtool. The burned-in
    firmware on the device as well as the host-based firmware version can be
    verified by checking dmesg for entries similiar to the following:

    nx_nic: Flash Version: Firmware[4.0.516], BIOS[2.1.3]

    nx_nic: File FW[nx3fwmn.bin] version[4.0.516:12818]

    Copyright 2009 Hewlett-Packard Development Company, L.P.
    Taken from: CP011471.txt (flash update package)

    Another question: does ethtool show the “loaded” version or the “flashed” version?

    P.S. No nx_nic string was found in dmesg after
    esxcfg-module -u nx_nic
    esxcfg-module nx_nic

    (a module reload, which clears the problem without restarting the system).
    So I cannot check whether the board’s firmware version differs from the driver’s.

    P.P.S. My problem:

    My card is an HP NC375i and the problem is the same. It is a NetXen rebranded by HP and is fully affected by this issue.

    nx_nic was previously updated to version 407 (VMware-provided package), but the problem persists. The firmware version is 406 (as shown by ethtool).

  7. JudB
    12:49 pm on January 25th, 2010

    FYI, the SLAX CD doesn’t boot on the DL785 servers. HP suggests using the Firmware Maintenance CD-ROM or USB Key (http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?swItem=MTX-UNITY-I23839) and ( http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=3974962&prodNameId=3974971&swEnvOID=4004&swLang=13&mode=2&taskId=135&swItem=MTX-cdee9b93a98c43bb95653f9d41 ) to install the firmware update.

    However, I had an issue where VMware would not recognize the NC522 card after updating the firmware. HP support informed me this is a known problem and that they have no ETA on a fix.

  8. Rick Scherer
    12:44 pm on January 25th, 2010

    Sorry guys for the miscommunication. I’ve included the bootable SLAX CD that includes the firmware update. Download it, boot from it and install.

  9. Dimitri Rakviashvili
    2:19 pm on January 23rd, 2010

    Thank you for the tip. I tried to update the firmware using CP011471.scexe:
    sh CP011471.scexe (as root) on ESX 4.0 Update Pack 1. I got the following error:

    Please install nx_nic rpm.Need nx_xport driver to continue.

  10. JudB
    9:18 am on January 21st, 2010

    I’m having this same problem with a Nexus 5010 and the NC522SFP cards. However, the DL785 G6 I have them in will not boot the SLAX CD; it hangs every time. I’ve swapped the cards into another system to flash them, but now VMware doesn’t recognize them at all when I put them back.

    I have driver 4.0.404 loaded and used the firmware you linked to on HP’s site.

    Another thing I noticed is that ethtool won’t allow changing any settings on these NICs. The QLogic docs don’t really talk about any options with VMware as far as I can find, either. :(

    Now I kind of wish I had bought the Intel 10Gb AF DA cards instead.

  11. Cham
    10:20 pm on January 15th, 2010

    I had a very similar problem three years ago on Linux with the NC510F 10GbE cards (NetXen before they were bought out, rebranded by HP): massive pause frame generation. It was a huge issue that took a while to resolve with them. Interesting that the problem is coming up again on a different model.

  12. Rick Scherer
    12:14 am on January 12th, 2010

    You can really go both ways on that. I’m a little bit of the opposite. I’m skeptical: updates may introduce unexpected results and should be thoroughly tested before going into production. That is typically something that cannot reasonably be done.

  13. Jason Boche
    7:01 pm on January 11th, 2010

    Now THAT sounds like fun. Thanks for the heads up. Somewhere, my old manager is at his pulpit raging about how important it is to always be on the newest version of software, firmware, etc., no matter what the cost. He believes out-of-date versions are the cause of 95%+ of all technical problems. He may be right, but there is only so much time in a day, or in our case, an evening or weekend.
