NetXen HP NC522SFP Network Flooding

I had a very fun weekend. It started at 4am Saturday with a migration of ~125 virtual machines from an old AMD-based environment to a new Intel Nehalem-based environment. Who could have known that within a few hours all hell would break loose?

Enter the problem of network flooding from the NetXen-based, HP-branded NC522SFP. Because all of the 10GbE ports on the nine new ESXi servers were generating thousands of pause frames on the Cisco Nexus 5020 switches, I originally thought the issue was on the switch. Talks with Cisco revealed nothing. We tried disconnecting one of the two connected ports (each ESXi host is dual-connected to a pair of N5Ks using vPC) to rule out a potential spanning tree loop… no dice.

A reboot of the host resolved the problem. Things appeared to be running normally, so we decided to let it be and wait until Monday.

Ten hours go by; it is now Sunday morning and the problem returns. The first host loses storage (we’re doing NFS over 10GbE here), then two more… until all nine in this cluster are pretty much toast. I decide to open a ticket with VMware. Wouldn’t you know, there is a potential known bug and a resolution.

Bug 496013

Description: Some NetXen-based 10GbE cards using the unm_nic and nx_nic drivers sometimes flood the network with pause frames, causing the port to become disabled.

Resolution: NetXen believes upgrading the firmware to version 4.0.516 will resolve the problem.

I’ve gone ahead and patched four of the hosts with this new firmware, and so far they have been stable (knock on wood). I’ll let you know if something happens.

Checking which firmware version you’re running is simple. From a command line (ESX, or the hidden ESXi CLI), type ethtool -i <vmnic#> (replace <vmnic#> with the name of the vmnic you’d like to check). You should see output similar to:

driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0
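
If you have more than a few hosts to check, a small script can do the comparison for you. The following is only a minimal Python sketch, not something provided by VMware or HP. It assumes you have captured the ethtool -i output as text (for example, pasted from an SSH session onto a workstation that has Python), and the function names are my own invention. It parses the key/value lines and flags firmware older than 4.0.516, the version named in the bug resolution above.

```python
# Minimal sketch: parse captured `ethtool -i` output and flag old firmware.
# Function names are hypothetical; 4.0.516 is the version from the bug resolution.

FIXED_FIRMWARE = "4.0.516"


def parse_ethtool_info(text):
    """Turn 'key: value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Convert a dotted version such as '4.0.406' into (4, 0, 406)."""
    return tuple(int(part) for part in version.split("."))


def firmware_needs_update(text, minimum=FIXED_FIRMWARE):
    """Return True if the reported firmware-version is older than `minimum`."""
    info = parse_ethtool_info(text)
    return version_tuple(info["firmware-version"]) < version_tuple(minimum)


sample = """driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0
"""

print(firmware_needs_update(sample))  # True: 4.0.406 is older than 4.0.516
```

Comparing the versions as tuples of integers rather than as strings avoids surprises such as "4.0.1000" sorting before "4.0.516".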

Update – Utility CD with firmware patch now included…

As you can see above, the firmware is out of date. To update it, you need to boot from a Linux utility CD that has the appropriate driver and then run the firmware update utility provided by HP. To make this process easy, I have created a bootable SLAX utility CD with the drivers pre-loaded. You can download the ISO from here (file temporarily removed). Once booted, run the installer located in the root filesystem (e.g. ./CP011471.scexe).

Let me know if you have any questions.


Created on January 11, 2010 by Rick Scherer

Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Networking, Storage, vSphere.


39 Comments so far

  1. Jason Boche
    7:01 pm on January 11th, 2010

    Now THAT sounds like fun. Thanks for the heads up. Somewhere, my old manager is at his pulpit raging about how important it is to always be on the newest version of software, firmware, etc. no matter what the cost. He believes this is the cause of 95%+ of all technical problems. He may be right but there is only so much time in a day, or in our case, an evening or weekend.

  2. Rick Scherer
    12:14 am on January 12th, 2010

    You can really go both ways on that. I’m a little bit of the opposite. I’m wary that updates may introduce unexpected results, so they should be thoroughly tested before going into production. That is typically something that cannot reasonably be done.

  3. Cham
    10:20 pm on January 15th, 2010

    I had a very similar problem with the NC510F 10GbE cards (NetXen, rebranded by HP, before NetXen was bought out) three years ago on Linux: massive pause frame generation. It was a huge issue that took a while to resolve with them. Interesting that the problem is coming up again on a different model.

  4. JudB
    9:18 am on January 21st, 2010

    I’m having this same problem with a Nexus 5010 and the NC522SFP cards. However, the DL785 G6 I have them in will not boot the SLAX CD; it hangs every time. I’ve swapped the cards into another system to flash them, but now VMware doesn’t recognize them at all when I put them back.

    I have driver 4.0.404 loaded and used the firmware you linked to off HPs site.

    Another thing I noticed is that ethtool won’t allow changing any settings on these NICs. The QLogic docs don’t really talk about any options with VMware as far as I can find either. :(

    Now I kind of wish I had bought the Intel 10GB AF DA cards instead.

  5. Dimitri Rakviashvili
    2:19 pm on January 23rd, 2010

    Thank you for the tip. I tried to update the firmware using CP011471.scexe:
    sh CP011471.scexe (as root) on ESX 4.0 Update Pack 1, and got the following error:

    Please install nx_nic rpm.Need nx_xport driver to continue.

  6. Rick Scherer
    12:44 pm on January 25th, 2010

    Sorry guys for the miscommunication. I’ve included the bootable SLAX CD that includes the firmware update. Download it, boot from it and install.

  7. JudB
    12:49 pm on January 25th, 2010

    FYI, the SLAX CD doesn’t boot on the DL785 servers, HP suggests using the Firmware Maintenance CD ROM or USB Key (http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?swItem=MTX-UNITY-I23839) and ( http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=3974962&prodNameId=3974971&swEnvOID=4004&swLang=13&mode=2&taskId=135&swItem=MTX-cdee9b93a98c43bb95653f9d41 )

    to install the firmware update.

    However I had an issue where vmware would not recognize the NC522 card after updating the firmware. HP support informed me this is a known problem and that they have no eta on a fix.

  8. Dimitri Rakviashvili
    8:06 am on January 26th, 2010

    Rick, are you sure that flashing the firmware onto the NIC board itself makes the nx_nic driver use that firmware? As I have seen in the following explanation, the card uses a host-based firmware load, so the firmware included in the driver package is used:

    Important Information
    ======================
    The nx_nic driver uses host-based firmware load. When the user loads the
    driver the host-based firmware image in the driver will be the version that
    is displayed when checking the firmware version using ethtool. The burned-in
    firmware on the device as well as the host-based firmware version can be
    verified by checking dmesg for entries similiar to the following:

    nx_nic: Flash Version: Firmware[4.0.516], BIOS[2.1.3]

    nx_nic: File FW[nx3fwmn.bin] version[4.0.516:12818]

    Copyright 2009 Hewlett-Packard Development Company, L.P.
    taken from : CP011471.txt (flash update package)

    Another Question is : does ethtool show “loaded” version or “flashed” version ?

    P.S. no nx_nic string was found in dmesg after
    esxcfg-module -u nx_nic
    esxcfg-module nx_nic

    (a module reload, which solves the problem without a restart of the system).
    I cannot check whether the board’s firmware version differs from the driver’s.

    P.P.S. The problem:

    My card is an HP NC375i and the problem is the same. It is a NetXen card rebranded by HP and is fully affected by this issue.

    nx_nic was previously updated to version 407 (VMware-provided package), but the problem persists. The firmware version is 406 (as shown by ethtool).

  9. Rick Scherer
    2:54 pm on January 27th, 2010

    Dimitri,

    ethtool shows the loaded/running version of the driver and firmware. Booting the SLAX tool and updating the firmware will flash the eeprom on the device itself and it will be the firmware used at next boot. The driver (4.0.301 provided in ESX 4+) is the default, you can upgrade this to 4.0.407 from the driver available on vmware.com – however, there is a chance of a PSOD when mixing firmware 4.0.516 and driver 4.0.404 or 407 — VMware is aware of this and the HW vendors are working on the issue.

    My suggestion, which I’ve tested and works for me is the combination of Firmware 4.0.516 and driver 4.0.301. Do not update the driver to 4.0.40x from the VMware website.

    Still no guarantees, lots of people have been pulling out their NetXen based cards and putting in Intel.

  10. Tim Valentine
    6:21 pm on February 3rd, 2010

    Rick, I am googling: nexus pause frames incrementing
    trying to learn more about the issue we have, but the only links of any value all seem to be coming back to your postings….:)
    anyway thanks for posting this and gluck

    tv

  11. Michael Hutchesson
    3:44 am on February 9th, 2010

    I’m using ESX 3.5 Update 5 and it doesn’t seem to see the NC522SFP+ at all. I have installed the driver from the VMware site, but when I try to update the firmware I get the same error: Please install nx_nic rpm.Need nx_xport driver to continue.

    Does anyone know if ESX should see the card to start with? (I thought the driver was inbox, as they say.)

    Also, the SLAX CD-ROM doesn’t boot in an HP DL380 G6 either. Thanks.

  12. mole
    8:32 am on February 9th, 2010

    Hi,

    We have the same issue…

    I am struggling to update the firmware on these cards. I have ESX 3.5 and have tried running the slax ISO you provided…

    I get as far as “Freeing unused kernel memory: 616k freed” “scsi 2:0:0:0: CD-ROM HP Virtual DVD-ROM….” but no further in booting. Any ideas?

  13. Dimitri Rakviashvili
    3:35 am on February 12th, 2010

    Rick, it seems that your advice is the only one on the net that is 100% right. Today is the 12th day after the firmware upgrade, and everything works fine.

    Thank you very much!

    Here is my config and the method that worked for me (for everybody who experiences the same problem):

    HP Proliant ML 370 G6
    HP NC375i Integrated Quad Port Multifunction Gigabit Server Adapter (NetXen chip, 4 1GB ports)
    ESX (vSphere) 4 Update Pack 1
    nx_nic driver version : 4.0.301
    NIC firmware version : 4.0.406

    Problem : exactly the same, as Rick wrote …

    Solution :

    NIC’s firmware upgrade to version : 4.0.516
    nx_nic driver version 4.0.301 works fine with this firmware (not tested with nx_nic 4.0.407, that is available)

    Method : Could not boot this server from SLAX, so used Windows instead. We moved the NIC to another machine with Windows Server 2008 x64 installed and used the sp45328 update, available from HP.

    ESX BOOTS FINE after this and ethtool shows updated firmware version(4.0.516)

    One more thing: this adapter is fully programmable, and its firmware can be loaded from the OS during driver/module init.

    But this is not used in ESX! ESX uses the NIC’s firmware (flashed in the EEPROM), exactly as Rick wrote.

    By the way, Windows uses software image loading, so flashing under Windows does not change anything there; the OS still uses the firmware version that is integrated in the driver. A driver update is the only way to use a new firmware version under Windows. As far as I know, SUSE’s version of nx_nic also uses this feature.

    Hope this helps somebody :)

    p.s

    Sorry everybody, english is not my native language :)

  14. James Shelton
    9:03 am on March 11th, 2010

    We are fighting issues with these cards as well. Here’s what I can tell you:

    We were running 3.5 U4 hosts on DL585 G6s with these NC522SFP cards. We tried everything to get the updated firmware on these cards, and here’s how we finally did it…

    1.) You can download the current SLAX live CD from Qlogic’s website (this disc contains the .517 code on it)

    2.) When the SLAX boot splash screen comes up…hit the Tab key to enter in custom options…

    3.) Append the boot line with the following: mem=16GB maxcpus=1

    4.) Follow the on screen prompts to upgrade the flash firmware of the NC522SFP card/s

    Nice and easy after that point. Fair warning though…the RAM load takes a LONGGGG time…so be prepared to wait quite a while. (We did this work via ILO, so direct from physical media may very well be much faster)

    The problem, though, is that if you have applied the driver CD found at http://downloads.vmware.com/d/details/esx_35_unmnic404_qla_dt/dGViZGoqJWJkZXBo then you will not see any change in behavior or firmware load. For some reason this driver continues to load the file version of the firmware (.404) instead of the newly flashed version (.517). I think this is a bug, and I have an open case with VMware for investigation. We ended up having to roll back to the .275 inbox drivers to get the updated firmware to load properly and report with ethtool. So far the environment has been stable with this configuration… but time will tell…

  15. Enrico
    11:23 am on May 13th, 2010

    After upgrading the firmware to version .516 we are still experiencing the pause frame issue, but at least we got rid of the overheating problem of the NIC when mounted in a DL380 G6.
    To get rid of the pause frames, HP now advises updating to .520. To update the firmware I made a diskless CentOS CD equipped with the HP nx_nic module and nx_nic_tools, and it works.

    The real problem, anyway, is the “SIDE EFFECT” of the firmware update (either .516 or .520).
    The side effect seems to be that VM-to-VM traffic between two different ESXi hosts, measured with iperf, is throttled down to a maximum of 1.6Gbps.
    Apparently the combination of the new firmware and ESXi’s nx_nic module is not very good.
    We are forced to use the inbox driver because the async (i.e. provided by QLogic/HP) driver downloadable from the VMware web site causes a kernel panic in ESXi.
    Is anyone experiencing the same issue?

  16. Enrico
    11:40 am on May 13th, 2010

    The firmware update to version .516 did not fix the problem for us, but at least we got rid of the overheating problem on the NIC when mounted in a DL380 G6.
    For the pause frame issue, HP is now advising us to update the firmware to version .520.
    I managed to update the firmware by creating a CentOS live CD ISO with HP’s nx_nic module and the nx_nic_tools (all of them downloadable from the HP web site).
    The real issue now is the “SIDE EFFECT” of the firmware update (either 516 or 520).
    Apparently, after the firmware update, our VM-to-VM network performance is throttled down to 1.6 Gbps (with the VMs running on two different ESXi hosts, obviously).
    So the new firmware and the “old” inbox nx_nic module of ESXi are probably not a good match!
    Trying to update the nx_nic module with the async (i.e. provided by QLogic/HP) modules downloadable from VMware’s web site causes a kernel panic, though (both of them!).

    Has anyone else experienced or noticed the same issue?

  17. Sean
    10:50 am on May 20th, 2010

    I just hit the exact same issue, but with a different NetXen card. This one is a NetXen HP NC375i quad-port Gigabit card running in a standalone ESXi 4.0 server. Our switch is not even managed, so it won’t respond to these flow control packets, but the PIX firewall must have been, because these PAUSE frames were taking down their internet connection and killing all connectivity to the ESXi console IP (couldn’t even ping the ESXi server, much less the VMs on it). I just used the HP Firmware DVD 9.0 and flashed the card up to 4.0.501. I will monitor to see if this fixed it, but odds are it did, as an issue this similar must be the same issue. Anyway, thanks a million for this post. Eventually I probably would have flashed all the firmware, but without this post I never would have thought an Ethernet flow control flood would be due to a card issue rather than ESXi or some other bug. You saved me untold hours of troubleshooting.

  18. Sean
    10:56 am on May 20th, 2010

    Oh… I should have read more of the comments as I see another guy with the same ML370 G6 and NC375i posted already.

    But guys, you are killing yourselves trying to update the firmware with SLAX or similar tools. Use the HP Firmware DVD:
    http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=18964&prodSeriesId=1844067&prodNameId=1844068&swEnvOID=4004&swLang=13&mode=2&taskId=135&swItem=MTX-a69a789f6bfc4b0fbe2d84c7c6

    Just google for it. It works with nearly all Proliant servers and blades. I think only the really cheap 100 series servers are not supported.

  19. Michael Hutchesson
    9:44 pm on May 25th, 2010

    Enrico, are you aware of the HP notice about having to install the NC522SFP+ in a particular slot? Document ID c01980870

    I have it installed in slot 4 and I selected increased cooling as well to be safe

    Michael.

  20. Michael Hutchesson
    3:16 am on May 26th, 2010

    Sorry Enrico, more info to clear up my last message: the overheating is in HP DL380 G6 models.

    We are using firmware 520 and the inbox ESX 4 Update 1 driver and, like you, are only getting 1.6G UDP between VMs on different hosts. Very poor, I thought. I can also confirm that the async driver gives the purple screen of death on ESX 4 Update 1.

    thanks
    michael.

  21. VE
    11:15 am on May 26th, 2010

    Hello all,
    I work with VMware Engineering department. There are many misconceptions in the original post and some of the comments regarding the driver. The network flooding issue has been resolved as of now.

  22. Andy Daniel
    3:03 pm on May 26th, 2010

    I’ve been through all kinds of issues with these NICs. Pre 4.0.505 firmware doesn’t even work with some versions of HP branded direct attach cables. I discovered the PSOD with the 4.0.407 driver a while back. I’m anxiously awaiting a new driver with Netqueue support. The inbox driver (4.0.301) doesn’t include Netqueue support and this is likely resulting in the speed issue that Enrico is seeing. I’ve been recommending the Intel ET and Emulex OneConnect (ServerEngines) adapters lately because of these issues.

  23. Andy Daniel
    11:58 am on May 27th, 2010

    @VE Can you elaborate further on the issue and what “as of now” means? Specifically which firmware and driver versions? Thanks.

  24. Michael Hutchesson
    3:55 pm on May 27th, 2010

    Hi VE,

    All I can say is I can confirm the following

    NC522SFP+ didn’t work with HP-supplied cables, but did with Cisco
    Async driver causes PSOD
    Inbox driver is really slow 1.6G Max
    (using firmware 520)
    How can we work with VMWare to address this performance issue?

    Michael.

  25. John
    1:15 am on May 30th, 2010

    I believe I had the flooding issue tonight. ESX 4.0 U1 with NC522SFP, running firmware 520. Networking saw massive flooding of UDP 8808 packets (which as far as I can tell are pause frames). I too concur that the 407 driver causes a purple screen; I tried to install it a few weeks ago.

    Can that VMware engineer who posted a few days ago please shed some light on this. I opened a case, but it sounds like he knows. Please contact me if you can!

  26. Andy Daniel
    5:47 am on June 3rd, 2010

    HP has released 4.0.526 firmware for these adapters. I’ve flashed, but driver 4.0.407 still results in a PSOD. For others flashing, this update does not work with the HP Firmware DVD. For me, the easiest way to flash was to steal the “phantom_romimage” file from the Windows version of the update and drop it in /root of the QLogic Slax 4.0.517 update disc.

  27. Richard
    12:33 pm on June 3rd, 2010

    Apparently the 520 firmware has turned off the Flow Control feature (one way to stop the flooding). I have a ticket open with HP to see when that will be fixed (properly) and re-enabled.

    Just wondering what other people are doing in the meantime? Are you running the inbox drivers with flowcontrol off or another approach?

  28. VE
    1:39 pm on June 17th, 2010

    Apologies for the late reply. There was no driver change, the pause frame was addressed by a recent firmware release by NetXen. We are aware of the PSOD upon loading the driver and working on a fix which should be released soon.

  29. Andy Daniel
    6:51 am on July 4th, 2010

    Looks like we finally have 4.0.560 from VMware. I’d imagine having the firmware built into the driver will negate having to flash. I’ll test and report back ASAP.

  30. Matthias Branzko
    12:27 am on August 10th, 2010

    Hi Andy,

    any news regarding your tests?
    We’ve a call open at VMware and they confirmed a big issue with ESX 4.1 and the NetXen HP NC522SFP cards (ESX4.1 out of box will not start with these cards -> PSOD).
    They told us that the guys at HP “are looking at qualifying this driver for 4.1.
    We also looking for a fix on the inbox driver.”

  31. Andy Daniel
    10:36 am on August 16th, 2010

    Matthias,

    I have tested, but not with the 4.1 Inbox driver. The 4.0.560 Async driver works with the cards in 4.1, although you’ll have to find other means to install it. Esxupdate under 4.1 will not install the 4.0.560 driver though. Shoot me an email if you need help to get it installed by other means. FYI, I recently finished 10GbE testing of a myriad of NICs from multiple manufacturers and will publish a whitepaper soon. These NetXen based NICs were the worst performers.

    Andy

  32. Matthias Branzko
    2:24 am on August 24th, 2010

    Andy,

    thanks for the answer. Meanwhile I already got the 4.0.560 driver working with ESX 4.1.
    I just used vihostupdate. The procedure was:

    - deactivated the NIC in BIOS
    - installed and configured ESX 4.1
    - used vihostupdate to install “ntx-nx_nic-4.0.560-164009-offline_bundle-266514.zip”
    - activated the NIC in BIOS and ESX 4.1 came up without a PSOD

    The point I do not know is: how stable is that configuration? Unfortunately I have not had enough time to do any “stress tests” yet.

    But I am going to replace the NC522 with Intel X520 before going into production with these servers.

    Anyway I’m very interested in reading your whitepaper if released :)

    Matthias

  33. Richard
    5:19 am on September 7th, 2010

    Andy,

    I look forward to the whitepaper also.

    We have since replaced our NetXen/HP cards with Intel’s and have experienced no issues.

    Cheers

    Richard

  34. michael hutchesson
    8:27 pm on September 11th, 2010

    Hi richard,

    Exactly what Intel card are you using (model etc.)? I have issues finding these in Australia.

    Michael

  35. Matthias Branzko
    1:34 am on September 14th, 2010

    Hi Michael,

    we are using Intel NICs out of the X520 series.
    Have a look:
    http://www.intel.com/support/network/adapter/x520server/sb/CS-030628.htm

    In ESX 4.1 a driver is already inbox – but there are also async drivers available with enabled NetQueue (recommended).

    Matthias

  36. Richard
    4:54 am on September 14th, 2010

  37. JudB
    6:20 am on September 14th, 2010

    We are using the intel X520-DA cards after returning the HP NC522SFP cards. I spoke with the qlogic people at vmworld regarding these issues and the level of support that qlogic (the owners of NetXen now) and HP have provided. Hopefully they’ll do something about it.

  38. John
    10:48 am on September 22nd, 2010

    I have been running the Intel x520 SR1 card in production now since June and have had no issues. One other thing I did was to disable VMDQ on the nics by running the following:

    esxcfg-module -s "InterruptType=0,0 VMDQ=0,0 MQ=0,0 RSS=0,0" ixgbe

    I had read that there were some issues with this card if you had vmotion on the nic.

  39. cs go beta crashes
    10:11 am on December 21st, 2011

    As I site possessor I believe the content material here is rattling great , appreciate it for your hard work. You should keep it up forever! Good Luck.
