NetXen HP NC522SFP Network Flooding

I had a very fun weekend. It started at 4am Saturday with a migration of ~125 virtual machines from an old AMD-based environment to a new Intel Nehalem-based environment. Who could’ve known that within a few hours all hell would break loose?

Enter the problem of network flooding from the NetXen-based, HP-branded NC522SFP.  Because the 10GbE ports on all nine new ESXi servers were sending thousands of pause frames to the Cisco Nexus 5020 switches, I originally thought the issue was on the switch.  Talks with Cisco revealed nothing.  We tried disconnecting one of the two ports on a host (each ESXi host is dual-connected into a pair of N5Ks using vPC) to rule out a potential spanning tree loop... no dice.

A reboot of the host resolved the problem. Things appeared to be running normally, so we decided to let it be and wait until Monday.

Ten hours go by, it is now Sunday morning, and the problem returns.  The first host loses storage (we’re doing NFS over 10GbE here), then two more, until all 9 in this cluster are pretty much toast.  I decide to open a ticket with VMware.  Wouldn’t you know, there is a known bug with a potential resolution.

Bug 496013

Description: Some NetXen-based 10GbE cards using the unm_nic and nx_nic drivers sometimes flood the network with pause frames, causing the port to become disabled.

Resolution: NetXen believes upgrading the firmware to version 4.0.516 will resolve the problem.

I’ve gone ahead and patched 4 of the hosts with this new firmware, and so far it has been stable (knock on wood).  I’ll let you know if something happens.

Checking which firmware version you’re running is simple. From a command line (ESX or the hidden ESXi CLI), type ethtool -i <vmnic#> (replace <vmnic#> with the name of the vmnic you’d like to check).  You should see output similar to:

driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0
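If you have more than a couple of hosts to check, the comparison can be scripted. What follows is only a minimal sketch in Python: it assumes you have already captured the ethtool -i output as text (for example, pasted from an SSH session), that the firmware version is purely numeric and dot-separated, and that 4.0.516 from the bug report is the minimum you want. The function names are my own and not part of any VMware or HP tool.

```python
def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the PCI
        # bus address (0000:07:00.0) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Convert "4.0.406" into (4, 0, 406) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def firmware_needs_update(text, minimum="4.0.516"):
    """Return True when the reported firmware is older than `minimum`."""
    info = parse_ethtool_info(text)
    return version_tuple(info["firmware-version"]) < version_tuple(minimum)


# Sample output copied from the listing above.
SAMPLE = """driver: nx_nic
version: 4.0.301
firmware-version: 4.0.406
bus-info: 0000:07:00.0"""

if __name__ == "__main__":
    print("update needed:", firmware_needs_update(SAMPLE))
```

Comparing the versions as tuples of integers rather than as strings avoids a wrong result once a component gains a digit (as plain strings, "4.0.1000" would sort before "4.0.516").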

Update - Utility CD with firmware patch now included…

As you can see above, the firmware is out of date. To update it, you need to boot from a Linux utility CD that has the appropriate driver, then run a firmware update utility provided by HP.  To make this process easy I have created a bootable SLAX utility CD with the drivers pre-loaded. You can download the ISO from here (file temporarily removed). Once booted, run the installer located in the root filesystem (e.g. ./CP011471.scexe).

Let me know if you have any questions.

Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Networking, Storage, vSphere

This post was written by Rick Scherer on January 11, 2010


VMware ESXi 4 and HP Servers

I’m proud to say that HP has full support for ESXi 4 installed on SD or USB Flash memory.  They even offer the installable ISO pre-built with their CIM providers, available here. There is a catch, though: they only support their approved SD or USB Flash memory. Sorry, but you cannot BYOF (Bring Your Own Flash).  This is reflected on their download page:

HP VMware ESXi 4.0 solution requires the following:

  • VMware ESXi 4.0 free downloadable product. To upgrade your license to Enterprise Plus, purchase HP VMware ESXi 4 product 571979-B21.
  • HP ESXi 4 CIM Providers.
  • Registration on VMware ESXi hypervisor web page to obtain permanent license serial number.
  • Acquire your choice of HP supported and qualified media - any HP supported hard drive, USB or SD Flash devices listed below.
  • Please note that all devices will need to be imaged for ESXi 4.0.

    Information about HP supported SD card and USB Flash drive:

    • Supported SD card**:
      HP 4GB SD Flash Media
      HP Part Number 580387-B21
      [spare kit part number 583306-001]
    • Supported USB Flash Drive**:
      HP 4GB USB Flash Media Drive Key
      HP Part Number 580385-B21
      [Spare kit part number 583307-001]
      *Must be purchased separately.
      **HP VMware ESXi 4.0 does not support any other USB or SD flash devices

    Seems simple, doesn’t it?  It actually works great, and I have put a few dozen servers into production with this method. The best part is no more local HDD, which saves even more power!

    One question I often get is: why do you need a 4GB drive when ESXi is only 32MB?  Well, even though it is true that ESXi is only 32MB, you still need adequate space for the VI Client, VMware Tools (for all operating systems) and upgrade space (for future ESXi patches and releases).  Using a 4GB drive ensures that you’ll have enough space for everything.

    There is one problem I am currently facing: HP has both the SD Card and USB Flash Drive on back-order, with no ETA for either of them!  This has delayed a major project I’m working on, and I’ve had to resort to using temporary “junk” USB drives to get the customer by in the meantime.

    If someone from HP is reading this, PLEASE get your OEM to produce some new ones ASAP!

    Posted under Storage, vSphere

    This post was written by Rick Scherer on October 27, 2009


    NetApp SnapManager for VI 2.0 (SMVI)

    For those of you using NetApp for your back-end virtual machine storage, a new version of their SnapManager for Virtual Infrastructure (SMVI) tool was recently released. SMVI 2.0 includes a number of enhancements that really raise the bar when it comes to NetApp/VMware integration.

    Some of the enhancements to the 2.0 product include:

    • Autosupport Integration
    • Backup Enhancements & GUI Re-design
    • Snapshot Naming Changes
    • Scripting
    • Restore Enhancements
    • Single File Restore
    • Self-Service Restore
    • Limited Self-Service Restore
    • Administrator-Assisted Restore
    • Restore Agent

    What really excites me about this new version is the ability for an end user to do a single file restore, which will dramatically decrease the labor required at the server administration level for these types of requests.  Most of us already using the 1.0 product have seen the benefits of the VMware/NetApp snapshot integration: NetApp uses VMware Tools to quiesce the virtual machine(s) within a datastore, then takes a NetApp-level snapshot. There is also the ability to tie this all into SnapMirror, which works great.

    Check out this video demonstrating some of the new features in SMVI 2.0

    For those of you using NetApp, I’d strongly recommend adding SMVI to your FY11 budget!

    Posted under Backup & Recovery, NetApp, Storage

    This post was written by Rick Scherer on September 16, 2009


    Cisco UCS Design Flaw? No Northbound FCoE Connectivity

    Today Scott Lowe wrote a post on his blog explaining how Cisco UCS lacks northbound FCoE connectivity:

    I’m about halfway through the first day of Unified Computing System (UCS) training in San Jose, CA, and I’ve learned of what I think is a fairly significant limitation. The issue centers around what Cisco refers to as “northbound” traffic and how Fibre Channel over Ethernet (FCoE) is handled with northbound traffic.

    Recall that a central part of UCS is the UCS 6100 series fabric interconnect. The 6100 series fabric interconnect has connectivity in two directions:

    • Southbound connectivity is connectivity aimed back at the fabric extenders in the blade chassis themselves.
    • Northbound connectivity is connectivity headed outside the UCS to other systems and networks.

    All southbound traffic is 10Gbps Ethernet with FCoE. Northbound traffic can be 10Gbps Ethernet or Fibre Channel, but not FCoE. Based on the information I’ve been given (and if I’m incorrect please let me know in the comments), you cannot directly connect an FCoE-enabled storage array to a UCS. Even if your storage array has native FCoE interfaces, you can’t plug them into the UCS 6100 series fabric interconnects because that’s considered northbound traffic and you can’t use FCoE with northbound traffic.

    I have a feeling customers who have purchased storage arrays with FCoE interfaces with the intention of hooking the arrays up directly to a UCS are going to be a bit upset when this information becomes more widely known.

    If I’m working from incorrect or incomplete information, please feel free to speak up in the comments.

    At first I was extremely shocked to hear this. It is pretty big news, and as a customer I would be upset if I couldn’t attach my FCoE-enabled storage array directly to the UCS Fabric Interconnect.

    After doing some research of my own I found the following:

    Read More…

    Posted under Good Reading, Storage

    This post was written by Rick Scherer on July 27, 2009


    VMware vSphere and the vStorage API

    During my trip to Palo Alto for the VMware vSphere Launch I met with some NetApp engineers at their Executive Briefing Center in Sunnyvale.  This was perhaps one of my favorite meetings while in the Bay Area that week.

    There are going to be a lot of exciting things coming with the GA release of vSphere. One of the biggest, in my opinion, will be the vStorage APIs, which will allow vSphere to offload tasks to the storage subsystem.

    NetApp told me that in vSphere, when you clone a virtual machine or deploy from a template, the operation will be handed to your FAS system rather than being performed at the ESX host level as it traditionally was.

    Other things that will be sent to the storage system include De-Duplication (if your storage supports it) and commands sent by VMware Data Recovery; automated provisioning in VMware View can be offloaded as well.  For VMware View, NetApp gave the example of their Rapid Cloning Utility, which already integrates with the VI Client; the new release will be a lot more streamlined since it will use vStorage API calls.

    So, what does all of this mean?  It shows that VMware is working closely with the hardware vendors to truly build a Virtual Eco-System, and it shows a lot of the exciting potential that vSphere will bring. Pushing storage-related processes back to the storage subsystem will not only speed up these operations, but also take a lot of CPU and I/O load off the ESX hosts.

    Posted under NetApp, Storage, vSphere

    This post was written by Rick Scherer on May 5, 2009


    The Things You’ll Find When You Google Yourself

    My boss asked me today to provide a list of things I’ve done over the past year for my upcoming performance review.  I typed up all I knew off the top of my head, then decided to do a Google search on my name to see what came back.  To my surprise I found an interview that Gartner did with me a few months back about virtualization and storage. What’s funny is that it feels like I did this years ago and that everything I mention is such old and outdated technology!

    Check it out for yourself, hope you enjoy it!  Here is the abstract and the link to view it:

    http://www.itbriefingcenter.com/programs/gartner_644_netapp.html

    DataCenter Virtualization: Why You Need It and Best Practices for Success

    Server virtualization has fast become the way to cut costs. If you are interested in maximizing the benefits of virtualized servers and extending those benefits toward storage, then check out this new program now.

    In this program you will learn how server virtualization:

    • Can save you money
    • Help with server sprawl
    • Free up data center space
    • Support your x86 server architecture
    • Enhance disaster recovery
    • Provide dynamic flexibility
    • Allow for rapid deployment

    You will hear experts from featured analyst firm, Gartner, Inc., and industry experts from NetApp, and San Diego Data Processing Corporation discuss:

    • The reasons your organization should choose server virtualization for storage.
    • The benefits server virtualization can bring to your company.
    • The objectives your company can achieve with server virtualization.

    Plus you’ll have access to a number of case studies, white papers and other supporting materials, which will be interesting and useful to you as you make the move towards virtualization.

    See for yourself. Check out this new program now and learn more.

    Posted under Backup & Recovery, NetApp, Storage

    This post was written by Rick Scherer on April 16, 2009


    Committing snapshots generates a content ID mismatch error

    I had a big problem Monday AM on one of my core SAP VM instances, which also happens to host a SQL DB server. Our VCB process finishes up late Sunday night. If you’re not aware of how VCB works, it creates a snapshot of the virtual machine, then mounts the now readable parent VMDK on a proxy server where your backup agent resides. Once the backup is complete, the snapshot is committed.  That isn’t what happened Monday AM: the VM crashed and I was paged. The snapshot hadn’t committed and the parent VMDK could not be found, so I had to manually set the parent CID in the delta VMDK file. When I finally got the VM back online, the SQL DB was corrupt :( Luckily I had a full SQL backup from the night before.
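For anyone who hasn’t looked inside one, a VMDK descriptor is a small text file carrying a CID= line and a parentCID= line, and a snapshot delta is only usable when its parentCID matches the CID of its parent. The snippet below is a rough sketch of that check in Python. It assumes you have copied the descriptor text off the datastore; the descriptor contents and CID values shown are made up for illustration, and the function names are my own rather than any VMware tool. Always back up the descriptor files before editing anything by hand.

```python
import re


def read_cids(descriptor_text):
    """Return (CID, parentCID) from VMDK descriptor text as lowercase hex."""
    # ^ anchors at the start of a line, so "CID=" never matches "parentCID=".
    cid = re.search(r"^CID=([0-9a-fA-F]{8})\s*$", descriptor_text, re.MULTILINE)
    parent = re.search(
        r"^parentCID=([0-9a-fA-F]{8})\s*$", descriptor_text, re.MULTILINE
    )
    if cid is None or parent is None:
        raise ValueError("descriptor is missing a CID or parentCID line")
    return cid.group(1).lower(), parent.group(1).lower()


def chain_is_consistent(parent_text, child_text):
    """True when the child's parentCID equals the parent's CID."""
    parent_cid, _ = read_cids(parent_text)
    _, child_parent_cid = read_cids(child_text)
    return parent_cid == child_parent_cid


# Hypothetical descriptors; the values are illustrative only.
BASE = """# Disk DescriptorFile
version=1
CID=a1b2c3d4
parentCID=ffffffff
createType="vmfs"
"""

DELTA_OK = """# Disk DescriptorFile
version=1
CID=5e6f7a8b
parentCID=a1b2c3d4
createType="vmfsSparse"
"""

# Same delta, but pointing at a CID the parent no longer has.
DELTA_BROKEN = DELTA_OK.replace("parentCID=a1b2c3d4", "parentCID=0badc0de")

if __name__ == "__main__":
    print("good chain:", chain_is_consistent(BASE, DELTA_OK))
    print("broken chain:", chain_is_consistent(BASE, DELTA_BROKEN))
```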

    This is where VMware KB 1007969 comes into the story…

    Read More…

    Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Storage

    This post was written by Rick Scherer on April 14, 2009


    Fast Network Throughput in ESX

    Doing some speed tests of IP-based storage gave me some good results. First I enabled jumbo frames on my dedicated IP storage Ethernet card, then I set the MTU on the vSwitch and the VMkernel NIC (vmknic) to 9000 (Scott Lowe has a great write-up on doing this on his website). I then mounted an NFS volume as a datastore and copied a VMDK file from a 7200rpm SATA 3 drive to the mounted NFS volume.

    I copied the 8GB file in under 2 minutes, and my average transmit rate (MbTX/s) was around 525Mb/s (roughly 65MBps). Keep in mind that this was a copy from a local SATA disk (low IOPS), so that’s not bad!
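To sanity-check figures like these, the conversion is just a divide by eight (bits to bytes) followed by size over rate. Here is a small Python sketch of that arithmetic; it assumes decimal units (1GB = 1000MB) and treats the 525Mb/s average as exact, which it isn’t, so read the result as a ballpark only.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


def copy_seconds(size_gb, mbyte_per_s, mb_per_gb=1000):
    """Estimate how long a copy takes at a steady rate.

    mb_per_gb defaults to 1000 (decimal units); pass 1024 for binary units.
    """
    return size_gb * mb_per_gb / mbyte_per_s


rate = mbit_to_mbyte(525)        # about 65.6 MB/s
seconds = copy_seconds(8, rate)  # about 122 s with decimal units

if __name__ == "__main__":
    print(f"{rate:.1f} MB/s, {seconds:.0f} s for 8GB")
```

With these assumptions the estimate lands at roughly two minutes, which is in line with the observed copy time given that the 525Mb/s figure is only an approximate average.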

    The biggest thing to remember: VMDK access needs low latency, not high throughput. This is why NFS has become so popular for VMDK storage.

    Posted under Networking, Storage

    This post was written by Rick Scherer on April 7, 2009


    Goodbye Backup Agents….Goodbye VCB

    I’ve been quite busy the last couple weeks, so busy that I’ve neglected VMwareTips.com for far too long!  Today I’d like to tell you about an exciting upcoming release for IBM Tivoli Storage Manager (TSM) and NetApp SnapMirror users.

    NetApp announced a couple weeks back the integration of NetApp SnapMirror to Tape and Snapshot technologies with IBM Tivoli Storage Manager (TSM) 6. As a result, NetApp and IBM customers using TSM 6 will experience enhanced backup capabilities for data stored on NetApp FAS storage devices.

    TSM 6 leverages NetApp SnapMirror to Tape technology, enabling customers to use TSM Network Data Management Protocol (NDMP) support to back up FAS volumes directly to tape or across the network to any tape device managed by TSM. This can be done without having to scan each volume to find new and changed files, helping reduce the amount of time customers need to back up their FAS systems, which can make the backup process up to 12 times faster.

    TSM 6 also leverages NetApp Snapshot technology to reduce progressive incremental file backup time by comparing the current backup Snapshot copy with the previous backup Snapshot copy to identify new, changed, and deleted files. Traditional backup scans of large file systems can take hours. However, TSM 6, together with NetApp Snapshot technology, completes the identification of new, changed, and deleted files in seconds, drastically reducing the amount of time customers need to complete NetApp file system backups with TSM.
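To make the snapshot-comparison idea concrete, here is a purely conceptual Python sketch. It is not how TSM or Data ONTAP implement it (the real comparison works on Snapshot copies on the filer). It only shows why comparing two point-in-time listings is cheaper than walking a live file system: the work becomes a comparison of two lookup tables. The manifest format (path mapped to a size and modification time) and the function name are assumptions of mine.

```python
def diff_snapshots(previous, current):
    """Compare two manifests of the form {path: (size, mtime)}.

    Returns the paths that are new, changed, or deleted in `current`
    relative to `previous`.
    """
    new = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"new": new, "changed": changed, "deleted": deleted}


# Hypothetical manifests for the previous and the current backup snapshot.
previous = {
    "/vol/a.txt": (100, 1),
    "/vol/b.txt": (200, 1),
    "/vol/c.txt": (300, 1),
}
current = {
    "/vol/a.txt": (100, 1),  # untouched
    "/vol/b.txt": (250, 2),  # size and mtime differ, so it changed
    "/vol/d.txt": (50, 2),   # only in the current snapshot, so it is new
}

if __name__ == "__main__":
    print(diff_snapshots(previous, current))
```

Only the files in the new and changed lists would need to be sent to the backup server; the deleted list lets the backup catalog expire entries that no longer exist.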

    Read More…

    Posted under Backup & Recovery, NetApp, Storage

    This post was written by Rick Scherer on February 19, 2009


    5000 Virtual Desktops using 10GB Storage in Minutes

    Most people have never heard about NetApp’s Rapid Cloning Utility (RCU) for VMware. This tool is the magic behind the famous YouTube video that shows how NetApp can very quickly create and deploy a very large number of Virtual Desktops while using very little disk space on the filer. With desktop virtualization being hot right now, this is a great tool to get to know and use. The tool is available on the NOW tool chest (a free website for NetApp customers and partners).

    In the video below, NetApp demonstrates the cost savings of their zero disk cost provisioning and data deduplication software: 5000 Virtual Desktops provisioned in minutes, while taking only a little over 10GB of storage.

    Read More…

    Posted under ESX 3.5 Tips, ESXi 3.5 Tips, Good Reading, NetApp, Storage, VMware

    This post was written by Rick Scherer on February 5, 2009
