My HDD Failed – But ddrescue Saved the Day

On Tuesday morning the HDD in my office PC failed. Most of my important data lives on an NFS volume on a FAS6030, but my PC still had some important stuff on it, including a Windows XP virtual machine that has all my necessary programs (Outlook, SAP, TweetDeck) and some files that I forgot to copy over to my NFS share. Using a tool called ddrescue I was able to save most of my data. Below I’ll go into how it works and what I did.

Well, on Tuesday morning my PC wouldn’t start after a reboot; it would pretty much freeze right after GRUB did its thing. I booted from the Ubuntu Live CD and saw some lovely errors in the messages file:

Jan 27 13:29:55 shrek kernel: [74211.342500] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 27 13:29:55 shrek kernel: [74211.342504] sd 1:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Jan 27 13:29:55 shrek kernel: [74211.342509] Descriptor sense data with sense descriptors (in hex):
Jan 27 13:29:55 shrek kernel: [74211.342513]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 27 13:29:55 shrek kernel: [74211.342523]         02 80 07 64
Jan 27 13:29:55 shrek kernel: [74211.342528] sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Jan 27 13:29:55 shrek kernel: [74211.342562] ata2: EH complete
Jan 27 13:29:55 shrek kernel: [74211.342599] sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
Jan 27 13:29:55 shrek kernel: [74211.342615] sd 1:0:0:0: [sdb] Write Protect is off
Jan 27 13:29:55 shrek kernel: [74211.342642] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 27 13:29:58 shrek kernel: [74214.288344] ata2.00: configured for UDMA/33
Jan 27 13:29:58 shrek kernel: [74214.288352] ata2: EH complete
Jan 27 13:30:01 shrek kernel: [74217.226167] ata2.00: configured for UDMA/33
Jan 27 13:30:01 shrek kernel: [74217.226177] ata2: EH complete
Jan 27 13:30:01 shrek kernel: [74217.226220] sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)

I then asked my company’s desktop support group for two new hard drives, and two 160GB SATA3 Western Digital disks were delivered within a few hours. Next, I configured the Intel Storage BIOS on my HP xw4600 to put both disks into a RAID-1 (I should have done this when I first built the machine, but it originally came with only one HDD).

I then installed Ubuntu 8.10 using the alternate install CD, because I needed dmraid support for my RAID-1. (If you didn’t know, most PCs do not actually have hardware RAID; their onboard RAID is software-assisted, which means you need an O/S driver for it to fully work.)

After my O/S was installed and set up to my liking, I installed VMware Workstation 6.5. Now I was ready to recover my VMDK files!

I then shut down the machine and connected my bad HDD to an open SATA port. Booting Ubuntu took forever because the kernel kept complaining about the errors above. I also connected an external USB drive with plenty of free space (I’ll get into this more later). Once the system was up and I could see the disk with fdisk -l, I installed ddrescue:

sudo apt-get install gddrescue
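If you’d rather check this from a script than eyeball fdisk -l, the Linux kernel lists every disk and partition it sees in /proc/partitions, with sizes counted in 1 KiB blocks. Here’s a minimal Python sketch, assuming a Linux system; parse_proc_partitions is a name chosen for illustration, not part of any tool:

```python
def parse_proc_partitions(text):
    """Return {device name: size in bytes} from /proc/partitions text.

    The #blocks column in /proc/partitions counts 1 KiB blocks.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the "major minor #blocks name" header and blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        _major, _minor, blocks, name = fields
        sizes[name] = int(blocks) * 1024
    return sizes
```

On a live system you would call parse_proc_partitions(open("/proc/partitions").read()). For the failing disk above, the kernel reported 312581808 512-byte sectors, which appears in /proc/partitions as 156290904 one-KiB blocks.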

After the install finished, I ran the following command:

sudo ddrescue -r3 /dev/sdb3 /media/USB/Backup/sdb3.image /media/USB/Backup/sdb3.logfile

Let’s break this down. ddrescue is, obviously, the program. -r3 sets the number of times it will retry bad blocks (if you think your HDD is dying fast, you would want to lower this number, or use -n to skip the slow splitting and retrying of failed blocks altogether). Next comes the input: the entire disk you wish to recover, or just one partition. Here I’m recovering partition 3 on disk sdb (found with fdisk -l). Then comes the destination: ddrescue can write to a partition on another disk or to an image file (on a USB drive or other type of media). Finally, you tell it where to write the log file. One nice feature of ddrescue is that you can stop and restart the rescue: the log file records exactly which areas have been rescued and which haven’t, so running the same command again picks up right where it left off.
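Because that log file is plain text, you can also peek at it yourself. As far as I know, after a comment header and one "current position / current status" line, each data line is "position size status" with hex numbers, and the status character says whether an area was rescued (+), is a confirmed bad sector (-), or still has work pending (?, *, /). A rough Python sketch that totals the bytes in each state, assuming that layout; summarize_logfile and report are hypothetical helper names:

```python
from collections import Counter

# Status characters in a ddrescue log file (newer releases call it a
# "mapfile"). "/" meant "non-split" in older versions, "non-scraped" later.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "failed, non-trimmed",
    "/": "failed, non-split/non-scraped",
    "-": "bad sector(s)",
    "+": "finished (rescued)",
}


def summarize_logfile(text):
    """Return a Counter mapping each status character to total bytes."""
    totals = Counter()
    seen_current_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        fields = line.split()
        if not seen_current_line:
            # First data line is "current_pos current_status [...]".
            seen_current_line = True
            continue
        if len(fields) < 3:
            continue
        _pos, size, status = fields[:3]
        totals[status] += int(size, 0)  # int(..., 0) accepts "0x..." hex
    return totals


def report(totals):
    """Format the totals as one human-readable line per status."""
    return "\n".join(
        f"{STATUS_NAMES.get(status, status):32} {nbytes:>15,} bytes"
        for status, nbytes in sorted(totals.items())
    )
```

For example, print(report(summarize_logfile(open("/media/USB/Backup/sdb3.logfile").read()))) would give a per-state breakdown of the rescue so far.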

One thing you must know: if you rescue a partition (or disk) to another partition, disk or image, you MUST have enough space on the destination to hold the entire partition (or disk). For example, sdb3 on my bad disk was 60GB in total; although I was only using 30GB of it, I needed to make sure my USB drive had 60GB free to write the image.
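A quick sanity check before starting saves you from discovering halfway through that the destination is full. This Python sketch assumes Linux, where /sys/class/block/<name>/size reports a device’s size in 512-byte sectors; partition_size_bytes and has_room_for_image are hypothetical helper names:

```python
import shutil

SECTOR_SIZE = 512  # sysfs "size" files count 512-byte sectors


def partition_size_bytes(name):
    """Size in bytes of a disk or partition such as "sdb3" (Linux sysfs)."""
    with open(f"/sys/class/block/{name}/size") as f:
        return int(f.read()) * SECTOR_SIZE


def has_room_for_image(image_bytes, dest_dir):
    """True if the filesystem holding dest_dir has image_bytes free.

    image_bytes should be the full partition size: the image covers the
    whole partition no matter how much of it the filesystem was using.
    """
    return shutil.disk_usage(dest_dir).free >= image_bytes
```

Usage would look like has_room_for_image(partition_size_bytes("sdb3"), "/media/USB/Backup"). The check deliberately uses the partition’s full size rather than the space in use, which is the 60GB-versus-30GB point above.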

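This is an example of ddrescue running. At this point it has finished its first pass and is splitting the error areas. One thing that’s cool about ddrescue is how it handles bad areas: it first copies the disk in large chunks and skips any chunk that returns an error, so it grabs the easy data quickly. Then it goes back and splits the failed areas into smaller and smaller pieces, down to a single sector (512 bytes on this drive), until every sector that can still be read has been rescued. Sectors that can’t be read are simply not written to the image; in a fresh image file those areas read back as zeros. This can take a LONG time if you have a ton of bad blocks; the example below has been running for over 20 hours and still isn’t complete.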

Initial status (read from logfile)
rescued:    80527 MB,  errsize:  14602 kB,  errors:    1550
Current status
rescued:    80527 MB,  errsize:  14286 kB,  current rate:        0 B/s
ipos:    26172 MB,   errors:    2223,    average rate:       19 B/s
opos:    26172 MB
Splitting error areas…
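To make the splitting idea concrete, here is a toy Python model. It is not ddrescue’s actual algorithm (real ddrescue also trims, retries and records everything in its log file); it only shows the two ideas described above: grab the easy data in big clusters first, then bisect the failed areas down to single sectors, leaving unreadable sectors out of the image. The 128-sector cluster is what I believe ddrescue’s default is (64 KiB), but treat that as an assumption; rescue and read_sector are hypothetical names.

```python
def rescue(read_sector, total_sectors, cluster=128):
    """Toy two-phase rescue, loosely modeled on ddrescue.

    read_sector(i) returns the data of sector i or raises OSError.
    Returns (image, bad): image maps sector -> data, bad lists the
    sectors that could not be read (they are never added to image).
    """
    image = {}    # sector number -> data read successfully
    pending = []  # (start, length, known_bad) areas still to work on
    bad = []      # sectors that could not be read at all

    def try_read(start, length):
        """Read a run of sectors; store and return True only if all succeed."""
        try:
            chunk = [read_sector(i) for i in range(start, start + length)]
        except OSError:
            return False
        for offset, data in enumerate(chunk):
            image[start + offset] = data
        return True

    # Phase 1: copy in large clusters, skipping any cluster with an error.
    for start in range(0, total_sectors, cluster):
        length = min(cluster, total_sectors - start)
        if not try_read(start, length):
            pending.append((start, length, True))

    # Phase 2: bisect failed areas until only single bad sectors remain.
    while pending:
        start, length, known_bad = pending.pop()
        if not known_bad and try_read(start, length):
            continue
        if length == 1:
            bad.append(start)  # unreadable: left out of the image
            continue
        half = length // 2
        pending.append((start + half, length - half, False))
        pending.append((start, half, False))
    return image, sorted(bad)
```

On a real device, read_sector might wrap os.pread(fd, 512, i * 512) on a file descriptor opened read-only, though short reads and error handling would need more care than this toy gives them.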

So, after ddrescue is complete, you will need to fsck the image (or partition) before you can mount it. If for some reason you can’t remember the filesystem type, you can use a tool like mmls (from The Sleuth Kit) on the whole disk to see the partition types, or run blkid or file -s against the image itself. Let’s say it is an ext3 filesystem; we will use e2fsck to check it:

e2fsck -y /media/USB/Backup/sdb3.image

The -y will blast through the fsck, answering yes to all of the block changes it wants to make.
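If none of those tools is handy, ext2, ext3 and ext4 can be told apart by reading the superblock directly: it starts 1024 bytes into the partition, the magic number 0xEF53 sits at offset 0x38 within it, and the feature flags at 0x5C (compat, where bit 0x4 means "has a journal") and 0x60 (incompat, where 0x40 means "extents") separate the three. A rough heuristic sketch in Python, assuming an image of a single ext partition; guess_ext_type is a hypothetical name, and blkid does this far more thoroughly:

```python
import struct

SUPERBLOCK_OFFSET = 1024       # primary ext superblock starts 1 KiB in
EXT_MAGIC = 0xEF53             # s_magic, at offset 0x38 in the superblock
COMPAT_HAS_JOURNAL = 0x0004    # s_feature_compat bit: ext3-style journal
INCOMPAT_EXTENTS = 0x0040      # s_feature_incompat bit: ext4 extents


def guess_ext_type(path):
    """Rough guess: "ext2", "ext3", "ext4", or None if no ext superblock."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(1024)
    if len(sb) < 0x64:
        return None  # image too small to hold the fields we need
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        return None
    # s_feature_compat at 0x5C and s_feature_incompat at 0x60, little-endian.
    compat, incompat = struct.unpack_from("<II", sb, 0x5C)
    if incompat & INCOMPAT_EXTENTS:
        return "ext4"
    if compat & COMPAT_HAS_JOURNAL:
        return "ext3"
    return "ext2"
```

Calling guess_ext_type("/media/USB/Backup/sdb3.image") only reads the image, so it is safe to run before the fsck.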

After the fsck, I mounted the image:

sudo mount -o loop /media/USB/Backup/sdb3.image /mnt

To my surprise, all my lovely VMDK and VMX files were in there! Oh glory!


Created on January 29, 2009 by Rick Scherer

Posted under Good Reading, Linux.


8 Comments so far

  1. virtu - Al
    12:36 pm on February 2nd, 2009

    I am glad to hear that you were able to recover. Thanks for sharing this; every hard drive is just one bad head park away from not being able to read data. Yet we still love ’em. At least until SSDs become more cost-effective.

  2. negonicrac
    5:49 am on March 19th, 2009

    Thanks. This saved all of my information off of a bad partition. One question, though: how do you monitor whether the hard drive is going bad, and other than using RAID 1, how do you back up your information? Thanks.

  3. Rick Scherer
    6:51 am on March 19th, 2009

    There is really no ‘failsafe’ way to monitor a HDD for upcoming failure. If your system supports SMART this may be your best bet as it will notify you if it thinks your HDD is going to fail. (SMART notified me right when the drive failed, so it really wasn’t proactive).

    HDDs nearing the end of their life will typically become louder; you may hear a high-pitched whine coming from them, and performance will degrade dramatically. You may also see some random data loss occur.

    Glad you were able to recover!
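    To script the SMART check mentioned above, smartctl -A /dev/sdb (from the smartmontools package) prints the drive’s attribute table. Below is a rough Python sketch, assuming the usual ten-column table layout (it can vary between drives and smartctl versions), that flags the sector-related counters often cited as early warning signs. The choice of attributes and the name warning_signs are assumptions, and no check like this is guaranteed to catch a failure in advance.

```python
# SMART attribute IDs that count sectors the drive has trouble with.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def warning_signs(smartctl_output):
    """Return {attribute name: raw count} for watched attributes above 0."""
    warnings = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have >= 10 columns;
        # RAW_VALUE is the tenth column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED:
            continue
        raw = fields[9]
        if raw.isdigit() and int(raw) > 0:
            warnings[fields[1]] = int(raw)
    return warnings
```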

  4. negonicrac
    6:07 pm on March 23rd, 2009

    Thanks, I will look into setting up SMART.

  5. hansnospam
    8:38 pm on May 8th, 2009

    For the Debian users, the package is gddrescue. (/sbin/ddrescue)

    There is a different package called ddrescue (/bin/dd_rescue), which is another valuable rescue program. Both are great!!!

  6. jaomadn
    1:08 pm on April 13th, 2011

    Thanks for this very nice info.

    By the way, does ddrescue support/recover NTFS filesystems?

  7. Konrad
    6:30 am on December 21st, 2016

    How did you mount the image from VMFS with mount instead of vmfs-fuse?

  8. Rick Scherer
    5:38 pm on January 3rd, 2017

    @Konrad, I actually didn’t mount the VMFS – I mounted a copy of the partition which happened to be ext3.

    This example is for low-level partition recovery from a failing drive, not to recover a VMDK.
