Thursday, October 24, 2013

How to backup/wipe your hard disks using dd

I recently sold my old Dell Inspiron 1420 and bought a shiny new Lenovo Thinkpad T430. I intend to write a separate post about my experience with this new machine, but in this post I would like to cover some basics of the command line tool dd. This tool was extremely handy and helped me to securely wipe all data on my old hard drive before I parted with it. You can never rely on plain formatting to ensure that all sensitive information (personal files, saved passwords, etc.) on your old hard disk has been erased. However, a lot of the tools out there are overkill for this task. This is where dd steps in: it is simple and present in most Linux distributions, including the Ubuntu Live USB. However, it is also a very dangerous tool, so make sure that you understand what you are doing before you blindly copy and paste the commands into your terminal. You may end up losing your data permanently. So let's begin.

First of all, the basic structure of dd. A typical dd command will look like this:
user@pc: ~$ sudo dd if= of= bs= conv=
Here, if stands for the input file (which could also be a partition or an entire drive), of stands for the output file (which could also be a partition or an entire drive), bs stands for the block size in bytes (the default value is 512) and conv stands for conversion options. Specifying the block size and conversion options is optional but highly recommended for the following reasons:

a. The default block size is extremely slow. A little tweaking of this value can speed up the backup/wipe operation several times over. For example,
bs=4096
sets the block size to 4k, a common choice that is considerably more efficient than the default for hard disk reads and writes, and therefore for cloning speed.

b. Because dd is not very user friendly, it will not print any progress report while it is running. But that is not the worst part: by default it will simply stop if it encounters a bad sector on your hard drive. The conversion options allow us to modify this behavior. For example,
conv=noerror,notrunc,sync
The noerror option tells dd to continue after a read error instead of stopping. The sync option pads every incomplete input block with zeros up to the block size, so in combination with noerror an unreadable block is written as zeros and the rest of the data stays at the same offsets on the target as on the source. The notrunc option or 'do not truncate' instructs dd not to truncate the output file.
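To see what sync and notrunc actually do without going anywhere near a real drive, here is a minimal sketch that only touches throwaway files. The file names and sizes are my own assumptions for illustration; nothing in it needs sudo.

```shell
# Safe demo: everything happens on scratch files, never on a real device.
tmpdir=$(mktemp -d)

# A 1000-byte input file, deliberately not a multiple of the 512-byte block size.
head -c 1000 /dev/urandom > "$tmpdir/input.bin"

# Plain copy: the final block is short, so the output is also 1000 bytes.
dd if="$tmpdir/input.bin" of="$tmpdir/plain.bin" bs=512 2>/dev/null

# conv=sync pads the short final block with zeros up to bs: 2 x 512 = 1024 bytes.
dd if="$tmpdir/input.bin" of="$tmpdir/padded.bin" bs=512 conv=sync 2>/dev/null

# conv=notrunc: overwrite only the first block of an existing 2048-byte file
# and leave the rest of that file in place.
head -c 2048 /dev/zero > "$tmpdir/existing.bin"
dd if="$tmpdir/input.bin" of="$tmpdir/existing.bin" bs=512 count=1 conv=notrunc 2>/dev/null

# wc -c gives the size in bytes; $(( )) strips any padding spaces.
plain_size=$(($(wc -c < "$tmpdir/plain.bin")))
padded_size=$(($(wc -c < "$tmpdir/padded.bin")))
kept_size=$(($(wc -c < "$tmpdir/existing.bin")))

echo "plain copy:      $plain_size bytes"
echo "with conv=sync:  $padded_size bytes"
echo "with notrunc:    $kept_size bytes"

rm -rf "$tmpdir"
```

The padding matters mostly when a read comes up short, which is exactly what happens on a bad sector when noerror is also set.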

If you wish to know more, I highly recommend reading the wiki here (German only) or here.

Examples:
Important: Always find out the correct hard drive name/partition using sudo fdisk -l or sudo blkid. Incorrect use of the dd command can result in permanent loss of data!

a) Backup
If I wish to back up my Linux partition, which is /dev/sdb1, to a file on my external hard drive, the dd command will look like this:
user@pc: ~$ sudo dd if=/dev/sdb1 of=/media/EXTERNAL/hdd_backup.img bs=1M conv=noerror,notrunc,sync
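A backup is only useful if it matches the source, so it is worth comparing the two afterwards. The following is a sketch of that check on scratch files: partition.bin is a hypothetical stand-in for /dev/sdb1 and the 4 MiB size is arbitrary. On a real system you would compare the device and the image in the same way.

```shell
# Safe demo on scratch files; partition.bin stands in for a real partition.
tmpdir=$(mktemp -d)

# Fake "partition": 4 MiB of random data.
head -c 4194304 /dev/urandom > "$tmpdir/partition.bin"

# Back it up with the same options as in the command above.
dd if="$tmpdir/partition.bin" of="$tmpdir/hdd_backup.img" \
   bs=1M conv=noerror,notrunc,sync 2>/dev/null

# Verify: cmp -s is silent and succeeds only if both files are byte-identical.
if cmp -s "$tmpdir/partition.bin" "$tmpdir/hdd_backup.img"; then
    backup_ok=yes
else
    backup_ok=no
fi
echo "backup identical to source: $backup_ok"

# Restoring is the same command with if and of swapped.
dd if="$tmpdir/hdd_backup.img" of="$tmpdir/restored.bin" bs=1M 2>/dev/null
if cmp -s "$tmpdir/partition.bin" "$tmpdir/restored.bin"; then
    restore_ok=yes
else
    restore_ok=no
fi
echo "restore identical to source: $restore_ok"

rm -rf "$tmpdir"
```

Note that with sync, a source whose size is not a multiple of bs produces an image padded with zeros at the end, in which case the comparison would report a difference in length.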
b) Erase
If I intend to erase all data on my hard drive so that no one can recover it, I simply have to overwrite the entire hard disk with zeros. Or better, I can write random values all over it. Even better, I can run multiple passes of the same command to be confident that the data is beyond recovery. However, generating and writing random values is time consuming, and unless you are a spy who is trying to destroy vital information, writing zeros (with 2 passes instead of 1) is good enough for the rest of us. In the former case, destroying the hard disk physically would be a better option anyway. Let us say I have to wipe the data on my /dev/sdb. I have two options, namely:
Writing zeros:
user@pc: ~$ sudo dd if=/dev/zero of=/dev/sdb bs=1M conv=noerror,notrunc,sync
Writing random values:
user@pc: ~$ sudo dd if=/dev/urandom of=/dev/sdb bs=1M conv=noerror,notrunc,sync
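If you want to rehearse a wipe and check the result before touching a real drive, you can do it on a file. This is only a sketch: fake_disk.img is a hypothetical stand-in for /dev/sdb, and count is needed here because a file, unlike a drive, has no fixed end at which dd would stop.

```shell
# Safe rehearsal: fake_disk.img stands in for a real drive.
tmpdir=$(mktemp -d)
disk="$tmpdir/fake_disk.img"

# 2 MiB "disk" full of old data.
head -c 2097152 /dev/urandom > "$disk"

# Overwrite it in place with zeros (2 blocks of 1M).
dd if=/dev/zero of="$disk" bs=1M count=2 conv=notrunc 2>/dev/null

# Verify: delete every zero byte and count what is left over.
# A fully wiped disk leaves nothing behind.
leftover=$(($(tr -d '\000' < "$disk" | wc -c)))
size_after=$(($(wc -c < "$disk")))

echo "size after wipe: $size_after bytes"
echo "non-zero bytes remaining: $leftover"

rm -rf "$tmpdir"
```

The same tr and wc check works on a real device after a zero pass, although reading back an entire drive takes about as long as writing it did.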
It goes without saying that you will need to boot from a Live USB in order to be able to wipe your primary hard drive. At this point, I should also mention that the block size value is (obviously) given in bytes, but the following suffixes can be used for convenience.

Suffix   Multiplier
KB       1000 (i.e., 1KB equals 1000 bytes)
K        1024 (i.e., 1K equals 1024 bytes)
MB       1000000 (= 1000 * 1000, i.e., 1MB equals 1000000 bytes)
M        1048576 (= 1024 * 1024, i.e., 1M equals 1048576 bytes)
GB       1000000000 (= 1000 * 1000 * 1000, i.e., 1GB equals 1 billion bytes)
G        1073741824 (= 1024 * 1024 * 1024, i.e., 1G equals 1073741824 bytes)
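The difference between the two kinds of suffix is easy to check for yourself. This small sketch assumes the GNU version of dd that ships with Ubuntu (other implementations may accept different suffixes) and writes a single block with each suffix to a scratch file:

```shell
# Write exactly one block with each suffix and compare the file sizes.
tmpdir=$(mktemp -d)

dd if=/dev/zero of="$tmpdir/k.bin"  bs=1K  count=1 2>/dev/null   # binary suffix
dd if=/dev/zero of="$tmpdir/kb.bin" bs=1KB count=1 2>/dev/null   # decimal suffix

k_size=$(($(wc -c < "$tmpdir/k.bin")))
kb_size=$(($(wc -c < "$tmpdir/kb.bin")))

echo "bs=1K  -> $k_size bytes"
echo "bs=1KB -> $kb_size bytes"

rm -rf "$tmpdir"
```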

Now, a little searching online led me to this website which tries to identify an ideal block size (bs=131072) in order to optimize the speed of operation. Because I had not thought this through beforehand, I also have some statistics of my own to share. And because I had two disks, I could also compare the speeds of the SATA and IDE interfaces. Here are the results:

SATA 5400 RPM
Command                                       Data written (GB)   Time taken (mins)   Average speed (MB/s)
sudo dd if=/dev/urandom of=/dev/sdb           83                  399.5               3.5
sudo dd if=/dev/zero of=/dev/sdb              159                 115.0               23.0
sudo dd if=/dev/zero of=/dev/sdb bs=131072    750                 165.5               75.5

IDE (PATA) 5400 RPM
Command                                       Data written (GB)   Time taken (mins)   Average speed (MB/s)
sudo dd if=/dev/urandom of=/dev/sda           56                  333.9               2.8
sudo dd if=/dev/zero of=/dev/sda              71                  115.2               10.3
sudo dd if=/dev/zero of=/dev/sda bs=131072    160                 72.5                36.8

As is pretty evident, writing randomly generated data is extremely slow, and it is faster to perform two passes of zero writes than one random write. Not to mention, it is next to impossible to recover data from a drive overwritten with zeros, so using random values is pretty much overkill for the average Joe. Further, changing the block size from the default value can speed up the entire process by up to 4 times.
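If you would like to get a feel for the effect of the block size on your own machine, here is a rough sketch that writes the same amount of data with several block sizes and prints the summary line dd reports for each. The 16 MiB total and the list of block sizes are arbitrary choices of mine, and conv=fsync (which asks dd to flush the data to disk before reporting) assumes GNU dd. Because it writes to a scratch file rather than a raw drive, the numbers will not match the results above; only the trend is of interest.

```shell
# Rough block-size comparison on a scratch file (not a real drive).
tmpdir=$(mktemp -d)
scratch="$tmpdir/test.bin"
total=$((16 * 1024 * 1024))   # write 16 MiB in every run

for bs in 512 4096 131072 1048576; do
    count=$((total / bs))
    # dd prints its statistics on stderr; keep only the final summary line.
    summary=$(dd if=/dev/zero of="$scratch" bs="$bs" count="$count" conv=fsync 2>&1 | tail -n 1)
    echo "bs=$bs: $summary"
done

# Every run writes the same total, so the file ends up at 16 MiB.
final_size=$(($(wc -c < "$scratch")))
echo "scratch file size: $final_size bytes"

rm -rf "$tmpdir"
```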
