With a little bit of torture, and some fun along the way, find out how fast your hard disk drive really is.
Introduction
1-terabyte hard disk drives are slowly coming to the market, so I suppose we can't complain that we don't have enough space to save our precious (and ever-increasing) data. But it's also a known fact that although disk storage capacities are improving at an impressive rate, disk performance is improving much more slowly. Unfortunately, a larger disk doesn't always mean a faster disk. What follows is an explanation of two techniques for measuring disk performance in Linux.
Methodology
As an example, I've tested three different disks. One is a standard ATA (IDE) drive, and the other two are SCSI disks with different rotational speeds:
Disk 1: ATA 120GB, Seagate Barracuda 7200.7 Plus (ST3120026A), 8MB cache, 7200 rpm
Disk 2: SCSI 36GB, Seagate Cheetah 10K.6 (ST336607LC), 8MB cache, 10000 rpm
Disk 3: SCSI 18GB, IBM Ultrastar 36Z15 (IC35L018UCPR15-0), 4MB cache, 15000 rpm
IMPORTANT! When running the benchmarks explained below, your disk should be as idle as possible. Otherwise, you'll get wrong (worse) numbers. Don't run any other disk-intensive program while the benchmarks are running. And, by the way, don't worry about your data: both benchmarks only read from the disk, so they're not destructive.
Sequential access
Sequential access is when you're reading or writing disk blocks in sequential order, that is, one block after another. Rarely will you do exactly that (unless you're copying raw partitions, of course), but every time you're moving big files (for example ISO images) around, your disk access pattern comes close to sequential access. This is also where your disk truly shines, because disk head movement is minimal, so you can get high disk transfer speeds.
Measuring sequential disk performance is easy. Every modern Linux distribution comes with a little tool called hdparm, which is primarily used to tune and optimize disk parameters, but also has a switch that turns it into a simple benchmark tool. Run it like this:
% sudo hdparm -t /dev/hda
/dev/hda:
Timing buffered disk reads: 140 MB in 3.02 seconds = 46.28 MB/sec
Substitute /dev/hda with the name of your raw disk device, of course (for example, it might be /dev/sda if you're using libata, or something else).
You won't get a higher transfer rate from the disk than that (on that particular disk and computer, and provided the disk is set up optimally!). hdparm -t reads from the very start of the disk (which is the fastest area of every disk), using an optimal access pattern. You can't beat that!
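To make the idea of a sequential read test concrete, here is a minimal Python sketch. It is not how hdparm is implemented; the function name `sequential_read_rate`, the 1 MiB chunk size, and the 3-second window are my own assumptions. It reads from the start of a device (or file) in large chunks for a few seconds and reports the transfer rate.

```python
import time


def sequential_read_rate(path, seconds=3.0, chunk_size=1024 * 1024):
    """Read `path` from its start in large sequential chunks.

    Returns (bytes_read, elapsed_seconds, mebibytes_per_second).
    Hypothetical helper for illustration; not part of hdparm.
    """
    bytes_read = 0
    # buffering=0 gives an unbuffered binary file object, so each read()
    # call goes straight to the operating system.
    with open(path, "rb", buffering=0) as dev:
        start = time.monotonic()
        while time.monotonic() - start < seconds:
            chunk = dev.read(chunk_size)
            if not chunk:  # reached the end of the file or device
                break
            bytes_read += len(chunk)
        elapsed = time.monotonic() - start
    rate = bytes_read / (1024 * 1024) / elapsed if elapsed > 0 else 0.0
    return bytes_read, elapsed, rate


# Example call (reading a raw device needs superuser privileges):
#   print(sequential_read_rate("/dev/sda"))
```

Unlike a dedicated benchmark, this sketch does nothing about the kernel's page cache, so a second run over the same area may be served from memory and report an unrealistically high rate.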
Let's see what our test disks are made of:
While there is some fluctuation, I wouldn't say that there are big differences among the disks. But you probably noticed that the supposedly fastest 15000 rpm disk is slower than the other two. That's because it's quite an old disk (from 2002). So disk transfer speeds have improved somewhat over time, after all, but nothing groundbreaking.
Random access
Random access is when you access your disk at random locations. In that case the disk head moves rapidly from one place to another. Because that involves mechanical movement, this type of access is much slower than sequential access. Unfortunately, many real-world workloads involve access patterns that are much closer to random access than to sequential access. That's why I think it's much more interesting to measure how a disk behaves under a random access pattern than under a sequential one.
There's no standard tool for measuring random access time, so I have written a simple utility that you can find attached at the bottom of this article. I've named it seeker, and it has a simple job: to read small pieces of data from a raw disk device in a random access pattern. It is important to run it on the whole disk (not on a single partition!) if you want to compare your results with others!
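For readers who want to see the idea before touching the C source, here is a rough Python sketch of the same kind of measurement. It is not the seeker.c code; the names `random_read_benchmark` and `BLOCK_SIZE` and the way the device size is found are my own assumptions. It picks random 512-byte-aligned offsets across the whole device, reads one block at each, and counts how many reads fit into the time window.

```python
import os
import random
import time

BLOCK_SIZE = 512  # bytes per read, matching the 512-byte reads described here


def random_read_benchmark(path, seconds=30.0, seed=None):
    """Read BLOCK_SIZE bytes from random block-aligned offsets of `path`.

    Returns (seeks_per_second, access_time_ms).
    Hypothetical helper for illustration; not the seeker utility itself.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of a regular file and,
        # on Linux, of a block device as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        num_blocks = size // BLOCK_SIZE
        if num_blocks == 0:
            raise ValueError("target is smaller than one block")
        count = 0
        start = time.monotonic()
        while True:
            elapsed = time.monotonic() - start
            if elapsed >= seconds:
                break
            offset = rng.randrange(num_blocks) * BLOCK_SIZE
            os.lseek(fd, offset, os.SEEK_SET)
            os.read(fd, BLOCK_SIZE)
            count += 1
    finally:
        os.close(fd)
    seeks_per_second = count / elapsed
    access_time_ms = 1000.0 * elapsed / count if count else float("inf")
    return seeks_per_second, access_time_ms


# Example call (reading a raw device needs superuser privileges):
#   print(random_read_benchmark("/dev/sda"))
```

On a small regular file the reads are quickly served from the page cache, so the numbers only mean something when the target is a whole raw disk.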
Compile seeker like this:
% gcc -O2 seeker.c -o seeker
Or, if you're lazy or don't have a compiler at hand, you can download the binary below. Then run it like this:
% sudo ./seeker /dev/hda
Just like hdparm, it needs superuser privileges to
access the raw disk device. The output looks like this:
Seeker v2.0, 2007-01-15, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [17501MB], wait 30 seconds..............................
Results: 167 seeks/second, 5.95 ms random access time
Allow seeker to run for 30 seconds (you might hear some ugly noises if your disk doesn't have so-called acoustic management, or if it's not turned on), and you'll get the average number of seeks per second and the calculated random access time of the disk. The utility reads only 512 bytes of data per read operation, but the kernel's internal readahead mechanisms translate that into 4096-byte I/O operations. Multiplying 167 by 4096, you can estimate that the disk is reading only 668 KB/sec (or you can use the iostat utility to monitor that in real time). And that is true: this is the absolute worst-case scenario, and your disk should always perform better than that in any real scenario. But compare that with the numbers we got in the sequential case, and you'll see how rapidly disk performance degrades once the disk arm starts moving!
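The arithmetic above is easy to redo for your own results. The following Python sketch uses function names of my own choosing and takes the 4096-byte I/O size from the text as a default. Comparing the result with the hdparm sample is only a rough illustration, because the two sample outputs in this article show different device names.

```python
def access_time_ms(seeks_per_second):
    """Average time per random read, in milliseconds."""
    return 1000.0 / seeks_per_second


def random_throughput_kib_per_s(seeks_per_second, io_size_bytes=4096):
    """Data rate implied by a seek rate, in KiB per second."""
    return seeks_per_second * io_size_bytes / 1024.0


def sequential_to_random_ratio(sequential_mib_per_s, seeks_per_second):
    """How many times faster the sequential rate is than the random one."""
    random_kib = random_throughput_kib_per_s(seeks_per_second)
    return sequential_mib_per_s * 1024.0 / random_kib


print(random_throughput_kib_per_s(167))   # 668.0
print(round(access_time_ms(167), 2))      # 5.99
print(sequential_to_random_ratio(46.28, 167))
```

The 5.99 ms computed from the rounded 167 seeks/second differs slightly from the 5.95 ms in the sample output. Presumably the tool works from unrounded values, but that is a guess on my part.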
Finally we see some difference among the disks. It's now obvious that disks that rotate faster are better performers. The 15000 rpm SCSI disk is now on top; no matter how old it is, it leaves the ATA drive far behind.
The above numbers are actually the same data presented in another way. Random access time is a metric that represents the typical time it takes the disk to read a random block. If you have the disk manufacturer's data at hand, that time should be close to the sum of the manufacturer's average seek time and average latency time. Average seek time is declared as the typical time needed to move the head arm from one position to another, and average latency time is the time needed for the wanted data block to come under the head (the disk is constantly rotating, right?). Is the calculation right for your disk?
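If you want to do that check yourself, the latency part can be computed from the rotational speed alone: on average the wanted block is half a revolution away. The Python sketch below uses function names of my own choosing, and the 4.0 ms seek time in the last line is a made-up placeholder, not a figure from any data sheet.

```python
def average_rotational_latency_ms(rpm):
    """Average wait for a block to rotate under the head, in milliseconds.

    One revolution takes 60000 / rpm ms; on average the wanted block
    is half a revolution away.
    """
    return 0.5 * 60000.0 / rpm


def expected_access_time_ms(avg_seek_ms, rpm):
    """Sum of the data sheet average seek time and the rotational latency."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


# Rotational speeds of the three test disks in this article.
for rpm in (7200, 10000, 15000):
    print(rpm, round(average_rotational_latency_ms(rpm), 2))
# 7200 4.17
# 10000 3.0
# 15000 2.0

# Hypothetical average seek time of 4.0 ms on a 15000 rpm disk.
print(expected_access_time_ms(4.0, 15000))  # 6.0
```

Replace the placeholder seek time with the value from your disk's data sheet and compare the result with the random access time that seeker reports.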
Conclusion
If you have some older and some newer disks and run the above tests on them, you'll soon discover that hard disk performance hasn't improved much over time. The bigger cache on modern disks helps a little, but only for some specific workloads. But don't just take my word for it: run your own tests and report the results in a comment below. If we get enough results, I could even summarize them in a useful graph. And I would really, really love to see the numbers for those Western Digital Raptors: is their performance really comparable to the expensive SCSI drives?