How fast is your disk?

With a little bit of torturing, and some fun on the way, find out how fast your hard disk drive really is.

Introduction

1-terabyte hard disk drives are slowly coming to the market, so I suppose we can't complain that we don't have enough space to save (the ever increasing amount of) our precious data. But it's also a known fact that although disk storage capacities are improving at an impressive rate, disk performance is improving much more slowly. Unfortunately, a larger disk doesn't always mean a faster disk. What follows is an explanation of two techniques for measuring disk performance in Linux.

Methodology

As an example, I've tested three different disks, one standard ATA (IDE) drive and two SCSI disks with different rotational speeds:

Disk 1: ATA 120GB, Seagate Barracuda 7200.7 Plus (ST3120026A), 8MB cache, 7200 rpm
Disk 2: SCSI 36GB, Seagate Cheetah 10K.6 (ST336607LC), 8MB cache, 10000 rpm
Disk 3: SCSI 18GB, IBM Ultrastar 36Z15 (IC35L018UCPR15-0), 4MB cache, 15000 rpm

IMPORTANT! When running the benchmarks explained below, your disk should be as idle as possible. Otherwise, you'll get wrong (worse) numbers. Don't run any other disk-intensive program while the benchmarks are running. And, BTW, don't worry about your data: both benchmarks only read from the disk, so they're not destructive.

Sequential access

Sequential access is when you're reading or writing disk blocks in sequential order, that is, one block after another. Rarely will you do exactly that (unless you're copying raw partitions, of course), but every time you're moving big files (for example ISO images) around, your disk access pattern comes close to sequential access. This is also where your disk truly shines, because disk head movement is minimal, so you can get high disk transfer speeds.

Measuring sequential disk performance is easy. Every modern Linux distribution comes with a little tool called hdparm, which is primarily used to tune and optimize disk parameters but also has a switch that turns it into a simple benchmark tool. Run it like this:

% sudo hdparm -t /dev/hda

/dev/hda:
Timing buffered disk reads: 140 MB in 3.02 seconds = 46.28 MB/sec

Replace /dev/hda with the name of your raw disk device, of course (for example, it might be /dev/sda if you're using libata, or something else).

You can't get a higher transfer rate than that from this particular disk in this particular computer, assuming the disk is set up optimally. hdparm -t reads from the very start of the disk (which is the fastest area of a hard disk) with an optimal access pattern. You can't beat that!
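To make the idea behind this kind of measurement concrete, here is a minimal Python sketch of a timed sequential read. It is not how hdparm is implemented; the function name, the 1 MB block size, and the 3 second default are all assumptions chosen for illustration, and 1 MB is taken as 1024 * 1024 bytes. It also does nothing about data already sitting in the kernel's cache, so repeated runs over the same area may report inflated numbers.

```python
import os
import time

def sequential_read_rate(path, seconds=3.0, block_size=1024 * 1024):
    """Read `path` from its start in large blocks for about `seconds`
    and return the observed rate in MB/s (1 MB = 1024 * 1024 bytes)."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            chunk = os.read(fd, block_size)
            if not chunk:  # reached the end of the device or file
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero interval on coarse timers.
    return total / (1024 * 1024) / elapsed if elapsed > 0 else 0.0
```

Calling sequential_read_rate("/dev/hda") as root would read the first few seconds' worth of data from the raw device. Pointing it at a large regular file works too, but then you are also measuring the filesystem layout.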

Let's see what our test disks are made of:

While there is some fluctuation, I wouldn't say that there are big differences among the disks. But you probably noticed that the supposedly fastest 15000 rpm disk is slower than the other two. That's because it's quite an old disk (from 2002). So there has been some improvement in disk transfer speeds over time after all, but nothing groundbreaking.

Random access

Random access is when you access your disk at random locations. In that case the disk head moves rapidly from one place to another. Because that involves a mechanical operation, this type of access is much slower than sequential access. Unfortunately, many real-world workloads involve access patterns that are much closer to random access than to sequential access. That's why I think it's much more interesting to measure how a disk behaves under a random access pattern than under a sequential one.

There's no standard tool to measure the random access time, so I have written a simple utility that you can find attached at the bottom of this article. I've named it seeker and it has a simple job: to read small pieces of data from a raw disk device, in a random access pattern. It is important to run it on the whole disk (not on a single partition!) if you want to compare results of your disk with others!
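The core loop of such a benchmark is short. The following Python sketch illustrates the technique only; it is not the seeker source, and the function name, parameters, and defaults are assumptions made for this example. It picks a random 512-byte-aligned offset, reads 512 bytes there, and counts how many such reads complete in the allotted time.

```python
import os
import random
import time

def random_read_rate(path, seconds=30.0, read_size=512, seed=None):
    """Issue small reads at random offsets for about `seconds`
    and return the number of reads completed per second."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of a file or a Linux block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = size // read_size
        if blocks == 0:
            raise ValueError("target is smaller than one read")
        count = 0
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            offset = rng.randrange(blocks) * read_size
            os.pread(fd, read_size, offset)  # positioned read, no separate seek call
            count += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed
```

On a small file this mostly measures the page cache, which is one more reason to point a benchmark like this at the whole raw device. The seeker utility itself is written in C.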

Compile the utility like this:

% gcc -O2 seeker.c -o seeker

Or, if you're lazy or don't have a compiler at hand, you can also download the binary below. Then run it like this:

% sudo ./seeker /dev/hda

Just like hdparm, it needs superuser privileges to access the raw disk device. The output looks like this:

Seeker v2.0, 2007-01-15, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [17501MB], wait 30 seconds..............................
Results: 167 seeks/second, 5.95 ms random access time

Allow seeker to run for 30 seconds (you might hear some ugly noises if your disk doesn't have so-called acoustic management capability, or if it's not turned on), and you'll get the average number of seeks per second and the calculated random access time of the disk. Now, the utility reads only 512 bytes of data per read operation, but internal kernel readahead mechanisms translate that to 4096-byte I/O operations. Multiplying 167 by 4096, you can estimate that the disk is reading only 668 KB/sec (or you can use the iostat utility to monitor that in real time). And that is true: this is the absolute worst-case scenario, and your disk should always perform better than that in every real scenario. But compare that with the numbers we got from the sequential case, and you'll see how rapidly disk performance degraded when the disk arm started moving!
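The arithmetic is easy to check. The small Python sketch below uses two hypothetical helper names (they are not part of seeker) to convert a seeks-per-second figure into an access time and an estimated throughput. Note that 1000 / 167 comes out at about 5.99 ms rather than the reported 5.95 ms; presumably the tool computes the time from the exact number of seeks and prints the seeks-per-second figure without its fractional part, but that is a guess.

```python
def access_time_ms(seeks_per_second):
    """Average time per random read, in milliseconds."""
    return 1000.0 / seeks_per_second

def throughput_kib_per_s(seeks_per_second, io_size=4096):
    """Data actually transferred per second, assuming `io_size` bytes per seek."""
    return seeks_per_second * io_size / 1024

print(round(access_time_ms(167), 2))   # 5.99
print(throughput_kib_per_s(167))       # 668.0
```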

Finally we see some difference among the disks. It's now obvious that disks that rotate faster are better performers. The 15000 rpm SCSI disk is now on top; it doesn't matter that it's so old, it leaves the ATA drive far behind.

The above numbers are actually the same data presented in another way. Random access time is a metric that represents the typical time it takes the disk to read a random block. If you have the disk manufacturer's data at hand, that time should be close to the sum of the manufacturer's average seek time and average latency time. Average seek time is declared as the typical time to move the head arm from one position to another, and average latency time is the time needed for the wanted data block to come under the head (the disk is constantly rotating, right?). Is the calculation right for your disk?
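If your data sheet doesn't list the average latency, you can derive it from the rotational speed: on average the wanted block is half a revolution away. The Python sketch below shows that calculation. The function names are made up for this example, and the average seek time is something you have to supply from your own disk's data sheet.

```python
def average_rotational_latency_ms(rpm):
    """Time for half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm

def expected_access_time_ms(avg_seek_ms, rpm):
    """Data sheet average seek time plus average rotational latency."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)

for rpm in (7200, 10000, 15000):
    print(rpm, round(average_rotational_latency_ms(rpm), 2))
# 7200 4.17
# 10000 3.0
# 15000 2.0
```

Rotational latency alone already separates a 7200 rpm disk from a 15000 rpm disk by more than 2 ms per access, before any difference in seek time is counted.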

Conclusion

If you have some older and some newer disks and run the above tests on them, you'll soon discover that hard disk performance hasn't improved much over time. The bigger cache on modern disks helps a little bit, but only on some specific workloads. But don't listen to everything I say; run your own tests and report the results as a comment below. If we get enough results, I could even summarize them in a useful graph. And I would really, really love to see the numbers for those Western Digital Raptors: is their performance really comparable to the expensive SCSI drives?