Disk drives are mechanical devices with components engineered to tolerances measured in fractions of the width of a human hair. Unlike computer chips, which are also built to extremely tight tolerances, disk drives contain many parts moving at extremely high speed. Few things in life are as certain as death, taxes, and disk drive failure. All disk drives will fail eventually; the only questions are when it will happen, how much of an impact it will have, and what we will do to recover.
Solutions such as backup are excellent for recovering data after a drive failure. However, restoring from a backup means that any data changed since the last backup will be lost, and the system will be unavailable until the data is restored, which can be time consuming.
RAID (to be discussed in the next entry) allows us to protect against a single drive failure and provides better performance and capacity than we can get from a single disk drive. RAID is a key component of all dedicated data storage devices. Almost all server-class hardware can implement RAID, and even some desktops and laptops support it. RAID spreads data across multiple disk drives to distribute the workload and aggregate the capacity of the individual drives. Additionally, RAID offers data protection using parity (checksum) or a mirror (copy) of the data. Using RAID allows a single disk drive to fail without any loss of data or downtime. In the event of a failure the faulted drive is removed and replaced. The data is then rebuilt from the surviving disk drives, making the device self-healing. In many cases the drives are hot swappable and can be changed in minutes while the system stays online and users continue to access their data. This is a significant advantage over single disk drives, even for the storage admin.
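The parity protection mentioned above can be illustrated with XOR, the operation commonly used for single-parity RAID levels. The following is only a minimal sketch and not a real RAID implementation: the block contents and the helper name `xor_blocks` are made up for illustration, and real arrays work on fixed-size stripes spread across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data "drives", each holding one 4-byte block (illustrative values).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block would be stored on a fourth drive.
parity = xor_blocks(data)

# Simulate losing drive 1 and rebuilding its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block reproduces the missing block, which is the idea behind rebuilding onto a replacement drive.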
Performance
Disk drives are mechanical devices that store data on a magnetic medium. If you were to envision how a disk drive works, it looks much like a record player. There is a round spinning platter with a magnetic coating that contains the data. A mechanically actuated arm holds a head hovering over the spinning disk without actually touching it. As the platter spins, the head mounted on the arm can read or write data to the platter underneath it. There are actually several of these platters stacked on top of each other. To access different sections of the drive, the arm moves the head back and forth. To reach a given piece of data, the drive must move the arm (seek time) and then wait for the section of the platter holding the information to rotate around and pass under the head (rotational latency). The total time required to complete this process is called access time.
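The two delays described above (moving the arm, then waiting for the platter to rotate) can be estimated separately. On average the platter has to turn half a revolution before the wanted section arrives, so the average rotational wait follows directly from the spindle speed. The sketch below shows that arithmetic; the function names are hypothetical, and the average arm-movement time passed in is an illustrative assumption, since real values come from the drive's data sheet.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the target section: half of one revolution, in ms."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def avg_access_time_ms(rpm, avg_arm_move_ms):
    """Arm movement plus rotational wait, ignoring transfer and controller time."""
    return avg_arm_move_ms + avg_rotational_latency_ms(rpm)

for rpm in (15_000, 10_000, 5_400):
    print(rpm, "RPM:", round(avg_rotational_latency_ms(rpm), 2), "ms average rotational wait")

# Assumed (not measured) 4.5 ms average arm movement on a 10,000 RPM drive.
print(avg_access_time_ms(10_000, 4.5), "ms estimated total")
```

The rotational part is fixed by the spindle speed, which is why faster-spinning drives respond more quickly even when the arm mechanics are similar.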

The largest factor in how quickly a drive can access data is the drive's rotational speed. The specs of drives vary by model and manufacturer. Your mileage will vary from these specs depending on variables such as the size of the I/O requests, whether they are sequential or random, and a number of other factors. A simple rule of thumb for disk I/O throughput is about 180 IOPS (Input/Output Operations Per Second) for a 15,000 RPM drive, 120 IOPS for a 10,000 RPM drive, 80 IOPS for a 7,200 RPM drive, and 40 IOPS for a 5,400 RPM drive. To achieve better performance than a single disk drive can deliver, we need to spread the workload over several individual disks using RAID.
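As a rough planning aid, the rule-of-thumb figures above can be turned into an estimate of how many drives a workload needs. This is only a sketch: it ignores RAID write penalties, caching, and controller limits, and the function name `drives_needed` and the 2,000 IOPS workload are hypothetical.

```python
import math

def drives_needed(target_iops, iops_per_drive):
    """Smallest number of drives whose combined rule-of-thumb IOPS meets the target."""
    if target_iops <= 0 or iops_per_drive <= 0:
        raise ValueError("IOPS values must be positive")
    return math.ceil(target_iops / iops_per_drive)

# Hypothetical workload of 2,000 IOPS, using the per-drive figures quoted above.
print(drives_needed(2000, 180), "drives at 15,000 RPM")
print(drives_needed(2000, 120), "drives at 10,000 RPM")
```

Rounding up matters here: a fractional drive cannot be installed, and running drives at exactly their rated limit leaves no headroom for bursts.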
Many server admins don't often work in IOPS numbers. Below are the drive bandwidth numbers for 64KB block transactions. The larger the block size the higher the bandwidth, but most open systems, including Windows, use a 64KB block size. The limitation of using an individual drive rapidly becomes apparent in a server environment where multiple users compete for resources. Any user who has watched their hard drive light blink away while they wait knows the effect of a disk drive that can't keep up. Having many users share the same physical drive in a server significantly increases the possibility of disk saturation. Further, the additional workload on a non-enterprise-class drive can lead to premature drive failure and data loss.
| Drive type | Bandwidth (64KB blocks) |
|---|---|
| 15K RPM FC or SAS | 8 MB/s |
| 10K RPM FC or SAS | 6.0 MB/s |
| 7.2K RPM SATA or NL-SAS | 4.0 MB/s |
| 5.4K RPM SATA or NL-SAS | 2.5 MB/s |
| Enterprise Flash Drive | 100 MB/s |
Drive Sizes 2.5" vs. 3.5"
Recently 2.5" disk drives have been introduced for storage systems and servers. Noticeably smaller than the standard 3.5" drives they are replacing, they have about the same physical dimensions as a laptop hard drive. However, these drives are available in 10K RPM and in some cases 15K RPM versions and are meant to withstand 24x7x365 use. Due to the increase in areal density (the ability to put the same amount of data in less space), these drives have some interesting performance characteristics. When we look at the seek and access times of the 2.5" drives, they are several milliseconds faster than their larger 3.5" counterparts. The 2.5" 10K RPM drives are within ~1 ms of the 3.5" 15K drives. Their IOPS numbers are fairly close; however, their bandwidth (MB/s) is lower than that of their 3.5" counterparts.
The reason for this is that the 2.5" drive head has less distance to travel to reach different areas of the drive because the platter is smaller. The time spent waiting for the data to rotate under the head, by contrast, depends on spindle speed rather than platter size, so the gain comes from the shorter arm movements. The tradeoff is that the smaller diameter platter stores less sequential data per track, and reading additional sequential data requires repositioning the drive head. This results in lower overall bandwidth.
Reliability
Disk drive manufacturers measure reliability as the Mean Time Between Failures (MTBF), measured in hours. The MTBF for a drive is based on testing a sample of drives and calculating how frequently failures occur. This measurement is useful for understanding the probability of a failure, not for estimating when any one particular drive will fail. Many drives will fail before the drive's specified MTBF is reached and many others will exceed the anticipated life span.
MTBF = (Test Time * Number of Drives Tested) / Number of Failures
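The formula above translates directly into a few lines of code. The sketch below is illustrative only; the function name `mtbf_hours` and the test figures (1,000 drives, 1,200 hours, 2 failures) are invented to show the arithmetic and do not come from any manufacturer.

```python
def mtbf_hours(test_hours, drives_tested, failures):
    """MTBF = (test time * number of drives tested) / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return test_hours * drives_tested / failures

# Hypothetical test run: 1,000 drives for 1,200 hours with 2 failures.
print(mtbf_hours(1200, 1000, 2))  # 600000.0
```

Note how a short test over many drives can produce an MTBF far longer than any single drive was actually run, which is why the figure describes a population rather than the life of one drive.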
Some drive manufacturers have developed alternative ways of measuring anticipated drive reliability. One method is Seagate's Annualized Return Rate (ARR). This is a measure of drive reliability based on the number of drives returned as defective during a time period. Another variation on this method is the Annualized Failure Rate (AFR).
ARR = Number of Units Returned for a Year / Total Units Shipped for the Year
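The ARR calculation is a simple ratio, sketched below with invented shipment numbers. The function name `annualized_return_rate` is hypothetical, and the figures are for illustration only.

```python
def annualized_return_rate(units_returned, units_shipped):
    """ARR = units returned for a year / total units shipped for the year."""
    if units_shipped <= 0:
        raise ValueError("units_shipped must be positive")
    return units_returned / units_shipped

# Hypothetical year: 1,500 drives returned out of 100,000 shipped.
arr = annualized_return_rate(1500, 100_000)
print(f"{arr:.1%}")
```

Because it counts returns rather than confirmed failures, a figure computed this way also includes drives that were sent back but later found to be working.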
Not all disk drives are created equal. MTBF varies by the type of drive and its intended use case. Consumer drives for laptops and desktops are rated at around 500,000 hours MTBF. External USB drives also fall into this category. Enterprise-class drives found in server and storage hardware are rated at 1.2 to 1.6 million hours MTBF, with the 15K and 10K RPM SAS and Fibre Channel drives closer to 1.6 million hours and the nearline SAS or SATA drives closer to 1.2 million hours. Consumer desktop-grade hard drives are also only intended for 9x5 usage, whereas enterprise drives are rated for 24x7x365 usage.
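Ratings in hours are easier to compare when converted into an expected yearly failure rate. A common approximation, which assumes a constant failure rate (an assumption added here, and one that real drive populations only loosely follow), is AFR = 1 - exp(-powered hours per year / MTBF). The sketch below applies it to the ratings quoted above; the function name `afr_from_mtbf` is hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation

def afr_from_mtbf(mtbf_hours, powered_hours_per_year=HOURS_PER_YEAR):
    """Approximate annualized failure rate, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 1 - math.exp(-powered_hours_per_year / mtbf_hours)

ratings = (
    ("consumer", 500_000),
    ("nearline SAS/SATA", 1_200_000),
    ("15K/10K SAS or FC", 1_600_000),
)
for label, mtbf in ratings:
    print(f"{label}: {afr_from_mtbf(mtbf):.2%} per year if powered 24x7")
```

Passing a smaller `powered_hours_per_year` models part-time use, such as the 9x5 duty cycle that consumer drives are intended for.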
The number of start and stop cycles also contributes to the life expectancy of a disk drive. When a disk drive spins up, the rotation of the platter forces a cushion of air between the disk surface and the read/write head. When the drive spins down, the actuator arm moves the head to a location that doesn't contain any data to prevent the head from damaging the disk. Still, each time the drive spins up there is additional wear on the drive head, on the motors that spin the drive, and on other components. New features such as disk spin-down have increased the significance of this wear indicator, and many drive manufacturers publish data on how many start/stop cycles a drive can endure. Again, this spec is an indicator of overall reliability and not of when a particular drive will fail.
Google Study
To better understand disk drive life expectancy, Google did a study of 100,000 ATA disk drives in individual servers in their data center between 2005 and 2006. Using the data collected, Google was able to determine which factors were most influential in premature disk failure. In many of the studies done there was an initial high mortality rate in the first 3 to 6 months that was usually attributed to the infant mortality of defective drives. After the initial 3 months, disk drive failure rates drop off substantially; overall they are highest in years 2 and 3. Year 3 has the highest drive failure rate at 8%.
Heavily utilized disk drives fail at a rate of about 10% in the first 3 months, whereas low-use drives fail only about 4% of the time and medium-use drives about 2% of the time. In drives 2 years and older, utilization made very little difference in the failure rate, accounting for a <0.05% difference in drive failures. However, it is interesting to see that low-utilization drives fail most often, followed by high-utilization drives.
During the testing period temperature data was collected from SMART. The temperature data did not show any significant increase in drive failure as temperature increased until extreme temperatures were reached. Most of the drive failures occurred at lower temperatures. Perhaps this was due to intermittent use or extra start/stop cycles of those drives.
SMART error data was also collected during the test and analyzed to see if it could accurately predict failures. The Google study found that SMART could indicate an impending disk failure but was not reliable in detecting all disk failures. Of the drives that failed, 36% had no SMART errors at all and 56% had no errors in the categories considered strong signals that a drive might fail. Google considers scan errors, reallocation count, offline reallocation, and probational count to be strong signals that a drive may fail. Other error codes such as seek errors and CRC errors did not have a statistical correlation to drive failures.
Drives showing SMART scan errors are 10x more likely to experience drive failure. Scan errors have a greater effect on the mortality of younger drives than on older ones. Multiple errors increase the probability that a drive will fail and indicate that the drive will fail more quickly.
Reallocation errors indicate that there was a problem writing to or reading from a section of a drive and that the data was moved to a different section of the drive. Offline reallocations occur when an error is found by background tasks checking un-accessed sections of the drive. SMART reallocation errors indicate a drive that is 3-6x more likely to fail. Drives are 14x more likely to fail within 60 days of their first reallocation error than drives without any errors. The probability of failure within 60 days increases to 21x if the reallocation is done offline.
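One practical use of these findings is to flag any drive that shows at least one of the four strong signals (scan errors, reallocation count, offline reallocation, probational count). The sketch below is a minimal illustration under stated assumptions: the counter names and the sample fleet are hypothetical labels, not real SMART attribute names, and the counters are assumed to have already been read from the drives by some monitoring tool.

```python
# Hypothetical labels for the four strong signals named in the study;
# real SMART tools report them under vendor-specific attribute names.
STRONG_SIGNALS = ("scan_errors", "reallocations", "offline_reallocations", "probational")

def strong_signals(counters):
    """Return the strong-signal counters that are non-zero for one drive."""
    return [name for name in STRONG_SIGNALS if counters.get(name, 0) > 0]

def flag_drives(fleet):
    """Map drive id -> triggered signals, for drives with at least one signal."""
    flagged = {}
    for drive_id, counters in fleet.items():
        signals = strong_signals(counters)
        if signals:
            flagged[drive_id] = signals
    return flagged

# Made-up sample data for three drives.
fleet = {
    "disk0": {"scan_errors": 0, "reallocations": 0},
    "disk1": {"scan_errors": 3, "offline_reallocations": 1},
    "disk2": {"seek_errors": 12},  # not treated as a strong signal
}
print(flag_drives(fleet))
```

A drive that is not flagged should not be treated as healthy on that basis alone, since the study found that more than half of the failed drives showed none of these signals.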
Further Reading:
http://labs.google.com/papers/disk_failures.pdf
http://www.samsung.com/global/business/hdd/learningresource/whitepapers/LearningResource_OverallReliability.html
http://www.dell.com/downloads/global/products/pvaul/en/enterprise-hdd-sdd-specification.pdf