Disk drives are mechanical devices with components engineered to tolerances measured in fractions of the width of a human hair. Like computer chips, they are built to extremely tight tolerances, but unlike chips they also contain many parts moving at extremely high speed. There are few things in life as certain as death, taxes, and disk drive failure. All disk drives will fail eventually; the only questions are when it will happen, how much of an impact it will have, and what we will do to recover. Solutions such as backup are excellent for recovering data after a drive failure. However, restoring from a backup means that any data changed since the last backup will be lost, and the system will be unavailable until the data is restored, which can be time consuming.
RAID (discussed in the next entry) protects against the failure of a single drive and provides better performance and capacity than we can get from a single disk drive. RAID is a key component of nearly all dedicated data storage devices. Almost all server-class hardware can implement RAID, and even some desktops and laptops support it. RAID spreads data across multiple disk drives to distribute the workload and aggregate the capacity of the individual drives. Additionally, RAID protects data using either parity (a checksum) or a mirror (a copy) of the data. With a protected RAID level, a single disk drive can fail without any loss of data or downtime. When a drive fails, the faulted drive is removed and replaced, and its data is rebuilt from the surviving disk drives, making the array self-healing. In many cases the drives are hot-swappable and can be changed in minutes while the system stays online and users continue to access their data. This is a significant advantage over single disk drives, for users and storage admins alike.
Performance
Disk drives are mechanical devices that store data on a magnetic medium. A disk drive works much like a record player. A round spinning platter with a magnetic coating holds the data. A mechanically actuated arm holds a head that hovers over the spinning platter without actually touching it. As the platter spins, the head mounted on the arm can read or write data on the section of the platter passing beneath it. A drive actually contains several of these platters stacked on top of each other. To reach different sections of the drive, the arm moves the head back and forth. To access a piece of data, the drive must move the arm to the correct position and then wait for the section of the platter holding the information to rotate under the head. The time required to move the arm is called seek time, and the time spent waiting for the data to rotate under the head is called rotational latency; together they make up the drive's access time.
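To connect access time to throughput, here is a minimal sketch that estimates a drive's random-I/O rate from its spindle speed. It assumes the average rotational latency is half a revolution, and the average seek times in `ASSUMED_SEEK_MS` are hypothetical placeholders rather than vendor specifications. Transfer time and controller overhead are ignored, so the results are optimistic.

```python
# Rough random-I/O estimate for a spinning disk.
# The average seek times below are hypothetical placeholders, not vendor specs.
ASSUMED_SEEK_MS = {15_000: 3.5, 10_000: 4.5, 7_200: 8.5, 5_400: 12.0}


def rotational_latency_ms(rpm: int) -> float:
    """Average wait for the target sector: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def estimated_iops(rpm: int, avg_seek_ms: float) -> float:
    """One random I/O costs roughly one average seek plus half a rotation.

    Transfer time and controller overhead are ignored, so this is optimistic.
    """
    access_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000 / access_time_ms


if __name__ == "__main__":
    for rpm, seek_ms in ASSUMED_SEEK_MS.items():
        print(f"{rpm:>6} RPM: latency {rotational_latency_ms(rpm):.2f} ms, "
              f"~{estimated_iops(rpm, seek_ms):.0f} IOPS")
```

With the assumed 3.5 ms seek, a 15,000 RPM drive has 2 ms of average rotational latency and comes out at roughly 180 IOPS; the estimates for slower drives depend heavily on the seek value chosen.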

The largest factor in how quickly a drive can access data is its rotational speed. Drive specifications vary by model and manufacturer, and real-world results will vary from these specs depending on the size of the I/O requests, whether they are sequential or random, and a number of other factors. A simple rule of thumb for disk I/O throughput is about 180 IOPS (Input/Output Operations Per Second) for a 15,000 RPM drive, 120 IOPS for a 10,000 RPM drive, 80 IOPS for a 7,200 RPM drive, and 40 IOPS for a 5,400 RPM drive. To achieve better performance than a single disk drive can deliver, we need to spread the workload over several individual disks using RAID.
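The rule of thumb above makes a first-pass sizing estimate easy. The sketch below computes the minimum number of drives whose combined IOPS meets a target workload. It assumes the load spreads evenly across every drive and ignores the extra back-end I/O that parity or mirroring adds for writes, so the result should be read as a lower bound. The 1,500 IOPS target is hypothetical.

```python
import math

# Rule-of-thumb per-drive IOPS from the text (random I/O, approximate), keyed by RPM.
RULE_OF_THUMB_IOPS = {15_000: 180, 10_000: 120, 7_200: 80, 5_400: 40}


def drives_needed(target_iops: float, rpm: int) -> int:
    """Minimum drive count whose combined rule-of-thumb IOPS meets the target.

    Assumes perfectly even distribution and no RAID write overhead,
    so real arrays will usually need more drives than this.
    """
    per_drive = RULE_OF_THUMB_IOPS[rpm]
    return math.ceil(target_iops / per_drive)


if __name__ == "__main__":
    target = 1_500  # hypothetical workload
    for rpm in RULE_OF_THUMB_IOPS:
        print(f"{rpm:>6} RPM: {drives_needed(target, rpm)} drives for {target} IOPS")
```

The comparison shows why faster spindles matter: the same hypothetical workload needs 9 drives at 15K RPM but 38 drives at 5.4K RPM.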
Many server admins don't work in IOPS. The table below lists approximate drive bandwidth for 64KB transactions. The larger the block size, the higher the bandwidth, but most open systems, including Windows, use a 64KB block size. The limitation of an individual drive becomes apparent quickly in a server environment where multiple users compete for resources. Any user who has watched their hard drive light blink away while they wait knows the effect of a disk drive that can't keep up. Having many users share the same physical drive in a server significantly increases the risk of disk saturation. Furthermore, the additional workload on a non-enterprise-class drive can lead to premature drive failure and data loss.
| Drive type | Approximate bandwidth (64KB transactions) |
|---|---|
| 15K RPM FC or SAS | 8.0 MB/s |
| 10K RPM FC or SAS | 6.0 MB/s |
| 7.2K RPM SATA or NL-SAS | 4.0 MB/s |
| 5.4K RPM SATA or NL-SAS | 2.5 MB/s |
| Enterprise Flash Drive | 100 MB/s |
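Bandwidth and IOPS are two views of the same work: bandwidth equals I/O operations per second times the size of each operation. This minimal sketch converts between the two at the 64KB block size used in the table. It assumes binary units (1 MB = 1024 × 1024 bytes); some vendors quote decimal megabytes, which shifts the numbers slightly.

```python
# Convert between IOPS and bandwidth: bandwidth = I/O operations per second * bytes per I/O.
# Assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 * 1024 bytes).
KB = 1024
MB = 1024 * 1024


def bandwidth_mb_s(iops: float, block_kb: int = 64) -> float:
    """MB/s delivered by `iops` operations per second of `block_kb` KB each."""
    return iops * block_kb * KB / MB


def implied_iops(bandwidth: float, block_kb: int = 64) -> float:
    """Operations per second represented by `bandwidth` MB/s at a given block size."""
    return bandwidth * MB / (block_kb * KB)


if __name__ == "__main__":
    # Bandwidth figures from the table, all at 64KB per transaction.
    for mb_s in (8.0, 6.0, 4.0, 2.5, 100.0):
        print(f"{mb_s:>6.1f} MB/s at 64KB ~ {implied_iops(mb_s):.0f} IOPS")
```

For instance, the 15K row's 8 MB/s works out to 128 operations of 64KB each per second.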
Drive Sizes 2.5" vs. 3.5"
Recently, 2.5" disk drives have been introduced for storage systems and servers. Noticeably smaller than the standard 3.5" drives they are replacing, they are about the same physical size as a laptop hard drive. However, these drives are available at 10K RPM, and in some cases 15K RPM, and are meant to withstand 24x7x365 use. Due to increases in areal density (the ability to store the same amount of data in less space), these drives have some interesting performance characteristics. Their seek and access times are several milliseconds faster than those of their larger 3.5" counterparts, and 2.5" 10K RPM drives are within ~1ms of 3.5" 15K RPM drives. Their IOPS throughput numbers are fairly close, but their bandwidth (MB/s) is lower than that of their 3.5" counterparts.
The reason is that the 2.5" drive head has less distance to travel to reach different areas of the drive, because the platter is smaller. Rotational latency, by contrast, depends only on spindle speed: at the same RPM, a smaller platter does not bring the data under the head any sooner, so the access-time advantage comes from the shorter seeks. The tradeoff is that the smaller-diameter platter stores less sequential data per track, so reading additional sequential data requires repositioning the drive head more often. This results in lower overall bandwidth.
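The bandwidth tradeoff can be sketched with simple geometry. At a fixed linear recording density, a track holds data in proportion to its circumference, and one track passes under the head per revolution. The density value and the roughly 95 mm and 65 mm platter diameters below are illustrative assumptions, not specifications for any particular drive, and real drives vary density by zone.

```python
import math

# Hypothetical linear recording density (bytes per mm of track), for illustration only.
BYTES_PER_MM = 1000


def track_bytes(diameter_mm: float, bytes_per_mm: float = BYTES_PER_MM) -> float:
    """Capacity of the outermost track: circumference times linear density."""
    return math.pi * diameter_mm * bytes_per_mm


def sequential_mb_s(diameter_mm: float, rpm: int) -> float:
    """Sustained rate while streaming one track: bytes per track times revolutions per second."""
    return track_bytes(diameter_mm) * (rpm / 60) / 1_000_000


if __name__ == "__main__":
    # Assumed approximate platter diameters for 3.5" and 2.5" drives.
    for label, diameter in (('3.5"', 95), ('2.5"', 65)):
        print(f"{label}: ~{sequential_mb_s(diameter, 10_000):.1f} MB/s on the outer track at 10K RPM")
```

Under these assumptions the 2.5" platter delivers about 65/95, or roughly two thirds, of the sequential rate of the 3.5" platter at the same spindle speed.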
Reliability
Disk drive manufacturers measure reliability as Mean Time Between Failures (MTBF), expressed in hours. A drive's MTBF is calculated by testing a sample of drives and measuring how often failures occur. This measurement is useful for understanding the probability of a failure, not for estimating when any one particular drive will fail. Many drives will fail before reaching the specified MTBF, and many others will exceed it.
MTBF = (Test Time * Number of Drives Tested) / Number of Failures
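The formula translates directly into code. This sketch uses hypothetical test figures to show why MTBF describes a population rather than a lifespan: a short test of many drives can produce an MTBF far longer than any single drive was ever run.

```python
def mtbf_hours(test_hours: float, drives_tested: int, failures: int) -> float:
    """MTBF = (test time * number of drives tested) / number of failures."""
    if failures == 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return test_hours * drives_tested / failures


if __name__ == "__main__":
    # Hypothetical test: 1,000 drives run for 1,000 hours with 2 failures.
    mtbf = mtbf_hours(1_000, 1_000, 2)
    print(f"MTBF: {mtbf:,.0f} hours (~{mtbf / 8_760:.0f} years of 24x7 operation)")
```

Here the result is 500,000 hours, roughly 57 years, even though no drive in the hypothetical test ran for more than about six weeks.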
Some drive manufacturers have developed alternative ways of measuring anticipated drive reliability. One is Seagate's Annualized Return Rate (ARR), which measures drive reliability based on the number of drives returned as defective during a time period. A related measure is the Annualized Failure Rate (AFR).
ARR = Number of Units Returned for a Year / Total Units Shipped for the Year
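As a sketch, ARR is a simple ratio, usually reported as a percentage. The shipment and return counts in the usage example are hypothetical.

```python
def annualized_return_rate(units_returned: int, units_shipped: int) -> float:
    """ARR as a fraction: units returned in a year divided by units shipped that year."""
    if units_shipped <= 0:
        raise ValueError("units_shipped must be positive")
    return units_returned / units_shipped


if __name__ == "__main__":
    # Hypothetical year: 1,000,000 drives shipped, 7,000 returned as defective.
    print(f"ARR: {annualized_return_rate(7_000, 1_000_000):.1%}")
```

In this hypothetical year, 7,000 returns out of 1,000,000 shipped drives give an ARR of 0.7%.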
Not all disk drives are created equal. MTBF varies by the type of drive and its intended use case. Consumer drives for laptops and desktops, a category that also includes external USB drives, are rated at around 500,000 hours MTBF. Enterprise-class drives found in server and storage hardware are rated at 1.2 to 1.6 million hours MTBF, with 15K and 10K RPM SAS and Fibre Channel drives closer to 1.6 million hours and near-line SAS or SATA drives closer to 1.2 million hours. Consumer desktop-grade hard drives are also intended only for 9x5 usage, whereas enterprise drives are rated for 24x7x365 usage.
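MTBF figures are easier to compare once converted into an expected annual failure probability. A common approximation assumes a constant failure rate of 1/MTBF per hour, which gives AFR = 1 - exp(-hours per year / MTBF). That assumption is a simplification: it ignores the infant-mortality and wear-out patterns discussed in the Google study. The sketch defaults to 8,760 power-on hours (24x7); a drive that runs fewer hours would use a smaller `hours_per_year`.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours for 24x7 operation


def afr_from_mtbf(mtbf_hours: float, hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Probability a drive fails within one year, assuming a constant failure rate.

    With a constant rate of 1/MTBF failures per hour, survival over t hours is
    exp(-t / MTBF), so the annual failure probability is 1 - exp(-t / MTBF).
    """
    return 1 - math.exp(-hours_per_year / mtbf_hours)


if __name__ == "__main__":
    for label, mtbf in (("consumer", 500_000), ("near-line", 1_200_000), ("15K/10K", 1_600_000)):
        print(f"{label:>9}: MTBF {mtbf:,} h -> AFR ~{afr_from_mtbf(mtbf):.2%}")
```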
The number of start/stop cycles also affects the life expectancy of a disk drive. When a disk drive spins up, the rotation of the platter forces a cushion of air between the disk surface and the read/write head. When the drive spins down, the actuator arm moves the head to a location that doesn't contain any data to prevent the head from damaging the disk. Still, each spin-up adds wear to the drive head, the spindle motor, and other components. Features such as disk spin-down have increased the significance of this wear indicator, and many drive manufacturers publish how many start/stop cycles a drive can endure. Again, this spec is an indicator of overall reliability, not of when a particular drive will fail.
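To see how spin-down policy interacts with a start/stop rating, the sketch below estimates how many years it would take to reach a rated cycle count at a given number of cycles per day. The 50,000-cycle rating is a hypothetical figure for illustration; the real value comes from the manufacturer's data sheet.

```python
def years_until_rated_cycles(rated_cycles: int, cycles_per_day: float) -> float:
    """Years of operation before a drive reaches its rated start/stop count."""
    if cycles_per_day <= 0:
        raise ValueError("cycles_per_day must be positive")
    return rated_cycles / (cycles_per_day * 365)


if __name__ == "__main__":
    RATED_CYCLES = 50_000  # hypothetical rating; check the vendor's data sheet
    # e.g. one daily power-off versus aggressive idle spin-down every hour
    for per_day in (1, 4, 24):
        print(f"{per_day:>2} cycles/day -> ~{years_until_rated_cycles(RATED_CYCLES, per_day):.1f} years")
```

Under this hypothetical rating, one cycle per day would take over a century to exhaust the budget, while spinning down every hour would use it up in under six years.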
Google Study
To better understand disk drive life expectancy, Google studied 100,000 ATA disk drives in individual servers in its data centers between 2005 and 2006. Using the data collected, Google was able to determine which factors were most influential in premature disk failure. As in many other studies, there was a high initial failure rate in the first 3-6 months, usually attributed to infant mortality among defective drives. After the first 3 months, failure rates drop off substantially, then rise again and are highest in years 2 and 3. Year 3 had the highest failure rate, at about 8%.
Heavily utilized drives showed a failure rate of about 10% in the first 3 months, compared with about 4% for low-utilization drives and about 2% for medium-utilization drives. In drives 2 years and older, utilization made very little difference in the failure rate, accounting for a <0.05% difference in drive failures. Interestingly, among these older drives, low-utilization drives failed most often, followed by high-utilization drives.
During the study, drive temperature data was collected through SMART. The temperature data did not show a significant increase in drive failures as temperature rose until extreme temperatures were reached. In fact, most of the drive failures occurred at lower temperatures, perhaps because of intermittent use or extra start/stop cycles on those drives.
SMART error data was also collected during the study and analyzed to see whether it could accurately predict failures. The study found that SMART could indicate impending disk failure but was not reliable in detecting all failures. Of the drives that failed, 36% had no SMART errors at all, and 56% had none of the errors that would have been a strong signal that the drive might fail. Google considers the following to be strong signals that a drive may fail: scan errors, reallocation count, offline reallocation, and probational count. Other error codes, such as seek errors and CRC errors, did not show a statistical correlation with drive failures.
Drives showing SMART scan errors are 10x more likely to fail. Scan errors have a greater effect on the mortality of younger drives than on older ones. Multiple errors increase the probability that a drive will fail and indicate that it will fail more quickly.
Reallocation errors indicate that there was a problem writing to or reading from a section of the drive and that the data was moved to a different section. Offline reallocations occur when background tasks find an error while checking sections of the drive that are not being accessed. Drives with SMART reallocation errors are 3-6x more likely to fail. Drives are 14x more likely to fail within 60 days of their first reallocation error than drives without any errors, and 21x more likely if the reallocation is done offline.
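The study's strong signals lend themselves to a simple triage rule: flag a drive if any of the four counters is non-zero. In the sketch below, the `SmartCounts` class and its field names are hypothetical, and mapping each field to a specific SMART attribute ID for your drives and tools is an assumption you would need to verify. Because 56% of the failed drives in the study showed none of these signals, an empty result is not a clean bill of health.

```python
from dataclasses import dataclass

# Hypothetical field names for the four "strong signal" counters named in the study.
STRONG_SIGNALS = ("scan_errors", "reallocation_count", "offline_reallocation", "probational_count")


@dataclass
class SmartCounts:
    """Hypothetical container for one drive's strong-signal counters."""
    scan_errors: int = 0
    reallocation_count: int = 0
    offline_reallocation: int = 0
    probational_count: int = 0


def strong_signals(counts: SmartCounts) -> list[str]:
    """Return the names of strong-signal counters that are non-zero, in a fixed order."""
    return [name for name in STRONG_SIGNALS if getattr(counts, name) > 0]


def needs_attention(counts: SmartCounts) -> bool:
    """True if any strong signal is present; False does NOT mean the drive is healthy."""
    return bool(strong_signals(counts))


if __name__ == "__main__":
    drive = SmartCounts(scan_errors=1, offline_reallocation=2)  # hypothetical readings
    print(strong_signals(drive), needs_attention(drive))
```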
Further Reading:
http://labs.google.com/papers/disk_failures.pdf
http://www.samsung.com/global/business/hdd/learningresource/whitepapers/LearningResource_OverallReliability.html
http://www.dell.com/downloads/global/products/pvaul/en/enterprise-hdd-sdd-specification.pdf