Monday, February 19, 2007

Interesting Study About Hard Disk Failure Trends

Several Google engineers analyzed hard disk failure trends across their storage infrastructure, which runs on inexpensive, commercially available hard drives, and the results are interesting. The study itself [2] is a good read for statisticians. For a lighter, more popular version, see the BBC article [1]. If I had to sum up the result in one short sentence, I would pick the following one:

We found very little correlation between failure rates and either elevated temperature or activity levels.

Setting the results aside, what impressed me is that they had to collect a high volume of good-quality, well-structured data from a distributed real-time system. Their system (System Health Infrastructure) is briefly described in [2]. It is built mostly on top of Google's own technologies (MapReduce, Bigtable, GFS), which allow a large data set to be stored and processed efficiently. The final statistical analysis is done in R.
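
To give a feel for the kind of aggregation such a pipeline performs, here is a toy sketch in Python. It is my own illustration, not code from the study: the records are invented, the names (RECORDS, map_record, reduce_bucket, failure_rate_by_bucket) are hypothetical, and the map and reduce steps run in a single process rather than on a real MapReduce cluster. It groups drives into temperature buckets and computes the fraction of failed drives in each bucket.

```python
from collections import defaultdict

# Invented sample records, for illustration only (not data from the study):
# (drive_id, average temperature in Celsius, whether the drive failed)
RECORDS = [
    ("d1", 28, False),
    ("d2", 31, True),
    ("d3", 36, False),
    ("d4", 38, False),
    ("d5", 41, True),
    ("d6", 44, False),
    ("d7", 33, False),
    ("d8", 47, True),
]


def map_record(record, bucket_width=10):
    """Map step: emit (temperature bucket, (drive count, failure count))."""
    _drive_id, temp, failed = record
    bucket = (temp // bucket_width) * bucket_width
    return bucket, (1, 1 if failed else 0)


def reduce_bucket(values):
    """Reduce step: sum the drive and failure counts for one bucket."""
    drives = sum(v[0] for v in values)
    failures = sum(v[1] for v in values)
    return drives, failures


def failure_rate_by_bucket(records, bucket_width=10):
    """Group mapped records by bucket, then reduce each group to a rate."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_record(record, bucket_width)
        grouped[key].append(value)

    rates = {}
    for key in sorted(grouped):
        drives, failures = reduce_bucket(grouped[key])
        rates[key] = failures / drives
    return rates


rates = failure_rate_by_bucket(RECORDS)
for bucket, rate in rates.items():
    print(f"{bucket}-{bucket + 9} C: {rate:.2f}")
```

With real data, a per-bucket table like this would be the input to the statistical analysis; the tiny sample above is far too small to say anything about actual drives.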

[1] Hard disk test 'surprises' Google (BBC News)

