RAID systems are arrays of multiple storage media such as hard drives or SSDs, and they are now the most common way of storing data, a seemingly secure one. They are used in servers and NAS systems as well as workstations, by private individuals, SMEs and large corporations. Compared to a PC or laptop, a RAID is more prone to data loss, because the higher number of data carriers increases the probability that a hard drive or SSD fails. In the worst case, the whole business comes to a standstill without access to its important data: databases, e-mail and file storage go offline all at once.
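The effect of the number of data carriers can be illustrated with a little arithmetic. The following is only a rough sketch: the function name and the 2 % per-drive failure probability are made-up assumptions, and the drives are treated as failing independently of each other, which is an optimistic simplification.

```python
def prob_any_drive_fails(n_drives: int, p_single: float) -> float:
    """Chance that at least one of n drives fails in a given period.

    Assumes every drive fails independently with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_drives


# Hypothetical per-drive failure probability of 2 % (illustrative only).
for n in (1, 4, 8):
    print(f"{n} drive(s): {prob_any_drive_fails(n, 0.02):.2%}")
```

Even under this simplification, the chance of seeing at least one failure grows with every data carrier that is added to the array.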
RAID is no replacement for data backups!
RAID systems generate redundant information using mirroring or parity algorithms. This allows the original state to be recreated through a rebuild after one or more hard drives (depending on the RAID level) have failed and been replaced. But redundancy should not be confused with a backup.
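To give an idea of how such redundancy works, here is a minimal sketch of XOR parity, the principle behind single-parity levels such as RAID 5. It is a toy model, not how any particular controller is implemented: the three "drives" are just short byte strings, and the helper name `xor_blocks` is made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, each standing in for one drive of the array.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # redundant information stored on a fourth drive

# The drive holding data[1] fails: rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

With a single parity block, only one missing block can be recomputed; if a second block is lost, the information is no longer sufficient. Parity also follows every write faithfully, so a file that is deleted or overwritten by mistake is just as gone on the redundant array, which is why redundancy is not a backup.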
IT administrators and computing specialists are often surprised by sudden data loss: an expensive enterprise storage system was purchased and there was no sign of an impending failure beforehand. The explanation is often the failure of multiple hard drives or SSDs. Usually all data carriers in a storage system share the same life history: production, transport and operating environment are identical for all data carriers in the RAID, and dangers lurk at each of those stages:
Damage may already be caused during the manufacturing process. Whole hard drive batches often have serial defects. These can be firmware (the data carrier's internal software), mechanical or electronic defects. Because it is common to use hard drives from the same batch in a RAID system, such serial defects may affect several drives within a very short period of time. Especially at night or on weekends, the first failure goes unnoticed or is intentionally ignored. As soon as the second hard drive breaks down, a RAID 5 is no longer available.
Another cause of the simultaneous failure of several hard drives is the transport from the manufacturer via resellers to the site where the server or RAID is operated. Overheated containers, shocks or other environmental influences may cause damage that leads to failure during operation. The same applies here: same batch means same problems.
Ongoing operation also plays an important role: shocks, overheating and overvoltage may significantly reduce the lifespan of a hard drive inside a RAID array, and this again affects all data carriers.
Thunderstorms, floods, fire or vibrations (from earthquakes or construction work) regularly destroy several data carriers of a RAID at once.
Thus, failure of a RAID system is far more likely than you might think!
RAID failure prevention
It seems reasonable to use data carriers of different brands in RAID systems, but this may cause performance and compatibility issues. The only useful solution is an external backup, because a RAID system alone is not reliable enough! You should consider the following points:
- Regular backups should be saved on external systems and not on the RAID itself.
- Backups should be checked periodically for completeness and functionality.
- Constant monitoring is important, so that you are informed immediately via e-mail, SMS or messenger as soon as the first hard drive fails.
- Make a complete backup and check its integrity before you perform a firmware update.
- When you restore from a backup, the data shouldn't be written to the original data carrier, but to a new one. If the data restoration doesn't work or the data is incomplete, professional data recovery specialists are able to recover the missing data from the original data carrier in most cases.
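Checking a backup for completeness and integrity can be partly automated. The sketch below is one possible approach, not a replacement for a real restore test: it builds SHA-256 checksum lists for the source and the backup directory and reports files that are missing or differ. The function names and the directory layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(source: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """List files missing from the backup and files whose content differs."""
    return {
        "missing": sorted(set(source) - set(backup)),
        "changed": sorted(
            name for name in source.keys() & backup.keys()
            if source[name] != backup[name]
        ),
    }


# Example call with hypothetical paths:
# report = compare(build_manifest(Path("/data")), build_manifest(Path("/mnt/backup/data")))
```

An empty report only shows that the backup currently matches the source; whether the data can really be restored is best confirmed by an occasional test restore onto a separate data carrier.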
About the author: Dipl. Ing. Nicolas Ehrschwendner has been managing partner of Attingo Datenrettung for 20 years and has worked in the IT industry for over 30 years. Attingo Datenrettung operates its own cleanroom laboratories in Hamburg, Vienna and Amsterdam.