RAID (Redundant Array of Independent Disks)

1. Meaning of RAID

RAID stands for Redundant Array of Independent Disks.

It is a technology used to improve:

  1. Performance

  2. Reliability (fault tolerance)

  3. Storage capacity

It works by using multiple hard disks together as one logical unit.

RAID uses techniques such as:

  • Mirroring (making exact copies)

  • Striping (splitting data across disks)

  • Parity / Error Correction (for rebuilding data after a failure)

Combining multiple disks can improve speed, reliability, or both, depending on the RAID level used.


2. MTBF (Mean Time Between Failures)

MTBF is a measure of a system's reliability.

It tells us how long, on average, a system runs between one failure and the next.

The time between two failures is measured from the moment the system comes up to the moment it next goes down:

Time Between Failures = (start of down time − start of up time)

In general:

MTBF = Σ(time between failures) / number of failures

Example:
If failures occur at intervals of 200 hours, 230 hours, and 190 hours:
MTBF = (200 + 230 + 190) / 3 = 206.67 hours
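The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name mtbf is chosen here for illustration and does not come from any library.

```python
def mtbf(intervals_hours):
    """Mean time between failures: sum of the intervals / number of failures."""
    if not intervals_hours:
        raise ValueError("need at least one failure interval")
    return sum(intervals_hours) / len(intervals_hours)


# The three intervals from the example above.
print(round(mtbf([200, 230, 190]), 2))  # 206.67
```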

MTBF is important in RAID because the reliability of the whole array depends on the reliability of every disk in it and on how many disk failures the RAID level can tolerate.


3. RAID Flavors (Common RAID Levels)

Commonly used RAID types:

  • RAID 0

  • RAID 1

  • RAID 5

  • RAID 10

Less common types:
RAID 2, 3, 4, 6, 50, etc.

Each RAID type offers a different balance of performance and reliability.
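Part of that balance is how much of the raw disk space remains usable. The sketch below applies the standard capacity formulas for equal-sized disks. The function name usable_capacity and its parameters are illustrative assumptions, and it does not check the minimum disk count for each level.

```python
def usable_capacity(level, num_disks, disk_size_tb):
    """Usable capacity in TB for an array of equal-sized disks."""
    if level == 0:
        return num_disks * disk_size_tb         # striping only, nothing reserved
    if level == 1:
        return disk_size_tb                     # every disk holds the same copy
    if level == 5:
        return (num_disks - 1) * disk_size_tb   # one disk's worth of parity
    if level == 6:
        return (num_disks - 2) * disk_size_tb   # two disks' worth of parity
    if level == 10:
        return (num_disks // 2) * disk_size_tb  # half the disks are mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


for level in (0, 1, 5, 6, 10):
    print(f"RAID {level}: {usable_capacity(level, 4, 1)} TB usable from 4 x 1 TB")
```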


4. RAID 0 (Striping)

Key points:
a. Data is split across two or more disks.
b. Provides very good performance (fast read and write).
c. No redundancy. If any one disk fails, all data is lost.
d. Example: odd-numbered blocks go to disk 0, even-numbered blocks to disk 1.
e. Used in systems where speed is important and data loss is acceptable, such as gaming PCs or read-only file servers.
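The block placement in point d can be sketched as a round-robin mapping. The sketch assumes blocks and disks are numbered from 0, so even-numbered blocks land on disk 0. The example in point d counts blocks from 1, which gives the same alternating pattern. The function name raid0_locate is hypothetical.

```python
def raid0_locate(block, num_disks):
    """Return (disk index, row on that disk) for a logical block number."""
    return block % num_disks, block // num_disks


# With two disks, consecutive blocks alternate between disk 0 and disk 1.
for block in range(6):
    disk, row = raid0_locate(block, 2)
    print(f"block {block} -> disk {disk}, row {row}")
```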

RAID 0 Failure Rate Analysis

If each of the n disks fails independently with probability p,

Probability that at least one disk fails (which loses the whole RAID 0 array):
P(at least one fails) = 1 – (1 – p)^n

Example:
If p = 0.05 and there are 2 disks:
1 – (1−0.05)^2
= 1 – (0.95)^2
= 9.75%

RAID 0 becomes less reliable as you add more disks.
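The effect of adding disks can be seen by evaluating the formula for several array sizes. This is a minimal sketch that assumes disks fail independently with the same probability p. The function name is illustrative.

```python
def raid0_failure_probability(p, n):
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p) ** n


# The loss probability grows with every disk added to the stripe.
for n in (1, 2, 4, 8):
    print(f"{n} disks: {raid0_failure_probability(0.05, n):.2%}")
```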

RAID 0 Performance

  • Striping allows blocks to be read or written in parallel.

  • Smaller pieces are stored across multiple disks, so operations are faster.

  • Best RAID for pure performance.

  • Worst for reliability.


5. RAID 1 (Mirroring)

Key points:
a. Data is duplicated on two or more disks.
b. Every disk in the mirror contains identical data.
c. A two-disk mirror requires double the storage. Storing 1 TB of data requires 2 TB of disk space.
d. System continues working as long as one disk is alive.
e. Very high reliability.

RAID 1 Failure Rate Analysis

If p = probability of one disk failing, and the two disks fail independently,
P(both disks fail) = p^2

Example:
If p = 0.05:
P(both fail) = 0.05 × 0.05 = 0.25%

This is much lower than the 9.75% for a two-disk RAID 0 array with the same p.
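The same calculation extends to mirrors with more than two copies. This is a sketch under the assumption that disk failures are independent. The function name and the copies parameter are illustrative.

```python
def raid1_failure_probability(p, copies=2):
    """Probability that every copy in the mirror fails."""
    return p ** copies


# Each extra copy multiplies the loss probability by p again.
for copies in (2, 3):
    print(f"{copies} copies: {raid1_failure_probability(0.05, copies):.4%}")
```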

RAID 1 Performance

  • Good read performance: reads can happen from both disks in parallel.

  • Write performance is similar to a single disk (must write to both disks).

  • If each disk has its own controller, performance improves further.


6. RAID 5 (Striping with Distributed Parity)

Key points:
a. Good combination of performance, fault tolerance, and storage efficiency.
b. Uses parity (error correction code) distributed across all disks.
c. Requires at least 3 disks.
d. Can survive one disk failure.
e. When a disk fails, parity is used to rebuild the missing data.
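RAID 5 parity is commonly computed as the bytewise XOR of the data blocks in a stripe, which is why any single missing block can be recovered from the remaining blocks plus parity. The sketch below shows that property on three tiny blocks. The helper name xor_blocks and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]  # data blocks of one stripe
parity = xor_blocks(data)           # parity block stored on another disk

# Simulate losing the second data block, then rebuild it from the
# surviving data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```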

Why Distributed Parity?

So that the parity load does not fall on a single disk.
This balances performance and reduces bottlenecks.
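One way to spread the parity load is to rotate the parity block to a different disk on each stripe. The rotation below is just one possible scheme, shown as an assumption. Real controllers and operating systems offer several layouts, and the function name parity_disk is hypothetical.

```python
def parity_disk(stripe, num_disks):
    """Disk that holds the parity block for a stripe, rotating right to left."""
    return (num_disks - 1 - stripe) % num_disks


# With 4 disks, parity moves across all disks and then repeats.
layout = [parity_disk(stripe, 4) for stripe in range(8)]
print(layout)  # [3, 2, 1, 0, 3, 2, 1, 0]
```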

RAID 5 Analysis

Reliability:

  • Better than RAID 0 because it survives one disk failure.

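  • Less reliable than RAID 1: with more disks in the array, a second failure before the rebuild finishes is more likely.

  • The array's mean time to data loss improves compared with RAID 0, since a single disk failure does not bring the array down.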
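These comparisons can be put in numbers using the same simple model as the RAID 0 and RAID 1 examples: each disk fails independently with probability p, and rebuilds are ignored. Under those assumptions a RAID 5 array is lost when two or more disks fail. The function name is illustrative.

```python
def raid5_failure_probability(p, n):
    """Probability that two or more of n independent disks fail."""
    none_fail = (1 - p) ** n
    exactly_one_fails = n * p * (1 - p) ** (n - 1)
    return 1 - none_fail - exactly_one_fails


# Three disks with p = 0.05: between RAID 1 (0.25%) and RAID 0 (9.75%).
print(f"{raid5_failure_probability(0.05, 3):.3%}")  # 0.725%
```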

Performance:

  • Good read performance (parallel reads).

  • Write performance is slower because every write must also recalculate and update the parity block.

  • Suitable for systems with more reads than writes.

Use cases:
File servers, application servers, general-purpose systems.


7. RAID 10 (RAID 1 + RAID 0)

Key points:
a. Combines RAID 1 (mirroring) and RAID 0 (striping).
b. Also called RAID 1+0; it is a nested RAID level.
c. Offers both high speed and high reliability.
d. Requires at least 4 disks (two mirrors, then striped).

How it works:

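  • First, disks are grouped into mirrored pairs.

  • Then data is striped across these mirrored pairs.

This combines the striping speed of RAID 0 with the fault tolerance of RAID 1: the array keeps working as long as at least one disk in every mirrored pair is alive.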
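The two steps can be sketched as a block mapping: striping picks the mirrored pair, and mirroring writes the block to both disks of that pair. The disk numbering (pair i uses disks 2i and 2i+1) and the function name raid10_locate are assumptions made for this illustration.

```python
def raid10_locate(block, num_pairs):
    """Return (first disk, second disk, row) for a logical block number."""
    pair = block % num_pairs   # striping: choose the mirrored pair
    row = block // num_pairs   # position of the block within the pair
    return 2 * pair, 2 * pair + 1, row  # mirroring: both disks get the block


# Four disks arranged as two mirrored pairs.
for block in range(4):
    print(block, raid10_locate(block, 2))
```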

Use case:
Databases, transaction-heavy systems, high-performance servers.


8. RAID Implementations

Software RAID

  • Implemented by the operating system (Linux, Windows, etc.).

  • Uses CPU power for RAID operations.

  • Cheaper and easier to set up.

  • Good for RAID 0 and RAID 1.

  • Not ideal for heavy workloads, since RAID operations compete with applications for CPU time.

How it works:
A software layer sits above the disk drivers and manages RAID logic.

Hardware RAID

  • Uses a dedicated RAID controller card.

  • Controller does all parity and RAID calculations.

  • CPU is not burdened.

  • Usually faster and more dependable under heavy load, especially for RAID 5, RAID 6, and RAID 10.

  • More expensive.

Hardware RAID controllers may be found on:

  • Desktop motherboards

  • Server motherboards

  • PCI RAID cards


9. What is happening today? (Modern RAID Trends)

RAID 6

a. Uses double parity (two parity blocks).
b. Can survive failure of any two disks.
c. Provides better data protection than RAID 5.
d. Slower writes because of double parity calculation.
e. Mean time to data loss is better than RAID 5, since data is lost only when a third disk fails.
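The benefit of the second parity block can be estimated with a binomial model: the array is lost only when more disks fail than the level can tolerate. This is a sketch that assumes independent failures with the same probability p and ignores rebuild time. The function name and the tolerated parameter are illustrative.

```python
from math import comb


def array_loss_probability(p, n, tolerated):
    """Probability that more than `tolerated` of n independent disks fail."""
    survive = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(tolerated + 1)
    )
    return 1 - survive


# Same six disks, single parity (RAID 5) versus double parity (RAID 6).
raid5_loss = array_loss_probability(0.05, 6, tolerated=1)
raid6_loss = array_loss_probability(0.05, 6, tolerated=2)
print(f"RAID 5: {raid5_loss:.4%}  RAID 6: {raid6_loss:.4%}")
```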

Future and Advancements

  1. Wider adoption of RAID 6 and dual-parity RAID levels.

  2. Fast rebuild technologies that shorten rebuild times for very large disks.

  3. Striping across RAID groups, not just disks within a group.

  4. Better disk diagnostics to predict failures early.

  5. Hot spares: extra disks in the system that automatically replace failed ones.
