Objective 3.4 · High · 12 min

Redundancy and Fault Tolerance

Eliminating single points of failure through server redundancy, network redundancy, storage redundancy (RAID), and power redundancy. Understanding fault tolerance mechanisms and their implementation.

Understanding Redundancy and Fault Tolerance

Redundancy duplicates critical components so that if one fails, others continue operating. Fault tolerance is the system's ability to continue functioning despite component failures. Together, they eliminate single points of failure (SPOF).

Redundancy categories:

  • Server redundancy: multiple servers, clustering, virtualization
  • Network redundancy: multiple paths, NIC teaming, diverse carriers
  • Storage redundancy: RAID arrays, replication, distributed storage
  • Power redundancy: UPS, generators, dual feeds

The October 2021 Facebook outage occurred when a configuration change disconnected the company's data centers from the internet for about six hours. Despite redundant systems, the change affected all redundant paths at once, which shows that redundancy must also protect against common-mode failures.

True fault tolerance requires redundancy at every layer and across failure domains.

Why This Matters for the Exam

Redundancy and fault tolerance are heavily tested on SY0-701 because they're fundamental to system resilience. Questions cover RAID levels, network redundancy, and identifying SPOFs.

Understanding redundancy helps with infrastructure design, disaster recovery, and risk assessment. A system without redundancy is a single incident away from an outage.

The exam tests both conceptual understanding and specific technologies like RAID.

Deep Dive

What Is Server Redundancy?

Server redundancy ensures compute capacity survives individual server failures.

Server Redundancy Methods:

| Method | Description | Failover Time |
| --- | --- | --- |
| Clustering | Multiple servers as one | Seconds |
| Virtualization | VMs migrate between hosts | Minutes |
| Standby servers | Cold/warm/hot standby | Varies |
| Cloud auto-scaling | Automatic replacement | Minutes |

Virtualization HA:

[Diagram: Virtualization High Availability. VM1, VM2, and VM3 run across Host 1, Host 2, and Host 3, all backed by shared storage.]
If Host 1 fails, VM1 automatically restarts on Host 2 or Host 3.
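The restart behaviour described above can be sketched in a few lines. This is only an illustration: the function `restart_vms` and the "least-loaded surviving host" rule are assumptions made for the example, not the placement logic of any particular hypervisor.

```python
def restart_vms(placement, failed_host, hosts):
    """Return a new VM-to-host placement after failed_host goes down."""
    survivors = [h for h in hosts if h != failed_host]
    if not survivors:
        raise RuntimeError("no surviving hosts: nothing left to restart on")

    new_placement = dict(placement)
    for vm, host in placement.items():
        if host != failed_host:
            continue
        # Count VMs currently assigned to each surviving host.
        load = {h: 0 for h in survivors}
        for other_host in new_placement.values():
            if other_host in load:
                load[other_host] += 1
        # Hypothetical rule: restart on the least-loaded surviving host.
        new_placement[vm] = min(survivors, key=lambda h: load[h])
    return new_placement


hosts = ["Host 1", "Host 2", "Host 3"]
placement = {"VM1": "Host 1", "VM2": "Host 2", "VM3": "Host 3"}
after = restart_vms(placement, "Host 1", hosts)
print(after["VM1"])  # one of the surviving hosts
```

The restart is only possible because the VM's disk lives on shared storage that every host can reach, which also makes that storage the next place to look for a single point of failure.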

What Is Network Redundancy?

Network redundancy provides multiple paths for traffic to flow.

Network Redundancy Technologies:

| Technology | Function |
| --- | --- |
| NIC teaming/bonding | Multiple NICs as one |
| Redundant switches | Multiple switch paths |
| Diverse routing | Multiple network paths |
| Multiple ISPs | Different internet providers |
| SD-WAN | Multiple transport options |

NIC Teaming Modes:

| Mode | Description | Benefit |
| --- | --- | --- |
| Active-Backup | One active, one standby | Failover |
| Load Balancing | Traffic across all NICs | Performance + failover |
| LACP (802.3ad) | Link aggregation | Bandwidth + failover |
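To make the Active-Backup mode concrete, here is a minimal sketch of its selection rule. The function name `select_active_nic` and the `link_up` dictionary are hypothetical; a real teaming driver monitors link state itself and does much more.

```python
def select_active_nic(nics, link_up):
    """Active-backup: use the first NIC, in priority order, whose link is up."""
    for nic in nics:
        if link_up.get(nic, False):
            return nic
    return None  # every link is down, so no path is left


team = ["NIC 1", "NIC 2"]
normal = select_active_nic(team, {"NIC 1": True, "NIC 2": True})
failover = select_active_nic(team, {"NIC 1": False, "NIC 2": True})
print(normal, "->", failover)
```

Only one NIC carries traffic at a time in this mode, so it adds failover but no extra bandwidth, which is the difference from the load-balancing and LACP rows.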

Network Redundancy Architecture:

[Diagram: Network Redundancy Architecture. A server with teamed NIC 1 and NIC 2 connects to Switch 1 and Switch 2, then to Router 1 and Router 2, then to ISP 1 and ISP 2.]
Redundancy at every layer eliminates single points of failure.

What Are RAID Levels for Storage Redundancy?

RAID (Redundant Array of Independent Disks) provides storage redundancy and/or performance.

Common RAID Levels:

| Level | Description | Min Disks | Fault Tolerance |
| --- | --- | --- | --- |
| RAID 0 | Striping | 2 | None (performance only) |
| RAID 1 | Mirroring | 2 | 1 disk failure |
| RAID 5 | Striping + parity | 3 | 1 disk failure |
| RAID 6 | Striping + double parity | 4 | 2 disk failures |
| RAID 10 | Mirroring + striping | 4 | 1 disk per mirror |
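The capacity cost of each level follows directly from the table. The sketch below turns it into arithmetic; `usable_disks` is a hypothetical helper, it assumes equal-sized disks, and it models RAID 1 as a two-disk mirror only.

```python
def usable_disks(level, disk_count):
    """Return how many disks' worth of capacity a RAID level leaves usable."""
    minimum = {"RAID 0": 2, "RAID 1": 2, "RAID 5": 3, "RAID 6": 4, "RAID 10": 4}
    if level not in minimum:
        raise ValueError(f"unknown level: {level}")
    if disk_count < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} disks")

    if level == "RAID 0":
        return disk_count            # striping only, no redundancy overhead
    if level == "RAID 1":
        if disk_count != 2:
            # Assumption of this sketch: RAID 1 is a simple two-disk mirror.
            raise ValueError("this sketch models RAID 1 with exactly 2 disks")
        return 1                     # second disk is a full copy
    if level == "RAID 5":
        return disk_count - 1        # one disk's worth of parity
    if level == "RAID 6":
        return disk_count - 2        # two disks' worth of parity
    if disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return disk_count // 2           # half the disks are mirror copies


for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10"):
    print(level, usable_disks(level, 4))
```

With four disks this gives 4, 3, 2, and 2 usable disks respectively, so RAID 5 is the most space-efficient fault-tolerant choice at that size.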

RAID Diagrams:

RAID 0 (Striping - NO redundancy):

[Diagram: RAID 0 striping, no redundancy. Data A B C D E F G H is split so that Disk 1 holds A, C, E, G and Disk 2 holds B, D, F, H.]
Fast, but ANY disk failure = total data loss.
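The striping layout can be reproduced with a simple round-robin assignment. This is a toy sketch (the `stripe` function is made up for illustration, and real arrays stripe fixed-size chunks rather than single letters), but it shows why losing one disk ruins everything.

```python
def stripe(blocks, disk_count):
    """Distribute blocks across disks round-robin, as RAID 0 does."""
    disks = [[] for _ in range(disk_count)]
    for index, block in enumerate(blocks):
        disks[index % disk_count].append(block)
    return disks


disks = stripe(list("ABCDEFGH"), 2)
print(disks)  # [['A', 'C', 'E', 'G'], ['B', 'D', 'F', 'H']]
```

Every file larger than one block ends up with pieces on both disks, and no copy or parity exists anywhere, so the surviving disk alone holds only every other block.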

RAID 1 (Mirroring):

[Diagram: RAID 1 mirroring, 1-disk fault tolerance. Data A B C D E F G H is written in full to Disk 1 and again to Disk 2 (mirror).]
Either disk can fail; the data survives on the other.

RAID 5 (Striping + Distributed Parity):

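[Diagram: RAID 5 striping with distributed parity, 1-disk fault tolerance. Each stripe holds two data blocks and one parity block (p), and the parity block rotates across the disks: Disk 1 holds A, D, Gp; Disk 2 holds B, Dp, H; Disk 3 holds Cp, E, I.]
One disk can fail; its data is rebuilt from parity.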
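Rebuilding from parity works because RAID 5 parity is a bytewise XOR of the data blocks in a stripe. The sketch below demonstrates that property on one stripe; the block contents and the helper name `xor_blocks` are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe on a three-disk array: two data blocks and one parity block.
data_a = b"DATA-A"
data_b = b"DATA-B"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b, then rebuild it
# from the surviving data block and the parity block.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)  # True
```

The same XOR that produced the parity also recovers the missing block, but only when exactly one block in the stripe is missing, which is why RAID 5 tolerates a single disk failure.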

RAID 6 (Double Parity):

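[Diagram: RAID 6 double parity, 2-disk fault tolerance. Each stripe holds data blocks plus two parity blocks (P and Q) that rotate across the disks: Disk 1 holds A, E, P, Q; Disk 2 holds B, P, Q, F; Disk 3 holds P, Q, C, G; Disk 4 holds Q, D, H, P.]
Two disks can fail simultaneously; better for large arrays.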
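The difference between "any two disks" (RAID 6) and "one disk per mirror" (RAID 10) is easy to check by enumerating every two-disk failure on a four-disk array. This is a sketch under stated assumptions: the mirror pairing (1, 2) and (3, 4) and both function names are hypothetical.

```python
from itertools import combinations


def raid6_survives(failed_disks):
    """RAID 6 survives as long as no more than two disks are lost."""
    return len(failed_disks) <= 2


def raid10_survives(failed_disks, mirror_pairs):
    """RAID 10 loses data only if both disks of one mirror pair fail."""
    failed = set(failed_disks)
    return not any(set(pair) <= failed for pair in mirror_pairs)


disks = [1, 2, 3, 4]
mirror_pairs = [(1, 2), (3, 4)]  # assumed pairing for this example
two_disk_failures = list(combinations(disks, 2))

raid6_ok = sum(raid6_survives(f) for f in two_disk_failures)
raid10_ok = sum(raid10_survives(f, mirror_pairs) for f in two_disk_failures)
print(raid6_ok, raid10_ok, len(two_disk_failures))  # 6 4 6
```

RAID 6 survives all six two-disk combinations, while RAID 10 survives four of them and fails when both halves of the same mirror are lost.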

What Is Power Redundancy?

Power redundancy ensures systems survive power failures.

Power Redundancy Components:

| Component | Protection Against |
| --- | --- |
| UPS | Short outages, power conditioning |
| Generator | Extended outages |
| Dual power supplies | PSU failure |
| Dual power feeds | Utility/circuit failure |
| PDU redundancy | Power distribution failure |

Power Redundancy Architecture:

[Diagram: Power Redundancy Architecture. Path A: Utility Feed A → ATS A → UPS A → PDU A. Path B: Utility Feed B → ATS B → UPS B → PDU B. Both paths feed a server with dual PSUs.]
Dual power paths ensure the server survives any single power component failure.

What Are Single Points of Failure (SPOF)?

A SPOF is any component whose failure causes system failure.

Common SPOFs:

| Layer | Potential SPOF | Solution |
| --- | --- | --- |
| Compute | Single server | Clustering |
| Network | Single switch | Redundant switches |
| Network | Single ISP | Multiple ISPs |
| Storage | Single disk | RAID |
| Power | Single UPS | Redundant UPS |
| Facility | Single data center | Geographic distribution |

SPOF Analysis Process:

For each component, ask:
"If this fails, does the system fail?"

If yes → It's a SPOF → Add redundancy
If no → Not a SPOF (but verify)
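The question "if this fails, does the system fail?" can be automated for a simple inventory. The sketch below is deliberately simplified: it assumes the system is up when every layer has at least one working component and that components within a layer are interchangeable. The layer names and component names are invented for the example.

```python
def system_up(layers, failed):
    """The system is up if every layer still has a working component."""
    return all(
        any(component not in failed for component in components)
        for components in layers.values()
    )


def find_spofs(layers):
    """Fail each component on its own and report those that take the system down."""
    return [
        component
        for components in layers.values()
        for component in components
        if not system_up(layers, {component})
    ]


layers = {
    "compute": ["server-1", "server-2"],
    "network": ["switch-1"],
    "internet": ["isp-1", "isp-2"],
    "power": ["ups-1"],
}
spofs = find_spofs(layers)
print(spofs)  # ['switch-1', 'ups-1']
```

A component that passes this check still deserves the "but verify" step, because a model like this cannot see common-mode failures such as two ISPs sharing the same conduit.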

What Is N+1 vs 2N Redundancy?

N+1 Redundancy:

N = number needed for full capacity
+1 = one spare

Example: Need 4 servers for load
N+1 = 5 servers deployed
One can fail, still have full capacity

2N Redundancy:

2N = double everything

Example: Need 4 servers for load
2N = 8 servers deployed
Half can fail, still have full capacity

Comparison:

| Model | Cost | Redundancy | Use Case |
| --- | --- | --- | --- |
| N+1 | Lower | Single failure | Most applications |
| 2N | Higher | Multiple failures | Critical systems |
| 2N+1 | Highest | Maximum | Mission critical |
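The models differ only in arithmetic, which a short sketch makes explicit. The function names are hypothetical, and 2N+1 is interpreted here as "double everything, plus one more spare", which the text above does not spell out.

```python
def deployed_units(needed, model):
    """Return how many units are deployed for a given redundancy model."""
    if model == "N+1":
        return needed + 1
    if model == "2N":
        return 2 * needed
    if model == "2N+1":
        return 2 * needed + 1  # assumed meaning: 2N plus one extra spare
    raise ValueError(f"unknown model: {model}")


def tolerated_failures(needed, model):
    """Units that can fail while full capacity is still available."""
    return deployed_units(needed, model) - needed


needed = 4  # servers required to carry the load, as in the example above
for model in ("N+1", "2N", "2N+1"):
    print(model, deployed_units(needed, model), tolerated_failures(needed, model))
```

For a load that needs 4 servers, N+1 deploys 5 and tolerates 1 failure, 2N deploys 8 and tolerates 4, and 2N+1 deploys 9 and tolerates 5.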

How CompTIA Tests This

Example Analysis

Scenario: A company's database server has experienced data loss due to a hard drive failure. They want to implement storage redundancy that can survive a single disk failure while maintaining good read performance. They have 4 drives available.

Analysis - RAID Selection:

Requirements:

  • Survive single disk failure ✓
  • Good read performance ✓
  • 4 drives available ✓

RAID Options Analysis:

| Level | Fault Tolerance | Performance | Usable Capacity |
| --- | --- | --- | --- |
| RAID 0 | ❌ None | Best | 100% (4 drives) |
| RAID 1 | ✓ 1 disk | Good read | 50% (2 drives) |
| RAID 5 | ✓ 1 disk | Good read | 75% (3 drives) |
| RAID 6 | ✓ 2 disks | Good read | 50% (2 drives) |
| RAID 10 | ✓ 1 per mirror | Best read | 50% (2 drives) |

Recommendation: RAID 5 or RAID 10

RAID 5 Analysis:

4 drives in RAID 5:
- Usable capacity: 3 drives worth
- Fault tolerance: 1 drive failure
- Read performance: Good (striped reads)
- Write performance: Moderate (parity calculation)

Pros: Best capacity utilization
Cons: Slower writes, rebuild stress

RAID 10 Analysis:

4 drives in RAID 10:
- Usable capacity: 2 drives worth
- Fault tolerance: 1 drive per mirror
- Read performance: Excellent
- Write performance: Good

Pros: Best performance, faster rebuild
Cons: 50% capacity loss

Decision Matrix:

| Priority | Best Choice |
| --- | --- |
| Capacity | RAID 5 |
| Performance | RAID 10 |
| Balanced | Either works |

Final Recommendation:

  • For a database with read performance priority, RAID 10 is recommended. For maximum capacity, RAID 5 is appropriate.

Key insight: RAID 0 provides NO fault tolerance, so never use it for important data. Among the fault-tolerant levels, RAID 1 and RAID 10 avoid parity overhead and give the best performance, while RAID 5 and RAID 6 give the best capacity efficiency.

Key Terms

redundancy, fault tolerance, RAID, NIC teaming, power redundancy, server redundancy, SPOF, failover

Common Mistakes

RAID 0 for critical data: RAID 0 has NO redundancy. Any drive failure = total data loss.
Treating RAID as backup: RAID protects against drive failure only. It doesn't protect against ransomware, accidental deletion, or corruption.
Single redundancy layer: redundancy at one layer isn't enough. A redundant server is useless if the network has no redundancy.
Untested redundancy: redundant systems must be tested. Failover that has never been tested often doesn't work.

Exam Tips

RAID 0 = striping = performance only = NO fault tolerance. If exam asks about redundancy, RAID 0 is wrong.
RAID 1 = mirroring = 2 disks = 1 can fail. RAID 5 = striping + parity = 3+ disks = 1 can fail.
RAID 6 = double parity = 4+ disks = 2 can fail. Use for large arrays, where a second disk failure during a rebuild is more likely.
NIC teaming/bonding provides both redundancy (failover) and potentially increased bandwidth.
N+1 = one spare beyond minimum. 2N = double everything. 2N is more expensive but more resilient.
SPOF = Single Point Of Failure. HA design eliminates all SPOFs through redundancy.

Memory Trick

RAID Level Memory:

  • "RAID 0 = Zero protection" (striping only)
  • "RAID 1 = One mirror copy" (mirroring)
  • "RAID 5 = Five fingers, lose one, still function" (1 parity)
  • "RAID 6 = Six-shooter, two bullets spare" (2 parity)
  • "RAID 10 = 1 and 0 combined" (mirror + stripe)

RAID Quick Reference:

```
0 = Fast, no safety net
1 = Mirror image
5 = One disk can die
6 = Two disks can die
10 = Best of both worlds
```

Network Redundancy - "SNID":

  • Switches (redundant)
  • NICs (teamed)
  • ISPs (multiple)
  • Diverse paths

N+1 vs 2N: "N+1 = 1 extra spare" "2N = 2x everything (double)"

SPOF Detection: "If it's Single, it's a Problem, Or it'll Fail" Single Point Of Failure

Test Your Knowledge

Q1.Which RAID level provides NO fault tolerance?

Q2.A company needs storage that can survive two simultaneous disk failures. Which RAID level should they use?

Q3.What technology combines multiple network interfaces for redundancy and increased bandwidth?
