Redundancy and Fault Tolerance
Eliminating single points of failure through server redundancy, network redundancy, storage redundancy (RAID), and power redundancy. Understanding fault tolerance mechanisms and their implementation.
Understanding Redundancy and Fault Tolerance
Redundancy duplicates critical components so that if one fails, others continue operating. Fault tolerance is the system's ability to continue functioning despite component failures. Together, they eliminate single points of failure (SPOF).
Redundancy categories:
- Server redundancy: multiple servers, clustering, virtualization
- Network redundancy: multiple paths, NIC teaming, diverse carriers
- Storage redundancy: RAID arrays, replication, distributed storage
- Power redundancy: UPS, generators, dual feeds
In the October 2021 Facebook outage, a configuration change to the company's backbone network disconnected its data centers from the internet for about six hours. The redundant systems did not help, because the change affected ALL redundant paths at once. The lesson is that redundancy must also protect against common-mode failures, where a single cause takes out every copy.
True fault tolerance requires redundancy at every layer and across failure domains.
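A common-mode check can be sketched in a few lines of Python. This is a minimal illustration, not a real tool: the inventory, the component names, and the `shared_failure_domains` helper are all hypothetical. The idea is that if every replica of a service depends on the same failure domain, that domain can take out all of them together.

```python
from collections import defaultdict

def shared_failure_domains(components):
    """Map each service to the failure domains shared by ALL of its replicas.

    components: iterable of (service, replica_name, failure_domains).
    A service appears in the result only if its replicas have a domain in common.
    """
    by_service = defaultdict(list)
    for service, _replica, domains in components:
        by_service[service].append(set(domains))

    common = {}
    for service, domain_sets in by_service.items():
        shared = set.intersection(*domain_sets)
        if shared:
            common[service] = shared
    return common

# Hypothetical inventory: two "redundant" links pushed by the same config system.
inventory = [
    ("backbone", "link-a", {"dc-east", "config-push"}),
    ("backbone", "link-b", {"dc-west", "config-push"}),
    ("dns", "ns1", {"dc-east"}),
    ("dns", "ns2", {"dc-west"}),
]

print(shared_failure_domains(inventory))  # {'backbone': {'config-push'}}
```

Here the two backbone links sit in different data centers but still share one failure domain, so they are not independent. The DNS servers share nothing and are not flagged.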
Why This Matters for the Exam
Redundancy and fault tolerance are heavily tested on SY0-701 because they're fundamental to system resilience. Questions cover RAID levels, network redundancy, and identifying SPOFs.
Understanding redundancy helps with infrastructure design, disaster recovery, and risk assessment. A system without redundancy is one incident away from an outage.
The exam tests both conceptual understanding and specific technologies like RAID.
Deep Dive
What Is Server Redundancy?
Server redundancy ensures compute capacity survives individual server failures.
Server Redundancy Methods:
| Method | Description | Failover Time |
|---|---|---|
| Clustering | Multiple servers as one | Seconds |
| Virtualization | VMs migrate between hosts | Minutes |
| Standby servers | Cold/warm/hot standby | Varies |
| Cloud auto-scaling | Automatic replacement | Minutes |
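To see why adding servers helps, here is a rough availability estimate in Python. It is a sketch under a strong assumption: servers fail independently, and one surviving server is enough to keep the service up. The 99% figure is an example value, not a number from any vendor, and `combined_availability` is an illustrative name.

```python
def combined_availability(single_availability: float, replicas: int) -> float:
    """Availability of `replicas` independent servers when any one can serve.

    The service is down only if every replica is down at the same time,
    so availability = 1 - (probability one is down) ** replicas.
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - single_availability) ** replicas

# Example: each server is assumed to be 99% available.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

With these example numbers, a second server moves the estimate from 99% to about 99.99%. The independence assumption is the weak point: a shared dependency that fails every server at once makes the real figure much lower.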
What Is Network Redundancy?
Network redundancy provides multiple paths for traffic to flow.
Network Redundancy Technologies:
| Technology | Function |
|---|---|
| NIC teaming/bonding | Multiple NICs as one |
| Redundant switches | Multiple switch paths |
| Diverse routing | Multiple network paths |
| Multiple ISPs | Different internet providers |
| SD-WAN | Multiple transport options |
NIC Teaming Modes:
| Mode | Description | Benefit |
|---|---|---|
| Active-Backup | One active, one standby | Failover |
| Load Balancing | Traffic across all NICs | Performance + failover |
| LACP (802.3ad) | Link aggregation | Bandwidth + failover |
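Active-Backup mode is the easiest to reason about, so here is a minimal Python sketch of its selection logic. It models the idea only: the `Nic` class and `select_active` function are hypothetical and do not correspond to any operating system's teaming API. This sketch also assumes no automatic failback, which real implementations may make configurable.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Nic:
    name: str
    link_up: bool

def select_active(nics: List[Nic], current: Optional[str] = None) -> Optional[str]:
    """Pick the NIC that should carry traffic in an active-backup team.

    Keep the current NIC while its link is up; otherwise fail over to the
    first NIC (in priority order) whose link is up. Return None if all are down.
    """
    for nic in nics:
        if nic.name == current and nic.link_up:
            return nic.name
    for nic in nics:
        if nic.link_up:
            return nic.name
    return None

team = [Nic("eth0", True), Nic("eth1", True)]
active = select_active(team)            # eth0 carries traffic
team[0].link_up = False                 # cable pulled on eth0
active = select_active(team, active)    # fails over to eth1
print(active)
```

Only one NIC carries traffic at a time, which is why this mode gives failover but no extra bandwidth.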
What Are RAID Levels for Storage Redundancy?
RAID (Redundant Array of Independent Disks) provides storage redundancy and/or performance.
Common RAID Levels:
| Level | Description | Min Disks | Fault Tolerance |
|---|---|---|---|
| RAID 0 | Striping | 2 | None (performance only) |
| RAID 1 | Mirroring | 2 | 1 disk failure |
| RAID 5 | Striping + parity | 3 | 1 disk failure |
| RAID 6 | Striping + double parity | 4 | 2 disk failures |
| RAID 10 | Mirroring + striping | 4 | 1 disk per mirror |
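The table can be turned into a small calculator. The Python sketch below assumes equal-sized disks, models RAID 1 as a single mirror set where every disk holds a full copy, and reports only the failures a level is guaranteed to survive. The function name `raid_summary` and the 1 TB disk size are illustrative choices.

```python
def raid_summary(level: int, disks: int, disk_tb: float = 1.0):
    """Return (usable_tb, guaranteed_disk_failures_survived) for an array."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 10 and disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")

    if level == 0:
        usable, survives = disks, 0        # striping only, no redundancy
    elif level == 1:
        usable, survives = 1, disks - 1    # every disk holds a full copy
    elif level == 5:
        usable, survives = disks - 1, 1    # one disk's worth of parity
    elif level == 6:
        usable, survives = disks - 2, 2    # two disks' worth of parity
    else:
        # RAID 10: guaranteed 1; a second failure is survivable only if
        # it lands in a different mirror pair.
        usable, survives = disks // 2, 1
    return usable * disk_tb, survives

for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4), (10, 4)]:
    usable_tb, survives = raid_summary(level, disks)
    print(f"RAID {level:>2}, {disks} x 1 TB: {usable_tb:.0f} TB usable, "
          f"survives {survives} failure(s)")
```

With four 1 TB disks, RAID 5 yields 3 TB usable, while RAID 6 and RAID 10 both yield 2 TB.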
RAID Diagrams (images not available): RAID 0 (striping, NO redundancy), RAID 1 (mirroring), RAID 5 (striping + distributed parity), RAID 6 (double parity).
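The parity idea behind RAID 5 can be shown with XOR. The Python sketch below covers a single stripe with three data blocks and one parity block. It is a simplification: a real RAID 5 array rotates the parity block across disks from stripe to stripe, and the block contents here are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] fails. XOR of the survivors rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True
```

This works because XOR-ing a value twice cancels it out, so XOR-ing the surviving blocks with the parity leaves only the missing block. It also shows why RAID 5 survives only one failure: with two blocks missing, a single parity block is not enough to recover both.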
What Is Power Redundancy?
Power redundancy ensures systems survive power failures.
Power Redundancy Components:
| Component | Protection Against |
|---|---|
| UPS | Short outages, power conditioning |
| Generator | Extended outages |
| Dual power supplies | PSU failure |
| Dual power feeds | Utility/circuit failure |
| PDU redundancy | Power distribution failure |
What Are Single Points of Failure (SPOF)?
A SPOF is any component whose failure causes system failure.
Common SPOFs:
| Layer | Potential SPOF | Solution |
|---|---|---|
| Compute | Single server | Clustering |
| Network | Single switch | Redundant switches |
| Network | Single ISP | Multiple ISPs |
| Storage | Single disk | RAID |
| Power | Single UPS | Redundant UPS |
| Facility | Single data center | Geographic distribution |
SPOF Analysis Process:
For each component, ask: "If this fails, does the system fail?"
- If yes: it's a SPOF, so add redundancy.
- If no: it's not a SPOF (but verify).
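That question can be asked mechanically. The Python sketch below assumes the system is described as a list of alternative paths, each a set of components that must all work for that path to serve traffic. The component names and both helper functions are hypothetical.

```python
def system_up(paths, failed):
    """The system is up if at least one path contains no failed component."""
    return any(failed.isdisjoint(path) for path in paths)

def find_spofs(paths):
    """Fail each component on its own and report those that bring the system down."""
    components = set().union(*paths)
    return sorted(c for c in components if not system_up(paths, {c}))

# Hypothetical design: two ISPs, switches, and servers, but one firewall and one UPS.
paths = [
    {"isp-a", "fw-1", "switch-1", "server-1", "ups-1"},
    {"isp-b", "fw-1", "switch-2", "server-2", "ups-1"},
]

print(find_spofs(paths))  # ['fw-1', 'ups-1']
```

The duplicated ISPs, switches, and servers pass the test, while the shared firewall and UPS are flagged. This only tests single failures, which is exactly what the SPOF question asks.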
What Is N+1 vs 2N Redundancy?
N+1 Redundancy:
- N = number needed for full capacity; +1 = one spare
- Example: 4 servers are needed for the load
- N+1 = 5 servers deployed
- One can fail and full capacity remains
2N Redundancy:
- 2N = double everything
- Example: 4 servers are needed for the load
- 2N = 8 servers deployed
- Half can fail and full capacity remains
Comparison:
| Model | Cost | Redundancy | Use Case |
|---|---|---|---|
| N+1 | Lower | Single failure | Most applications |
| 2N | Higher | Multiple failures | Critical systems |
| 2N+1 | Highest | Maximum | Mission critical |
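The arithmetic behind the three models is simple enough to check in Python. This sketch uses the 4-server example from above. The function name `deployed_units` is illustrative.

```python
def deployed_units(needed: int, model: str) -> int:
    """Units to deploy for a redundancy model, given N units needed for full load."""
    if needed < 1:
        raise ValueError("needed must be at least 1")
    formulas = {
        "N+1": needed + 1,       # one spare
        "2N": 2 * needed,        # a full duplicate set
        "2N+1": 2 * needed + 1,  # a full duplicate set plus one spare
    }
    if model not in formulas:
        raise ValueError(f"unknown model: {model}")
    return formulas[model]

needed = 4
for model in ("N+1", "2N", "2N+1"):
    total = deployed_units(needed, model)
    print(f"{model}: deploy {total}, spare {total - needed}")
```

For N = 4 this gives 5, 8, and 9 deployed units, with 1, 4, and 5 spares respectively. The spare count is the number of units that can fail while full capacity is still available.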
How CompTIA Tests This
Example Analysis
Scenario: A company's database server has experienced data loss due to a hard drive failure. They want to implement storage redundancy that can survive a single disk failure while maintaining good read performance. They have 4 drives available.
Analysis - RAID Selection:
Requirements:
- Survive single disk failure ✓
- Good read performance ✓
- 4 drives available ✓
RAID Options Analysis:
| Level | Fault Tolerance | Performance | Usable Capacity |
|---|---|---|---|
| RAID 0 | ❌ None | Best | 100% (4 drives) |
| RAID 1 | ✓ 1 disk | Good read | 50% (2 drives, as two mirrored pairs) |
| RAID 5 | ✓ 1 disk | Good read | 75% (3 drives) |
| RAID 6 | ✓ 2 disks | Good read | 50% (2 drives) |
| RAID 10 | ✓ 1/mirror | Best read | 50% (2 drives) |
Recommendation: RAID 5 or RAID 10
RAID 5 Analysis:
4 drives in RAID 5:
- Usable capacity: 3 drives' worth
- Fault tolerance: 1 drive failure
- Read performance: Good (striped reads)
- Write performance: Moderate (parity calculation)
- Pros: Best capacity utilization
- Cons: Slower writes, rebuild stress
RAID 10 Analysis:
4 drives in RAID 10:
- Usable capacity: 2 drives' worth
- Fault tolerance: 1 drive per mirror
- Read performance: Excellent
- Write performance: Good
- Pros: Best performance, faster rebuild
- Cons: 50% capacity loss
Decision Matrix:
| Priority | Best Choice |
|---|---|
| Capacity | RAID 5 |
| Performance | RAID 10 |
| Balanced | Either works |
Final Recommendation:
For a database with read performance priority, RAID 10 is recommended. For maximum capacity, RAID 5 is appropriate.
Key insight: RAID 0 provides NO fault tolerance, so never use it for important data. Among the levels that do provide redundancy, RAID 10 gives the best performance, and RAID 1 also reads well. RAID 5 and 6 provide the best capacity efficiency.
Memory Trick
RAID Level Memory:
- "RAID 0 = Zero protection" (striping only)
- "RAID 1 = One mirror copy" (mirroring)
- "RAID 5 = Five fingers, lose one, still function" (1 parity)
- "RAID 6 = Six-shooter, two bullets spare" (2 parity)
- "RAID 10 = 1 and 0 combined" (mirror + stripe)

RAID Quick Reference:
```
0  = Fast, no safety net
1  = Mirror image
5  = One disk can die
6  = Two disks can die
10 = Best of both worlds
```
Network Redundancy - "SNID":
- Switches (redundant)
- NICs (teamed)
- ISPs (multiple)
- Diverse paths
N+1 vs 2N:
- "N+1 = 1 extra spare"
- "2N = 2x everything (double)"

SPOF Detection:
- "If it's Single, it's a Problem, Or it'll Fail" (Single Point Of Failure)
Test Your Knowledge
Q1. Which RAID level provides NO fault tolerance?
Q2. A company needs storage that can survive two simultaneous disk failures. Which RAID level should they use?
Q3. What technology combines multiple network interfaces for redundancy and increased bandwidth?