GUIDE · PREVIEW
GUIDE / CON.18
source: docs/guide/concepts/Erasure Coding.md
Concepts

Erasure Coding

What It Is

Erasure coding is a data protection technique that splits data into fragments, generates redundancy fragments, and distributes them across storage nodes. Any subset of fragments (K out of N) can reconstruct the original data. The remaining N-K fragments are pure redundancy.

Think of it as a more efficient version of RAID: instead of storing 2 or 3 complete copies of the data (2-3x storage overhead), erasure coding achieves similar or better fault tolerance at 1.5-2x overhead.

Why It Matters

In a distributed system, nodes fail. Disks die, machines go offline, network partitions isolate nodes. If your data is on one node and that node fails, the data is lost.

Options for protecting data:

  • Replication (copies): Store 3 copies on 3 nodes. Any 1 copy recovers the data. 3x storage overhead. Simple but wasteful.
  • Erasure coding: Split into 4 data fragments + 2 parity fragments (K=4, N=6). Any 4 of 6 fragments recover the data. 1.5x storage overhead. More complex but much more efficient.

How It Works

Reed-Solomon Coding

FortrOS uses Reed-Solomon erasure coding (via the reed-solomon-erasure Rust crate). The math is based on polynomial evaluation over Galois fields:

  1. Split: Data is divided into K equal-size data shards
  2. Encode: N-K parity shards are generated (each is a linear combination of the data shards)
  3. Distribute: All N shards are stored on different nodes
  4. Reconstruct: Any K of the N shards are sufficient to recover the original data

Example with K=4, N=6:

Original data: [D1, D2, D3, D4]
Parity shards: [P1, P2]  (computed from D1-D4)

Distribute: Node A gets D1, Node B gets D2, ..., Node F gets P2

If Nodes C and F fail (D3 and P2 lost):
  Reconstruct from D1, D2, D4, P1 (any 4 of 6)
  -> Original data recovered

Storage Efficiency

Scheme Copies/fragments Tolerance Overhead
3x replication 3 copies Any 2 failures 3.0x
RS(4,6) 4 data + 2 parity Any 2 failures 1.5x
RS(4,8) 4 data + 4 parity Any 4 failures 2.0x
RS(10,14) 10 data + 4 parity Any 4 failures 1.4x

Erasure coding gives the same fault tolerance as replication with less storage. The tradeoff: reconstruction requires computation (reading K shards and doing math), while replication just reads a copy.

How FortrOS Uses It

Shard storage: Files stored in the org are chunked, erasure-coded, and distributed across nodes as encrypted shards (slivers). Each node stores opaque encrypted fragments it cannot read (zero-knowledge to the host).

Transparent encryption: The reconciler (which sets up the app's storage) encrypts data before passing it to the placement service. The app writes plaintext to its mounted directory; the reconciler handles encryption underneath. The placement service stores and retrieves opaque bytes -- it holds no keys. Even if all shards are collected, they're useless without the owning service's key from the key service.

Reliability-weighted placement: The placement service tracks which nodes hold which shards, and scores nodes by disk health and availability. The scoring accounts for device reliability: a shard on a disk with 50% reliability scores as 0.5 shards toward the replication target. If the org wants N effective shards available, it places more shards on unreliable devices to compensate -- two shards on 50%-reliable disks provide the same effective availability as one shard on a 100%-reliable disk.

This means old and unreliable hardware is usable, not just tolerated. A node with aging drives contributes less per-shard but still contributes. The placement system distributes shards so that the probability of losing enough shards simultaneously stays below the target, regardless of whether those shards are on a few reliable devices or scattered across many unreliable ones. Under-replicated data is re-replicated to healthy nodes automatically as device scores change.

Alternatives

Replication: Store N complete copies. Simpler, faster reads (any copy works), but N times the storage cost. Used by Ceph (default 3x), HDFS, and most databases.

RAID (local): Erasure coding within a single machine (RAID 5, RAID 6). Protects against disk failure, not node failure. FortrOS doesn't use local RAID -- protection is at the org level via distributed erasure coding.

Links