Out-of-Band Management

What It Is

Out-of-band (OOB) management is the ability to manage a machine through a channel that is independent of the machine's main OS. When the OS is hung, crashed, or not yet installed, OOB management still works.

"Out-of-band" means "not through the normal path." The normal path (SSH, web console) requires the OS to be running. OOB works even when the OS doesn't.

Why It Matters

Machines fail. Kernels panic, disks fill up, network configs go wrong. When the OS is unresponsive, you need a way to:

Power cycle the machine (without walking to it)
See the console output (to diagnose boot failures)
Boot from an alternative image (to recover)

Without OOB, recovery requires physical access. For a VPS, that means a support ticket. For a homelab server in another room, that means walking over. For a machine in a data center, that means a drive to the facility.

The Technologies

Intel AMT (Active Management Technology)

Part of the Intel Management Engine (ME), available on vPro-branded Intel chips. Provides:

Remote power control (on/off/reset/boot-to-PXE)
KVM remote desktop (full keyboard/video/mouse, works during BIOS)
Serial over LAN (remote serial console)
IDE redirection (mount a remote ISO as a virtual CD)

AMT communicates on ports 16992/16993 (HTTP/HTTPS) using SOAP/WS-MAN. It has its own TLS stack and credentials, independent of the OS.
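Because AMT listens on fixed ports, a maintainer can make a rough first guess at whether a machine has AMT provisioned by checking whether those ports accept a TCP connection. The following is a minimal sketch, not FortrOS code: `probe_amt` and its `connect` parameter are hypothetical names, and an open port only suggests AMT, since any other service could be bound to the same port.

```python
import socket

# The two AMT ports named in the text: 16992 (HTTP) and 16993 (HTTPS).
AMT_PORTS = {16992: "http", 16993: "https"}


def probe_amt(host, timeout=2.0, connect=socket.create_connection):
    """Return the AMT ports on `host` that accept a TCP connection.

    `connect` is injectable so the function can be exercised without a network.
    """
    open_ports = []
    for port in sorted(AMT_PORTS):
        try:
            conn = connect((host, port), timeout)
        except OSError:
            continue  # closed, filtered, timed out, or host unreachable
        conn.close()
        open_ports.append(port)
    return open_ports
```

An empty list means neither port answered within the timeout. A non-empty list is only a hint that should be confirmed by an authenticated WS-MAN request.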

Availability: Intel vPro business/enterprise chips only. Not on consumer chips (Core i3/i5/i7 without vPro).

AMD DASH

DASH (Desktop and mobile Architecture for System Hardware) is a DMTF standard for client/desktop OOB management. AMD implements it on its PRO-branded business platforms as the counterpart to Intel AMT.

Provides: remote power control, hardware inventory, health monitoring, OS state detection. Less feature-rich than AMT (no KVM remote desktop in most implementations).

Availability: AMD PRO business platforms only.

IPMI / BMC

IPMI (Intelligent Platform Management Interface) is the server counterpart to AMT. It is implemented by a BMC (Baseboard Management Controller), a dedicated processor on the server motherboard that provides:

Remote power/reset
Serial console
Virtual media (remote ISO mount)
Hardware sensors (temperature, fan speed, voltage)
KVM over IP (web-based remote console)

Common BMC implementations: iLO (HPE), iDRAC (Dell), IMM/XClarity Controller (Lenovo). Redfish is not a BMC implementation but the DMTF's HTTPS/JSON management API, the modern standard replacing IPMI.

Availability: Server hardware (and some workstation boards). Not available on typical consumer/desktop hardware.
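As an illustration of remote power control through a BMC, the sketch below wraps the widely used `ipmitool` CLI. It assumes `ipmitool` is installed and the BMC accepts IPMI-over-LAN (`lanplus`). The function names are hypothetical, and nothing here is taken from the FortrOS codebase. The password is passed through the `IPMI_PASSWORD` environment variable (ipmitool's `-E` flag) so that it does not appear in the process list.

```python
import os
import subprocess

# Subset of `ipmitool chassis power` subcommands used in this sketch.
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(host, user, action="cycle"):
    """Build the ipmitool argv for a chassis power action (does not run it)."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]


def run_ipmi_power(host, user, password, action="cycle", timeout=30):
    """Run the command; -E makes ipmitool read the password from the env."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        env=env, check=True, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

Separating command construction from execution keeps the argv easy to inspect and test without touching a real BMC.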

VPS Provider Console

Cloud/VPS providers offer their own OOB via the management dashboard:

Serial console (via web browser)
Rescue mode (boot into a recovery OS)
Hard reset
ISO mounting

This is provider-specific, not a hardware standard.

How FortrOS Uses It

FortrOS treats OOB management as an org-policy-controlled resource:

Policy: use

During enrollment, the maintainer provisions AMT/DASH (sets admin credentials over the local network)
Credentials are stored encrypted in org state
The org can power-cycle unresponsive nodes via AMT
Self-healing: detect failure via gossip -> AMT reboot -> node re-enrolls -> org heals without human intervention

Policy: neutralize

Intel ME is neutralized (me_cleaner / HAP bit) during provisioning
No OOB management available
Recovery requires physical access
Appropriate for government/opsec deployments where the ME is considered an unacceptable risk

Policy: ignore (default)

Don't touch the management engine
Don't provision AMT
OOB is whatever the vendor shipped (may or may not work)
Appropriate for homelabs where the ME is not a concern
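The three policies above could be modeled as a small enumeration with a default. This is only a sketch of the idea: the names `OobPolicy`, `parse_policy`, and `enrollment_actions`, and the action strings, are hypothetical and do not reflect how FortrOS actually stores or applies org policy.

```python
from enum import Enum


class OobPolicy(Enum):
    USE = "use"                # provision AMT/DASH and use it for recovery
    NEUTRALIZE = "neutralize"  # disable the ME; no OOB management
    IGNORE = "ignore"          # leave the management engine untouched


DEFAULT_POLICY = OobPolicy.IGNORE  # the text names "ignore" as the default


def parse_policy(value=None):
    """Map an org-policy string to an OobPolicy, falling back to the default."""
    if value is None:
        return DEFAULT_POLICY
    return OobPolicy(value.strip().lower())  # raises ValueError if unknown


def enrollment_actions(policy):
    """Illustrative list of enrollment steps implied by each policy."""
    if policy is OobPolicy.USE:
        return ["provision_oob_credentials", "store_encrypted_credentials"]
    if policy is OobPolicy.NEUTRALIZE:
        return ["neutralize_management_engine"]
    return []  # IGNORE: do nothing to the management engine
```

For example, `enrollment_actions(parse_policy("neutralize"))` yields the single neutralization step, while an unset policy yields no actions at all.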

The Self-Healing Loop

When AMT is available and provisioned:

1. SWIM gossip detects node-X is unreachable (failed probes)
2. Maintainer on node-Y sends AMT power-reset to node-X
3. Node-X reboots from local preboot UKI
4. Preboot authenticates via TPM, unlocks /persist, and kexecs
5. Node-X rejoins the gossip mesh
6. Org state converges, workloads re-reconcile

No human involved. No physical access. The org detected the failure and healed itself.
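Seen from the healthy peer (node-Y), the loop above reduces to "detect, reset, wait for rejoin". The sketch below illustrates that control flow under stated assumptions: `is_reachable` stands in for the SWIM gossip view, `power_reset` stands in for the AMT call, and the timeout and retry values are invented for illustration. Steps 3 to 5 happen on the rebooting node itself and are only observed here as the node becoming reachable again.

```python
import time


def heal_node(node, is_reachable, power_reset,
              rejoin_timeout=300.0, poll_interval=5.0, max_resets=2,
              clock=time.monotonic, sleep=time.sleep):
    """Reset an unreachable node and wait for it to rejoin.

    Returns True if the node is reachable when the function returns.
    """
    if is_reachable(node):
        return True  # nothing to heal
    for _ in range(max_resets):
        power_reset(node)  # step 2: OOB power reset
        deadline = clock() + rejoin_timeout
        while clock() < deadline:
            if is_reachable(node):  # step 5: node is back in the mesh
                return True
            sleep(poll_interval)
    return False  # give up; a human or another channel must take over
```

Capping the number of resets is a deliberate choice in this sketch: a node that does not come back after a couple of power cycles probably has a fault that rebooting will not fix.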

Additional Recovery Channels

OOB isn't a single mechanism -- FortrOS uses every available recovery channel, taking positive control of each when possible. These aren't alternatives to each other; they're layers that apply in different environments:

Hardware watchdog: A hardware timer that resets the machine unless software keeps resetting it, so a hung OS triggers a reboot. Built into most hardware. Once armed, it has no external dependency. Detects OS hangs but not network issues or boot failures.

PDU (Power Distribution Unit) control: Smart PDUs remotely toggle power to individual outlets. Coarser than AMT (power cycle, no console) but works with any hardware. FortrOS provisions PDU credentials during enrollment where available.

Wake-on-LAN (WoL): Send a magic packet to a machine's MAC address to power it on. Only powers on -- can't reset or access console. Requires L2 network access. Useful for bringing up nodes after planned power-downs.

VPS provider API: Cloud providers offer hard reset, serial console, rescue mode, and ISO mounting via API. FortrOS uses provider APIs for automated recovery of VPS nodes.

VM host management: Nodes running as VMs on Proxmox, vSphere, or similar can be power-cycled via the hypervisor's management API. FortrOS provisions API credentials during enrollment for automated recovery.
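Of the channels above, Wake-on-LAN is simple enough to show in full. A magic packet is six 0xFF bytes followed by the target MAC address repeated sixteen times, usually sent as a UDP broadcast. The sketch below builds and sends one; the function names are illustrative, and the choice of port 9 and the limited-broadcast address are common conventions rather than anything the source specifies.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte WoL magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # ValueError on non-hex characters
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local L2 segment."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The sender gets no acknowledgement, so a caller has to confirm the power-on some other way, for example by waiting for the node to reappear in the gossip mesh.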

The principle: every management channel the hardware or environment offers is a recovery path FortrOS can use. The maintainer's OOB module detects what's available (AMT, IPMI, provider API, PDU, WoL) and uses the best option for each node.
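One way to picture "uses the best option for each node" is a preference-ordered table of channels and what each can do. The sketch below is an assumption-heavy illustration: the ordering, the capability names, and `pick_channel` are all hypothetical, and the capability sets are a simplification of the descriptions in this section rather than FortrOS's real detection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    capabilities: frozenset


# Assumed preference order, richest channel first (not specified by the source).
CHANNELS = [
    Channel("amt", frozenset({"power_on", "reset", "console"})),
    Channel("ipmi", frozenset({"power_on", "reset", "console"})),
    Channel("provider_api", frozenset({"reset", "console"})),
    Channel("hypervisor_api", frozenset({"reset"})),
    Channel("pdu", frozenset({"power_on", "reset"})),
    Channel("wol", frozenset({"power_on"})),  # WoL can only power on
]


def pick_channel(available, needed):
    """Return the first preferred channel that is available and can do `needed`."""
    for channel in CHANNELS:
        if channel.name in available and needed in channel.capabilities:
            return channel.name
    return None  # no remote path; recovery needs physical access
```

For a node that only has a smart PDU and WoL, `pick_channel({"pdu", "wol"}, "reset")` selects the PDU, because WoL cannot reset a running machine.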