Failure detector

Source: Wikipedia

Topic: Suspicion and crash detection as graded signals rather than perfect truth.

Core idea

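A failure detector is a subsystem that reports which processes it suspects have failed. Detectors come in different formal strengths of completeness (crashed processes are eventually suspected) and accuracy (correct processes are not wrongly suspected); none of them offers certainty.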

Key claims

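A distributed system can rarely know for certain that a component has failed, because a slow process and a crashed one can look identical from the outside.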
Suspicion quality matters because different guarantees require different detector strengths.
A timeout is evidence of failure, not proof of it.
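
The claims above can be illustrated with a minimal sketch of a timeout-based detector, written in Python. The class name `HeartbeatDetector`, its methods, and the 5-second timeout are all assumptions made for illustration, not part of any particular library. The point is that silence only produces a suspicion, and a later heartbeat revokes it.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Timeout-based suspicion. Names and the default timeout are illustrative."""

    timeout_s: float = 5.0
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Recording a fresh heartbeat revokes any earlier suspicion.
        self.last_seen[node] = now

    def suspects(self, node: str, now: float) -> bool:
        # Silence longer than the timeout yields a suspicion, not a verdict.
        # A node that has never been heard from is also treated as suspected.
        seen = self.last_seen.get(node)
        return seen is None or now - seen > self.timeout_s


detector = HeartbeatDetector(timeout_s=5.0)
detector.heartbeat("agent-1", now=0.0)
early = detector.suspects("agent-1", now=3.0)    # still within the timeout
late = detector.suspects("agent-1", now=9.0)     # silent too long: suspected
detector.heartbeat("agent-1", now=10.0)          # the node was only slow
revoked = detector.suspects("agent-1", now=10.5)
```

Passing `now` explicitly instead of reading a clock keeps the sketch deterministic. The trade-off sits in the timeout: a shorter one suspects crashed nodes sooner but wrongly suspects slow ones more often.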

Harness takeaway

The studio should represent agent and service status as suspected, unreachable, timed out, or confirmed failed, instead of collapsing all delay into death.
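
One way the takeaway might look in code is sketched below. The four non-healthy states come from the text above; the `HEALTHY` state, the `Observation` fields, and the precedence rules in `classify` are hypothetical choices made for this example, and a real harness may draw the lines differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    HEALTHY = "healthy"  # assumed extra state for the no-problem case
    SUSPECTED = "suspected"
    UNREACHABLE = "unreachable"
    TIMED_OUT = "timed_out"
    CONFIRMED_FAILED = "confirmed_failed"


@dataclass
class Observation:
    # Hypothetical evidence gathered by the harness about one agent or service.
    connected: bool                  # could a connection be opened at all?
    replied_in_time: bool            # did a reply arrive before the deadline?
    missed_heartbeats: int = 0       # consecutive heartbeats not received
    exit_code: Optional[int] = None  # set only when a supervisor reports exit


def classify(obs: Observation) -> Status:
    # Only direct evidence of termination counts as a confirmed failure.
    if obs.exit_code is not None:
        return Status.CONFIRMED_FAILED
    if not obs.connected:
        return Status.UNREACHABLE
    if not obs.replied_in_time:
        return Status.TIMED_OUT
    if obs.missed_heartbeats > 0:
        return Status.SUSPECTED
    return Status.HEALTHY


slow = classify(Observation(connected=True, replied_in_time=False))
dead = classify(Observation(connected=False, replied_in_time=False, exit_code=1))
```

Here a slow agent is reported as `Status.TIMED_OUT`, while `Status.CONFIRMED_FAILED` is reserved for the case where something other than delay (in this sketch, a reported exit code) backs the claim.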
