Harness Quality Comparison

Scope

This page compares the harnesses in the source set on the engineering qualities their public materials emphasize: implementation rigor, resumability, evaluation discipline, persistence, and orchestration power. The comparison is qualitative rather than benchmark-driven; the source corpus is stronger on architecture than on leaderboard theater.

Comparison table

| System | Primary quality bet | Where it looks strongest | Typical weakness |
|---|---|---|---|
| codex-cli | Clean protocol architecture and repo legibility | Reusable harness core, App Server, enforcement of taste and invariants | Less emphasis on lifelong personal memory |
| claude-code | Long-running task coherence through explicit handoffs and evaluators | Recovery from context loss, UI validation, structured progress, policy-aware orchestration | Higher operator and compute overhead |
| hermes-agent | Persistent usefulness through memory and skill accumulation | Long-term recall, profiles, self-improvement loop, reusable API surface | Public architecture looks less protocol-clean than Codex |
| gas-town | Throughput through factory orchestration | Large-swarm coordination, explicit work graphs, PR governance | Cost, rough edges, operator complexity |
| gas-city | Modular orchestration primitives | Custom topologies and federated work exchange | Early-stage instability and migration risk |
| openclaw | Ecosystem breadth | Integrations, public skill marketplace, cross-channel presence | Security and supply-chain exposure |

Synthesis

No single system dominates all dimensions because they are not optimizing the same objective. Codex prioritizes architectural cleanliness, Claude prioritizes coherent long-running execution, Hermes prioritizes durable learning, Gas Town/Gas City prioritize factory-scale coordination, and OpenClaw prioritizes reach.

Practical verdict

  • If you want a reference architecture for a coding harness, start with codex-cli.
  • If you need reliable long-running implementation with explicit QA, study claude-code.
  • If you want a persistent assistant that compounds value over time, look at hermes-agent.
  • If your taste runs to swarms and ledgers, the Yegge line of gas-town and gas-city is the interesting frontier.
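
For readers who want the verdict in machine-readable form, the list above can be sketched as a small lookup. This is an illustrative sketch only: the priority labels and the recommend function are hypothetical names invented here, while the system names and pairings come from the bullets above.

```python
# Hypothetical priority labels mapped to the systems named in the verdict above.
# The labels are assumptions made for illustration; only the system names
# and their pairings come from this page.
RECOMMENDATIONS = {
    "reference-architecture": ["codex-cli"],
    "long-running-qa": ["claude-code"],
    "persistent-assistant": ["hermes-agent"],
    "swarm-orchestration": ["gas-town", "gas-city"],
}


def recommend(priority: str) -> list[str]:
    """Return the systems suggested for a priority, or [] if the verdict names none."""
    return RECOMMENDATIONS.get(priority, [])


if __name__ == "__main__":
    for label in RECOMMENDATIONS:
        print(label, "->", ", ".join(recommend(label)))
```

The verdict names no pick for ecosystem breadth, so a lookup on a label such as "ecosystem-breadth" returns an empty list instead of guessing.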

This page pairs with harness-architecture-comparison and harness-decision-matrix, and draws heavily on harness-engineering, context-engineering, and evaluation-and-review-loops.