Safety and Permissions

Definition

Safety and permissions are the controls that determine what an agent may execute, when a human must approve a step, and how the runtime limits damage when the agent is wrong or the ecosystem is hostile. A capable model without execution boundaries is not autonomy; it is simply uncontained power.

Main control surfaces

Sandboxes and execution boundaries around shell, file, and network tools, as emphasized by codex-cli and the codex-app-server.
Approval checkpoints that pause the turn until a human authorizes risky work.
Hierarchical allow/deny policy rules and managed settings, which Anthropic now treats as deployable harness policy rather than optional prompt advice.
Hardened runtimes and pre-execution scanning, which the hermes-agent material treats as first-class operating concerns.
Skill-registry and integration hygiene, where openclaw demonstrates how ecosystem breadth can expand the blast radius.

Typical failure modes

Malicious or low-quality third-party skills entering the execution path.
Credential leakage through unsafe transports or over-privileged tooling.
Destructive commands becoming too easy because the runtime exposes more power than the task requires.
Defensive paralysis, where the harness is so opaque or restrictive that the agent stops validating reality and merely narrates intent.

Design lesson

Safety is not only a policy layer glued on top of tools. It is part of agent-harness-anatomy and harness-engineering because permission boundaries, warning surfaces, managed settings, and remediation hints shape how every later turn behaves. Good harnesses make denial legible: the agent should learn what was blocked, why, and what safer path remains.

Read this page with agent-harness-anatomy, harness-engineering, instruction-layering, automation-and-background-work, codex-cli, hermes-agent, and openclaw. The trade-offs here also inform harness-architecture-comparison and context-engineering.

Agent Harness Wiki

Browse

Safety and Permissions

Definition

Main control surfaces

Typical failure modes

Design lesson

Graph View

Table of Contents

Backlinks

Agent Harness Wiki

Browse

Safety and Permissions

Definition

Main control surfaces

Typical failure modes

Design lesson

Related pages

Graph View

Table of Contents

Backlinks