WorfBench

Overview

WorfBench is the benchmark side of a graph-aware evaluation stack for agentic workflow generation. It scores a generated workflow on its structure (the steps and the dependencies between them), not only on the end-task output.

Why it matters

Workflow systems need evaluation that understands graphs and procedures. Without it, a benchmark quietly rewards outcomes while ignoring the workflow artifact it claims to study.

Distinctive trait

WorfBench benchmarks workflows as workflows: it is graph-aware, rather than treating a workflow as just another blob of generated text.
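
To make the contrast concrete, here is a minimal sketch of what a graph-aware comparison can catch that a text-level one misses. It assumes a workflow is represented as a set of directed dependency edges between named steps, and it scores a prediction with a plain edge-level F1. Both the representation and the metric are illustrative assumptions, not WorfBench's actual data format or scoring method, and the helper names (`edge_f1`, `step_names`) and the example steps are hypothetical.

```python
from typing import Set, Tuple

# Assumption: a workflow is a set of (upstream_step, downstream_step) edges.
Edge = Tuple[str, str]


def step_names(edges: Set[Edge]) -> Set[str]:
    """Collect every step that appears at either end of an edge."""
    return {step for edge in edges for step in edge}


def edge_f1(predicted: Set[Edge], gold: Set[Edge]) -> float:
    """Illustrative structural score: F1 over exactly matching edges."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold workflow: two checks run in parallel after the search.
gold = {
    ("search", "compare"),
    ("search", "check_stock"),
    ("compare", "buy"),
    ("check_stock", "buy"),
}

# Hypothetical prediction: same steps, but forced into a single chain.
predicted = {
    ("search", "compare"),
    ("compare", "check_stock"),
    ("check_stock", "buy"),
}

# A bag-of-steps view sees no difference between the two workflows.
same_steps = step_names(predicted) == step_names(gold)

# An edge-level view does: only two of the dependencies line up.
structural_score = edge_f1(predicted, gold)

print(same_steps, round(structural_score, 3))
```

The two workflows name the same four steps, so any comparison that ignores ordering and dependencies would rate them identical. The edge-level score drops because the prediction serializes steps that the gold workflow runs in parallel, which is the kind of structural difference a graph-aware benchmark is meant to surface.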

Relationships

Read WorfBench with WorfEval, SOPBench, and evaluation-and-review-loops. It is also central to the evidence lane in self-evolving-workflows.