WorfBench
Overview
WorfBench is the benchmark half of a graph-aware evaluation stack for agentic workflow generation. It scores a generated workflow on its structure (the steps and the dependencies between them), not only on whether the end task succeeds.
Why it matters
Workflow systems need evaluations that understand graphs and procedures. A benchmark that checks only outcomes quietly rewards the final result while ignoring the workflow artifact it claims to study.
Distinctive trait
WorfBench evaluates workflows as workflows: it is graph-aware, rather than treating a workflow as just another blob of generated text.
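To make the difference concrete, here is a minimal sketch of why a structure-aware score separates workflows that a content-only score cannot. It is an illustration under stated assumptions, not WorfBench's or WorfEval's actual metric: the adjacency-mapping format, the step names, and the helpers `edges_of` and `f1` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def edges_of(workflow: Dict[str, List[str]]) -> Set[Edge]:
    """Flatten a hypothetical {step: [next steps]} mapping into directed edges."""
    return {(src, dst) for src, nexts in workflow.items() for dst in nexts}


def f1(predicted: Set, gold: Set) -> float:
    """Plain set-overlap F1; returns 0.0 when there is nothing to match."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: the same three steps, wired in a different order.
gold = {"search": ["compare"], "compare": ["book"], "book": []}
predicted = {"book": ["search"], "search": ["compare"], "compare": []}

# Content-only view: compares which steps appear, ignoring dependencies.
node_f1 = f1(set(predicted), set(gold))

# Structure-aware view: compares the dependency edges between steps.
edge_f1 = f1(edges_of(predicted), edges_of(gold))

print(f"node F1 = {node_f1:.2f}, edge F1 = {edge_f1:.2f}")
```

Both workflows mention the same steps, so the node-level score is perfect, while the edge-level score drops because the predicted workflow books before it searches. Set-overlap F1 is used here only because it is the simplest graph-aware comparison to read; a real evaluator would need to handle paraphrased step names and partial orderings.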
Relationships
Read WorfBench with WorfEval, SOPBench, and evaluation-and-review-loops. It is also central to the evidence lane in self-evolving-workflows.