Benchmarking Agentic Workflow Generation

Source: arXiv
Authors: Shuofei Qiao, Runnan Fang, Zhisong Qiu, Xiaobin Wang, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
Date: 2024-10-10
Primary category: cs.CL
All categories: cs.CL, cs.AI, cs.HC, cs.LG, cs.MA

Abstract

This paper introduces WorfBench and WorfEval to measure workflow-generation quality more strictly than end-to-end task success alone. Its central contribution is methodological: rather than scoring only the final outcome, it evaluates a generated workflow's graph structure, its subsequence alignment, and its downstream usefulness. That makes it a good reference for any repo that wants workflow evolution to be driven by something more civilized than vibes.
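To make "subsequence alignment" concrete, here is a minimal sketch of one way to score a predicted step list against a reference one, using longest-common-subsequence matching with exact string equality. This is an illustration under stated assumptions, not WorfEval's actual implementation: the function names (`lcs_length`, `subsequence_f1`), the exact-match node comparison, and the F1 aggregation are all hypothetical choices made for this example.

```python
from typing import Sequence


def lcs_length(pred: Sequence[str], gold: Sequence[str]) -> int:
    """Length of the longest common subsequence of two step lists."""
    # table[i][j] holds the LCS length of pred[:i] and gold[:j].
    table = [[0] * (len(gold) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred, start=1):
        for j, g in enumerate(gold, start=1):
            if p == g:  # assumption: steps match only on exact equality
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(pred)][len(gold)]


def subsequence_f1(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Hypothetical order-aware score: F1 over LCS-matched steps."""
    if not pred or not gold:
        return 0.0
    matched = lcs_length(pred, gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)  # share of predicted steps that align
    recall = matched / len(gold)     # share of reference steps recovered
    return 2 * precision * recall / (precision + recall)


# Illustrative step names, not taken from the benchmark.
predicted = ["search", "open", "summarize"]
reference = ["search", "filter", "open", "summarize"]
score = subsequence_f1(predicted, reference)
```

A score like this rewards getting the steps in the right order even when the final task outcome is identical, which is the kind of signal end-to-end success rates cannot provide. A graph-structure check would additionally need to compare edges (dependencies between steps), which this chain-only sketch ignores.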
