Benchmarking Agentic Workflow Generation
Source: arXiv
Authors: Shuofei Qiao, Runnan Fang, Zhisong Qiu, Xiaobin Wang, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
Date: 2024-10-10
Primary category: cs.CL
All categories: cs.CL, cs.AI, cs.HC, cs.LG, cs.MA
Abstract
This paper introduces WorFBench, a workflow-generation benchmark, and WorFEval, its accompanying evaluation protocol, to measure workflow-generation quality more strictly than end-to-end task success alone. Its central contribution is methodological: it scores a generated workflow on graph structure, subsequence alignment against a reference, and downstream usefulness. That makes it a good reference for any repo that wants workflow evolution to be driven by measurable criteria rather than vibes.
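To make the idea of subsequence alignment concrete, here is a minimal sketch of scoring a predicted step sequence against a reference one. It is an illustration under stated assumptions, not the paper's implementation: the function names `lcs_length` and `chain_f1` are hypothetical, steps are compared by exact string equality (a real evaluator would likely need a softer notion of step equivalence), and the score is a plain F1 over the longest common subsequence.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def chain_f1(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """F1 of ordered step overlap; hypothetical metric for illustration only."""
    if not predicted or not gold:
        return 0.0
    matched = lcs_length(predicted, gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)  # share of predicted steps that align
    recall = matched / len(gold)          # share of reference steps recovered
    return 2 * precision * recall / (precision + recall)


# Made-up example workflows, not taken from the benchmark.
gold_steps = ["search", "filter", "book", "confirm"]
predicted_steps = ["search", "book", "pay", "confirm"]
score = chain_f1(predicted_steps, gold_steps)
```

Here three steps ("search", "book", "confirm") align in order, so precision and recall are both 3/4. A sequence score like this rewards correct ordering but ignores branching, which is why a separate graph-structure check is needed alongside it.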