AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents

Source: arXiv
Authors: Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong
Date: 2024-01-24
Primary category: cs.CL
All categories: cs.CL, cs.AI, cs.LG

Abstract

AgentBoard is less a gym than an analytical evaluation board: it spans multiple environments and reports progress metrics for multi-turn agents. It is a useful reminder that not every harness substrate needs to be trainable; some serve as interpretive score surfaces across many tasks.
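
To make the idea of a "score surface" concrete, here is a minimal sketch of how per-turn progress might be aggregated into a per-environment board. It is illustrative only: the `Episode` record, the `progress_rate` and `score_board` functions, and the subgoal-counting scheme are all hypothetical names and assumptions made for this example, not AgentBoard's actual API or metric definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One multi-turn run of an agent on a single task (hypothetical record)."""
    environment: str
    subgoals_total: int
    # Number of subgoals satisfied after each turn, in turn order.
    subgoals_done_per_turn: list[int]


def progress_rate(ep: Episode) -> float:
    """Best fraction of subgoals reached at any turn, in [0, 1] (assumed definition)."""
    if ep.subgoals_total <= 0 or not ep.subgoals_done_per_turn:
        return 0.0
    return max(ep.subgoals_done_per_turn) / ep.subgoals_total


def score_board(episodes: list[Episode]) -> dict[str, dict[str, float]]:
    """Aggregate mean progress and success rate per environment."""
    by_env: dict[str, list[float]] = {}
    for ep in episodes:
        by_env.setdefault(ep.environment, []).append(progress_rate(ep))
    return {
        env: {
            "progress": mean(rates),
            # An episode counts as a success only if every subgoal was reached.
            "success": mean([1.0 if r == 1.0 else 0.0 for r in rates]),
        }
        for env, rates in by_env.items()
    }
```

Under these assumptions, an agent that completes half the subgoals of a task contributes 0.5 to the progress column and 0 to the success column, which is the kind of partial-credit signal a pure pass/fail score would hide.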
