Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents

Source: arXiv
Authors: Haochen Sun, Shuwen Zhang, Lujie Niu, Lei Ren, Hao Xu, Hao Fu, Fangkun Zhao, Caixia Yuan, Xiaojie Wang
Date: 2025-02-27
Primary category: cs.CL
All categories: cs.CL, cs.AI, cs.MA

Abstract

Collab-Overcooked studies collaboration under interactive pressure instead of treating teamwork as a static exchange of messages. The benchmark adds process-oriented metrics to Overcooked-style tasks and finds that many models can interpret goals but still struggle with active collaboration and continuous adaptation once execution begins. That makes it a good diagnostic paper for the claim that coordination strategy, not only model intelligence, remains a live bottleneck.
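The minimal Python sketch below shows why a process-oriented metric can reveal more than an outcome-only one. It is not the paper's metric definition. The LCS-based `process_score`, the action names, and the `REFERENCE` plan are all hypothetical. They show how two failed episodes that an outcome score treats as identical can still be told apart by how far the team progressed through a reference plan.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two action sequences."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def outcome_score(dish_served: bool) -> float:
    """Outcome-only metric: 1.0 if the episode succeeded, else 0.0."""
    return 1.0 if dish_served else 0.0


def process_score(actions: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical process metric: the fraction of reference steps the team
    reproduced in order, so partial progress earns partial credit."""
    if not reference:
        return 1.0
    return lcs_length(actions, reference) / len(reference)


# Hypothetical reference plan for a simplified two-agent soup order.
REFERENCE = ["pick_onion", "put_onion_in_pot", "cook",
             "pick_dish", "plate_soup", "serve"]

# Two failed episodes: one stalls early, one fails only at the final step.
stalled = ["pick_onion", "wait", "wait", "wait"]
almost = ["pick_onion", "put_onion_in_pot", "cook",
          "pick_dish", "plate_soup", "drop_soup"]

for name, trace in [("stalled", stalled), ("almost", almost)]:
    print(name, outcome_score(False), round(process_score(trace, REFERENCE), 2))
```

Both episodes score 0.0 on the outcome metric. The process score separates them, at about 0.17 versus 0.83, which is the kind of signal needed to diagnose where collaboration breaks down during execution.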
