τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Source: arXiv
Authors: Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan
Date: 2024-06-17
Primary category: cs.AI
All categories: cs.AI, cs.CL

Abstract

τ-bench evaluates multi-turn conversations in which an agent must call tools, follow domain policy rules, and interact with a simulated user in dynamic domains. It is not a full RL gym of the browser- or desktop-automation kind, but it is one of the cleanest tool-agent-user interaction environments, and episodes are graded on their end state.
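To make end-state grading concrete, here is a minimal sketch in Python. It is not τ-bench's actual code or API: the toy database, the `cancel_order` tool, and the `run_episode` and `grade` helpers are all hypothetical names chosen for illustration. The assumed idea is that an episode is scored by comparing the final state to an annotated goal state.

```python
import copy


def cancel_order(db, order_id):
    """Hypothetical tool: cancel an order only if it is still pending."""
    order = db["orders"].get(order_id)
    if order is None or order["status"] != "pending":
        return "error: order cannot be cancelled"
    order["status"] = "cancelled"
    return "ok"


# Hypothetical tool registry available to the agent.
TOOLS = {"cancel_order": cancel_order}


def run_episode(initial_db, tool_calls):
    """Apply a sequence of (tool_name, kwargs) calls to a copy of the database."""
    db = copy.deepcopy(initial_db)  # keep the initial state untouched
    for name, kwargs in tool_calls:
        TOOLS[name](db, **kwargs)
    return db


def grade(final_db, goal_db):
    """End-state grading: reward 1 only if the final state equals the goal state."""
    return 1 if final_db == goal_db else 0


initial_db = {"orders": {"A1": {"status": "pending"}, "B2": {"status": "delivered"}}}
goal_db = {"orders": {"A1": {"status": "cancelled"}, "B2": {"status": "delivered"}}}

# One trajectory that reaches the goal, and one that targets the wrong order.
good_calls = [("cancel_order", {"order_id": "A1"})]
bad_calls = [("cancel_order", {"order_id": "B2"})]

reward_good = grade(run_episode(initial_db, good_calls), goal_db)
reward_bad = grade(run_episode(initial_db, bad_calls), goal_db)
```

In this sketch the grader never inspects the conversation itself. Only the state left behind by the tool calls matters, which is why the second trajectory scores zero even though its tool call is well formed.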