WebCanvas: Benchmarking Web Agents in Online Environments
Source: arXiv
Authors: Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, et al.
Date: 2024-06-18
Primary category: cs.CL
All categories: cs.CL, cs.AI, cs.LG
Abstract
WebCanvas addresses the problem that websites change over time by turning web-agent evaluation into an online, continuously maintainable setup. It is relevant wherever an evaluation harness must survive interface drift on live sites, rather than scoring agents against a frozen museum exhibit.