GAIA: a benchmark for General AI Assistants
Source: arXiv
Authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom
Date: 2023-11-21
Primary category: cs.CL
All categories: cs.CL, cs.AI
Abstract
GAIA is not an RL gym, but it is an important adjacent benchmark for general AI assistants: its tasks require reasoning, multimodal understanding, web browsing, and tool use. It is relevant alongside gym-style environments because it tests broad assistant competence, even though it does not provide the resettable environment structure that a gym does.
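The structural difference between a resettable gym and a static assistant benchmark can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `CountdownEnv`, `StaticTask`, `run_episode`, and `score_static` are invented names, and the normalized exact-match comparison is an assumption of the sketch, not a description of GAIA's actual tasks, harness, or scoring.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


class CountdownEnv:
    """Toy gym-style environment (hypothetical): resettable, stepped, rewarded."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.state = start

    def reset(self) -> int:
        # Restore the initial state so a fresh episode can begin.
        self.state = self.start
        return self.state

    def step(self, action: int) -> tuple[int, float, bool]:
        # Subtract the action; the episode ends when the counter reaches zero.
        self.state = max(0, self.state - action)
        done = self.state == 0
        reward = 1.0 if done else 0.0
        return self.state, reward, done


@dataclass(frozen=True)
class StaticTask:
    """Toy benchmark item (hypothetical): one question, one reference answer."""

    question: str
    reference: str


def run_episode(env: CountdownEnv) -> float:
    # Gym-style interaction: reset, then step until the episode terminates.
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(1)
        total += reward
    return total


def score_static(tasks: Sequence[StaticTask], assistant: Callable[[str], str]) -> float:
    # Static evaluation: each question is answered once and only the final
    # answer is compared (normalized exact match, assumed for this sketch).
    correct = sum(
        assistant(t.question).strip().lower() == t.reference.strip().lower()
        for t in tasks
    )
    return correct / len(tasks)
```

In the gym case the same environment object can be reset and replayed for any number of episodes, with feedback at every step. In the static case there is no state to reset and no per-step reward: the assistant is judged only on the answer it finally returns.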