GAIA: a benchmark for General AI Assistants

Source: arXiv
Authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom
Date: 2023-11-21
Primary category: cs.CL
All categories: cs.CL, cs.AI

Abstract

GAIA is not an RL gym, but it is an important adjacent benchmark. It evaluates general AI assistants on questions that require reasoning, multimodal understanding, web browsing, and tool use. It is worth including alongside gyms because it measures broad assistant competence, even though it does not provide the resettable environment structure that a gym offers.
