OSWorld
Overview
OSWorld is a real-computer benchmark environment for multimodal agents operating across Ubuntu, Windows, and macOS. It supports task setup, execution-based evaluation, and workflows spanning multiple applications and filesystems.
Why it matters
General computer-use agents need a world larger than one browser tab, and OSWorld is one of the best current attempts to provide that world reproducibly.
Distinctive trait
Its distinctive trait is operating-system realism combined with execution-based evaluation: a task is scored by checking the state the agent actually left on the machine, rather than by admiring screenshots of a desktop from afar.
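To make "execution-based evaluation" concrete, here is a minimal sketch of the general idea. It is not OSWorld's task format or API. The `Task` record, `run_task`, the rename task, and both agents are hypothetical, and a temporary directory stands in for a real machine. Each task pairs a setup step with a checker that inspects the resulting filesystem state, so an agent that only claims success still fails.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """Hypothetical task record: an instruction plus setup and checker hooks."""
    instruction: str
    setup: Callable[[str], None]   # builds the initial state inside a workspace directory
    check: Callable[[str], bool]   # inspects the final state; anything the agent says is ignored


def run_task(task: Task, agent: Callable[[str, str], None]) -> bool:
    """Set up a fresh workspace, let the agent act, then score the resulting state."""
    with tempfile.TemporaryDirectory() as workdir:
        task.setup(workdir)
        agent(task.instruction, workdir)
        return task.check(workdir)


def setup_report(workdir: str) -> None:
    with open(os.path.join(workdir, "report.txt"), "w") as f:
        f.write("draft")


def check_renamed(workdir: str) -> bool:
    # Success means the file was actually renamed, not merely described as renamed.
    return (os.path.exists(os.path.join(workdir, "final.txt"))
            and not os.path.exists(os.path.join(workdir, "report.txt")))


rename_task = Task("Rename report.txt to final.txt", setup_report, check_renamed)


def acting_agent(instruction: str, workdir: str) -> None:
    os.rename(os.path.join(workdir, "report.txt"), os.path.join(workdir, "final.txt"))


def talking_agent(instruction: str, workdir: str) -> None:
    print("I renamed the file.")  # claims success without changing anything


print(run_task(rename_task, acting_agent))   # True
print(run_task(rename_task, talking_agent))  # False
```

The checker never reads the agent's output. It only looks at what changed on disk, which is the property that separates execution-based scoring from judging transcripts or screenshots.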
Relationships
Read OSWorld alongside windows-agent-arena, computer-rl, and rl-gyms-and-executable-environments-for-ai-harnesses, and compare it with webarena and appworld.