ProSoftArena

Benchmarking Hierarchical Capabilities of Multimodal Agents in Professional Software Environments

Jiaxin Ai, Yukang Feng, Fanrui Zhang, Jianwen Sun, Zizhen Li, Chuanhao Li,
Yifan Chang, Wenxiao Wu, Ruoxi Wang, Mingliang Zhai, Kaipeng Zhang†

†Corresponding Author: zhangkaipeng@pjlab.org.cn
ProSoftArena_overview

🌈 ProSoftArena. We establish the first hierarchical taxonomy of agent capabilities in professional software environments and curate a comprehensive benchmark covering 6 disciplines, 20 subfields, and 13 core professional applications. We construct a VM-based real computer environment for reproducible evaluation and uniquely incorporate a human-in-the-loop evaluation paradigm.

News

✨ [12/29/2025] We release our paper and project page. The data and code will be openly available soon!

Data Samples

Samples

Representative task samples across six core domains in ProSoftArena.

Evaluation Framework

Framework

Automated and Human-in-the-loop Evaluation Framework.

Experimental Results

Main Results

L4 Results

BibTeX