OAgents: An Empirical Study of Building Effective Agents

About

Agentic AI has recently become an increasingly popular research field. However, we argue that current agent research practices lack standardization and scientific rigor, making it hard to compare methods fairly. As a result, it is still unclear how different design choices in agent frameworks affect effectiveness, and measuring progress remains challenging. In this work, we conduct a systematic empirical study on the GAIA benchmark and BrowseComp to examine the impact of popular design choices in key agent components in a fair and rigorous manner. We find that the lack of a standard evaluation protocol makes previous work, even when open-sourced, non-reproducible, with significant variance between random runs. Therefore, we introduce a more robust evaluation protocol to stabilize comparisons. Our study reveals which components and designs are crucial for effective agents and which are redundant despite seeming logical. Based on our findings, we build and open-source OAgents, a new foundation agent framework that achieves state-of-the-art performance among open-source projects. OAgents offers a modular design for various agent components, promoting future research in Agentic AI.

He Zhu, Tianrui Qin, King Zhu, Heyuan Huang, Yeyi Guan, Jinxiang Xia, Yi Yao, Hanhao Li, Ningning Wang, Pai Liu, Tianhao Peng, Xin Gui, Xiaowan Li, Yuhui Liu, Yuchen Eleanor Jiang, Jun Wang, Changwang Zhang, Xiangru Tang, Ge Zhang, Jian Yang, Minghao Liu, Xitong Gao, Jiaheng Liu, Wangchunshu Zhou • 2025

Related benchmarks

Task	Dataset	Result
Automated Planning	PDDL	Accuracy 17.12	233
Question Answering	PopQA	Accuracy 48.3	186
Question Answering	StrategyQA	Accuracy 65.94	123
Question Answering	TriviaQA	Accuracy 73.52	117
General AI Assistant Task	GAIA (val)	Level 1 Score 83	97
Code Generation	KodCode	Accuracy 49.9	94
General AI Assistant Tasks	GAIA	Avg Performance 66.67	77
Code Generation	BigCodeBench	Accuracy 82.28	75
Agentic Web Browsing	BrowseComp	Pass@1 22.2	47
Data Science Agent Tasks	xBench-DS	Pass@1 0.47	31

Showing 10 of 21 rows
