OpenHands: An Open Platform for AI Software Developers as Generalist Agents

About

Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.

Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig • 2024

Related benchmarks

Task	Dataset	Result
Terminal task completion	Terminal-bench 2.0	Pass@1 31.43	63
Automated Software Engineering	SWE-bench Verified	Resolved Rate 79.8	39
Software Engineering	SWE-bench Verified	Resolution Rate 77.6	32
Software Engineering	SWE-bench Verified	Success Rate 71.8	31
Software Engineering Issue Resolution	SWE-bench Verified	Resolution Rate 37.2	26
Overreach Evaluation	OVEREAGER	Composite-oracle Overreach Rate 22.6	24
ML Engineering	MLE-Bench official (test)	Medal Rate (Low) 11.5	19
Terminal-based task execution	Terminal-bench 2.0	--	19
Software Engineering	SWE-bench Verified	Pass@1 72.8	18
File-level Code Localization	SWE-Bench Lite	Acc@1 77.37	16

Showing 10 of 61 rows
