Operating System Control on AgentBench OS
[Figure: Accuracy chart. Highlighted point: OpenHands CodeActAgent, 34.7 Accuracy. Date shown on the x-axis: Oct 28, 2025.]
Evaluation Results
| Method | Method details | Date | Accuracy |
|---|---|---|---|
| OpenHands CodeActAgent | Model=Qwen-2.5-32B-Cod... | 2025.10 | 34.7 |
| OpenHands CodeActAgent | Model=Qwen-2.5-32B-Cod... | 2025.10 | 27.8 |
| OpenHands CodeActAgent | Model=Qwen-2.5-7B-Code... | 2025.10 | 27.1 |
| AgentLM | Model=Llama-2-chat-70B... | 2025.10 | 21.5 |
| OpenHands CodeActAgent | Model=Qwen-2.5-14B-Cod... | 2025.10 | 20.8 |
| AgentLM | Model=Llama-2-chat-13B... | 2025.10 | 18.1 |
| AgentLM | Model=Llama-2-chat-7B,... | 2025.10 | 17.4 |
| AgentLM | Model=Llama-2-chat-70B... | 2025.10 | 9 |
| AgentLM | Model=Llama-2-chat-13B... | 2025.10 | 9 |
| AgentLM | Model=Llama-2-chat-7B,... | 2025.10 | 8.3 |
| OpenHands CodeActAgent | Model=Qwen-2.5-7B-Code... | 2025.10 | 3.5 |
| OpenHands CodeActAgent | Model=Qwen-2.5-14B-Cod... | 2025.10 | 2.8 |