
Terminal-bench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Terminal task completion | Terminal-bench 2.0 | Pass@1 | 64.7 | 43 |
| Terminal task completion | Terminal-bench 1.0 | Pass@1 | 51 | 17 |
| End-to-end terminal tasks | Terminal-Bench 2 | Score | 49.6 | 13 |
| Terminal Capability Evaluation | Terminal-Bench 2.0 | Accuracy | 27.4 | 12 |
| Coding | Terminal-Bench 2.0 | Score | 59.3 | 11 |
| Agentic Terminal Tasks | Terminal-Bench (TB) (test) | Success Rate | 48.75 | 10 |
| Agent | Terminal-Bench | Accuracy | 45 | 8 |
| Code Agent Simulation | Terminal Bench 2.0 | Accuracy | 54.2 | 6 |
| Code Agent | Terminal-Bench Hard | Score | 39 | 6 |
| Terminal-based task execution | Terminal-Bench 2.0 | Resolved (%) | 65.2 | 5 |
| Software Engineering Issue Resolution | Terminal Bench | Resolve Rate | 32.5 | 4 |
| Agent | Terminal Bench Hard English | Score | 9.9 | 3 |
| Agent | Terminal Bench English 1.0 | Score | 21.8 | 3 |
| Terminal Task Execution | Terminal-Bench 1.0 (test) | Avg Pass Rate | 18.9 | 2 |
| Terminal Capability Evaluation | Terminal-Bench 2.0 (test) | - | - | 0 |