TerminalBench

Benchmarks

Task Name                               Dataset Name          SOTA Result               Trend
Terminal Agentic Trajectory Generation  TerminalBench 2.0     Score: 57.8               29
Terminal Agentic Trajectory Generation  TerminalBench 1.0     Score: 56.25              23
Agentic Coding                          TerminalBench 2       Pass Rate: 81.8           17
Code Generation                         TerminalBench 2       Pass@3: 39.3              9
Agentic Coding                          TerminalBench         Accuracy: 0.3375          7
Ranking Preservation                    TerminalBench (test)  Mean Spearman Rho: 0.988  5
Terminal Agentic Trajectory Generation  TerminalBench         Pass@8: 45                4
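The table reports Pass@3 and Pass@8 figures but does not say how they were computed. As a minimal sketch, assuming the widely used unbiased pass@k estimator (sample n attempts per task, count the c attempts that pass, then estimate the probability that at least one of k attempts drawn without replacement passes), the calculation looks like this. The function names `pass_at_k` and `mean_pass_at_k` and the sample task outcomes are hypothetical and are not TerminalBench data or a TerminalBench API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (assumed estimator).

    n: number of attempts sampled for the task
    c: number of those attempts that passed
    k: attempt budget being scored (e.g. 3 or 8)
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If fewer than k attempts failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k attempts drawn without replacement are failures).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-task outcomes as (attempts, passes); not real results.
    tasks = [(10, 0), (10, 2), (10, 5), (10, 10)]
    print(round(mean_pass_at_k(tasks, k=3), 4))
```

With k = 1 the estimate reduces to c / n, the plain per-attempt success rate, and it grows with k because each task gets more chances to succeed. This is why a Pass@8 score is not directly comparable with a Pass@3 or single-attempt pass rate.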