Terminal task execution on Terminal-Bench 2.0 (full)
Figure: Overall avg@5 accuracy over time; best result SkillsVote at 58.9 (May 18, 2026). Views: Overall, Easy, Medium, and Hard avg@5 accuracy.
Updated 15d ago
Evaluation Results
| Method | Setting | Date | Overall avg@5 (%) | Easy avg@5 (%) | Medium avg@5 (%) | Hard avg@5 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| SkillsVote | Backbone=GPT-5.2, Sett... | 2026.05 | 58.9 | 90 | 65.1 | 43.3 |
| SkillsVote | Backbone=GPT-5.4 mini,... | 2026.05 | 57.5 | 65 | 64.7 | 43.3 |
| SkillsVote | Backbone=GPT-5.2, Sett... | 2026.05 | 53.7 | 75 | 62.9 | 34 |
| SkillsVote | Backbone=GPT-5.4 mini,... | 2026.05 | 52.8 | 75 | 63.6 | 30 |
| GPT-5.4 mini | Setting=Medium (No-ski... | 2026.05 | 51.7 | 75 | 61.8 | 30 |
| GPT-5.2 | Setting=Medium (No-ski... | 2026.05 | 51 | 75 | 54.9 | 40.7 |
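The page reports avg@5 accuracy without defining it. The Python sketch below shows one common reading, assuming avg@5 is the pass rate averaged over five independent runs per task and that Overall is the task-weighted mean of the Easy, Medium, and Hard columns. The helpers `avg_at_k` and `overall_from_buckets` and the bucket sizes in `HYPOTHETICAL_BUCKET_SIZES` (4 easy, 55 medium, 30 hard tasks) are hypothetical illustrations. They are not taken from the leaderboard or from Terminal-Bench tooling.

```python
# Hypothetical helpers illustrating one reading of the leaderboard metrics.
# ASSUMPTIONS (not stated on the page): "avg@5" is the pass rate averaged over
# 5 independent runs per task, and "Overall" is the task-weighted mean of the
# Easy/Medium/Hard buckets. The bucket sizes below are hypothetical.


def avg_at_k(runs_per_task: list[list[bool]]) -> float:
    """Mean pass rate in percent over all tasks, each run k times.

    Each inner list holds one task's k pass/fail outcomes. With the same k for
    every task, this equals the mean of the k per-run accuracies.
    """
    per_task = [sum(runs) / len(runs) for runs in runs_per_task]
    return 100 * sum(per_task) / len(per_task)


def overall_from_buckets(scores: dict[str, float], n_tasks: dict[str, int]) -> float:
    """Task-weighted mean of per-difficulty scores (percent)."""
    total = sum(n_tasks[b] for b in scores)
    return sum(scores[b] * n_tasks[b] for b in scores) / total


# Hypothetical bucket sizes (easy/medium/hard task counts); an assumption.
HYPOTHETICAL_BUCKET_SIZES = {"easy": 4, "medium": 55, "hard": 30}

# (reported Overall, Easy, Medium, Hard) for the six rows of the table above.
ROWS = [
    (58.9, 90, 65.1, 43.3),
    (57.5, 65, 64.7, 43.3),
    (53.7, 75, 62.9, 34),
    (52.8, 75, 63.6, 30),
    (51.7, 75, 61.8, 30),
    (51.0, 75, 54.9, 40.7),
]

if __name__ == "__main__":
    # Two tasks, five runs each: 4/5 and 1/5 passes average to 50.0 percent.
    print(avg_at_k([[True, True, False, True, True],
                    [False, False, False, False, True]]))
    for overall, easy, medium, hard in ROWS:
        est = overall_from_buckets(
            {"easy": easy, "medium": medium, "hard": hard},
            HYPOTHETICAL_BUCKET_SIZES,
        )
        print(f"reported {overall:5.1f}  weighted {est:5.1f}")
```

Under these assumed bucket sizes (89 tasks in total), the weighted mean matches every reported Overall value to one decimal place. That is consistent with the task-weighted reading but does not confirm it.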