Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Terminal-related CLI agent task on TerminalBench 1.0

54.38Accuracy

GPT-5.2

3.26416.534529.80543.0755Apr 21, 2026
Updated 1mo ago

Evaluation Results

MethodLinks
2026.04
54.38-
2026.04
53.72-
2026.04
51-
2026.04
48.75-
2026.04
47.5-
2026.04
46.35-
2026.04
46.25-
2026.04
45.2523.43
2026.04
45.25-
2026.04
44.59-
2026.04
43.93-
2026.04
42.5-
2026.04
42.329.74
2026.04
42.3-
2026.04
38.5-
2026.04
37.5-
2026.04
35.5-
2026.04
31.56-
2026.04
28.75-
2026.04
25-
2026.04
23.8-
2026.04
15-
2026.04
14.13-
2026.04
11.25-
2026.04
11.25-
2026.04
11.25-
2026.04
9.22-
2026.04
8.86-
2026.04
5.23-