Long-horizon task execution on LongAct Bench (detailed split)

59 GC (Avg)

GPT-5

-1.590414.139829.8745.6002
May 14, 2026

Evaluation Results

Method | Links
2026.05
5966.782.975.260161,98225.31.7
2026.05
51.252.771.273.425151,69228.71.61
2026.05
38.440.558.966.72592,30421.50.81
2026.05
24.527.45548.731.331,89630.60.99
2026.05
7.167.048.7737.8008444.810.59
2026.05
6.143.5418.87.32002,9813.1-0.32
2026.05
0.740.156.930002,5980.26-0.08