
Reasoning Evaluation on Full task set (n=45)

Overall Score: 8.84

Single Agent

[Chart: y-axis ticks 7.7064, 8.0007, 8.295, 8.5893; x-axis label Mar 12, 2026]
Updated 1mo ago

Evaluation Results

Method  Links
2026.03  8.84  9.4299
2026.03  8.78  8.66  8.67  9.44
2026.03  8.65  8.37  8.53  9.32
2026.03  8.24  8.84  8.62  8.96
7.75  8.39  7.72  8.76