
Reasoning Quality Evaluation on 120-tool benchmark (500 simulated tasks)

Mean Score: 4.43

Tool Attention

Apr 23, 2026
Updated 1mo ago

Evaluation Results

Method  Links
2026.04  4.43  0.62  87.6
2026.04  4.02  0.77  74.1
2026.04  3.89  0.81  68.7
2026.04  3.35  0.98  48
2026.04  3.21  1.04  43.2