Agent Reasoning on xbench (test)

0.66 Pass@3

ExpSeek

[Chart: Pass@3 scores, Jan 13, 2026]
Updated 4d ago

Evaluation Results

Date      Pass@3
2026.01   0.66
2026.01   0.62
2026.01   0.59
2026.01   0.536
2026.01   0.53
2026.01   0.468
2026.01   0.45
2026.01   0.43