
Reasoning on CounterBench

Error Rate: 0.0359

FLEx

[Chart: Error Rate, Jan 7, 2026]
Updated 4d ago

Evaluation Results

Date      Error Rate
2026.01   0.0359
2026.01   0.14
2026.01   0.1535
2026.01   0.1958
2026.01   0.2414
2026.01   0.2538
2026.01   0.2678
2026.01   0.2691
2026.01   0.2706
2026.01   0.3547
2026.01   0.5176