Reasoning on FLenQA 2000 tokens
Top result: 52.5 Accuracy (LIME+1)
[Chart: Accuracy by date; data point at Dec 8, 2025]
Updated 4d ago
Evaluation Results

Method                 Setting                     Date      Accuracy
LIME+1                 Model Size=2B               2025.12   52.5
LIME+1                 Size=2B, Matching crit...   2025.12   52.5
LIME+1                 Model Size=2B               2025.12   52.5
LIME                   Model Size=2B               2025.12   44.3
LIME                   Size=2B, Matching crit...   2025.12   44.3
LIME                   Model Size=2B               2025.12   44.3
Base                   Model Size=2B               2025.12   40.3
Baseline               Size=2B, Matching crit...   2025.12   40.3
Base (DCLM-BASELINE)   Model Size=2B               2025.12   40.3