Reasoning on FLenQA 250 tokens
[Chart: Accuracy by date. Top result: LIME+1, Accuracy 80. Date shown: Dec 8, 2025.]
Evaluation Results
| Method | Setting | Date | Accuracy |
| --- | --- | --- | --- |
| LIME+1 | Model Size=2B | 2025.12 | 80 |
| LIME+1 | Size=2B, Matching crit... | 2025.12 | 80 |
| LIME+1 | Size=1B, Matching crit... | 2025.12 | 80 |
| LIME+1 | Model Size=2B | 2025.12 | 80 |
| LIME+1 | Size=500M, Matching cr... | 2025.12 | 70 |
| LIME | Model Size=2B | 2025.12 | 52 |
| LIME | Size=2B, Matching crit... | 2025.12 | 52 |
| LIME | Model Size=2B | 2025.12 | 52 |
| Base | Model Size=2B | 2025.12 | 42 |
| Baseline | Size=2B, Matching crit... | 2025.12 | 42 |
| Base (DCLM-BASELINE) | Model Size=2B | 2025.12 | 42 |
| LIME | Size=500M, Matching cr... | 2025.12 | 40 |
| Baseline | Size=1B, Matching crit... | 2025.12 | 36 |
| LIME | Size=1B, Matching crit... | 2025.12 | 28 |
| Baseline | Size=500M, Matching cr... | 2025.12 | 22 |