Reasoning on FLenQA 500 tokens
Top result: LIME+1, Accuracy 74
[Chart: Accuracy by date; plotted date Dec 8, 2025]
Evaluation Results
Method                  Details                    Date       Accuracy
LIME+1                  Size=1B, Matching crit...  2025.12    74
LIME+1                  Size=500M, Matching cr...  2025.12    73.8
LIME+1                  Model Size=2B              2025.12    73.5
LIME+1                  Size=2B, Matching crit...  2025.12    73.5
LIME+1                  Model Size=2B              2025.12    73.5
LIME                    Model Size=2B              2025.12    65.3
LIME                    Size=2B, Matching crit...  2025.12    65.3
LIME                    Model Size=2B              2025.12    65.3
Base                    Model Size=2B              2025.12    49.5
Baseline                Size=2B, Matching crit...  2025.12    49.5
Base (DCLM-BASELINE)    Model Size=2B              2025.12    49.5
LIME                    Size=500M, Matching cr...  2025.12    48.8
Baseline                Size=1B, Matching crit...  2025.12    39
LIME                    Size=1B, Matching crit...  2025.12    32
Baseline                Size=500M, Matching cr...  2025.12    22