Reasoning on FLenQA 1000 tokens
Top result: 78.5 Accuracy (LIME+1)
[Chart: Accuracy; Dec 8, 2025]
Updated 4d ago
Evaluation Results
Method                Setting                     Date      Accuracy
LIME+1                Size=1B, Matching crit...   2025.12   78.5
LIME+1                Size=500M, Matching cr...   2025.12   72.5
LIME+1                Model Size=2B               2025.12   65.3
LIME+1                Size=2B, Matching crit...   2025.12   65.3
LIME+1                Model Size=2B               2025.12   65.3
LIME                  Model Size=2B               2025.12   47
LIME                  Size=2B, Matching crit...   2025.12   47
LIME                  Model Size=2B               2025.12   47
Base                  Model Size=2B               2025.12   34.8
Baseline              Size=2B, Matching crit...   2025.12   34.8
Base (DCLM-BASELINE)  Model Size=2B               2025.12   34.8
LIME                  Size=1B, Matching crit...   2025.12   33
Baseline              Size=1B, Matching crit...   2025.12   32.4
LIME                  Size=500M, Matching cr...   2025.12   30.5
Baseline              Size=500M, Matching cr...   2025.12   12.5