General Language Model Evaluation on Average of 10 tasks
[Chart: Overall Performance and Iteration Delta over time. Highlighted: T-SPIN, 45.02 Overall Performance, Jan 13, 2026.]
Evaluation Results

Method      Configuration              Date     Overall Performance  Iteration Delta
T-SPIN      Iteration=Iter4            2026.01  45.02                0.12
T-SPIN      Iteration=Iter3            2026.01  44.9                 1.34
SFT         Method=Supervised Fine...  2026.01  44.17                -
T-SPIN      Iteration=Iter2            2026.01  43.56                0.24
T-SPIN      Iteration=Iter1            2026.01  43.32                0.61
T-SPIN      Iteration=Iter0            2026.01  42.71                -
SPIN        Iteration=Iter0            2026.01  42.6                 -
SPIN        Iteration=Iter1            2026.01  42.52                -0.08
Mistral-7B  Configuration=Base         2026.01  42.51                -
SPIN        Iteration=Iter4            2026.01  42.32                1.02
SPIN        Iteration=Iter2            2026.01  42.3                 -0.22
SPIN        Iteration=Iter3            2026.01  41.3                 -1
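
The page does not define Iteration Delta, but the listed values are consistent with the change in Overall Performance from one iteration to the next within the same method. The following is a minimal Python sketch under that assumption; the `scores` dictionary and the `iteration_deltas` helper are illustrative names, not part of the benchmark site.

```python
# Overall Performance per iteration (Iter0..Iter4), copied from the results above.
scores = {
    "T-SPIN": [42.71, 43.32, 43.56, 44.90, 45.02],
    "SPIN": [42.60, 42.52, 42.30, 41.30, 42.32],
}


def iteration_deltas(per_iteration):
    """Return the score change between consecutive iterations.

    Assumption: delta(k) = score(Iter k) - score(Iter k-1), so Iter0 has no delta.
    Results are rounded to two decimals to match the precision of the listing.
    """
    return [round(b - a, 2) for a, b in zip(per_iteration, per_iteration[1:])]


for method, per_iteration in scores.items():
    print(method, iteration_deltas(per_iteration))
```

Under this reading, the computed values for Iter1 through Iter4 agree with the Iteration Delta column for both T-SPIN and SPIN, and the Iter0, SFT, and Mistral-7B rows have no delta because they have no preceding iteration.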