Multi-Task Skill Learning Suite (ScienceQA, MMLU, HellaSwag, HumanEval, TruthfulQA, WinoGrande, IFEval)
[Chart: ScienceQA score by method over time; leading result SDFT at 70.2 (Jan 27, 2026). Selectable metrics: ScienceQA, HellaSwag, HumanEval, IFEval, MMLU, TruthfulQA, WinoGrande, Average Score.]
Evaluation Results
| Method | Date | ScienceQA | HellaSwag | HumanEval | IFEval | MMLU | TruthfulQA | WinoGrande | Average Score |
|---|---|---|---|---|---|---|---|---|---|
| SDFT | 2026.01 | 70.2 | 60.9 | 68.9 | 66.8 | 70.7 | 46.5 | 73.1 | 64.5 |
| SFT | 2026.01 | 66.2 | 55.0 | 54.8 | 35.3 | 64.6 | 36.8 | 73.7 | 53.4 |
| SFT + re-invoke | 2026.01 | 66.0 | 61.6 | 63.4 | 52.9 | 68.7 | 45.2 | 70.0 | 60.2 |
| DFT | 2026.01 | 54.8 | 57.6 | 67.0 | 60.4 | 69.4 | 38.8 | 68.2 | 60.2 |
| Base (backbone: Qwen2.5-7B) | 2026.01 | 32.1 | 62.0 | 65.8 | 74.3 | 71.7 | 47.9 | 71.1 | 65.5 |
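The page does not say how "Average Score" is computed. Judging only from the numbers, it appears to be the mean of the six benchmarks other than ScienceQA (the newly learned skill), not the mean of all seven columns. The sketch below re-derives both means from the scores as transcribed in the Evaluation Results table so the reader can check this. The exclusion of ScienceQA is an inference from the arithmetic, not something the source states, and the names `SCORES`, `REPORTED_AVG` and `mean` are purely illustrative.

```python
# Scores transcribed from the Evaluation Results table.
# Column order: ScienceQA, HellaSwag, HumanEval, IFEval, MMLU, TruthfulQA, WinoGrande
SCORES = {
    "SDFT": [70.2, 60.9, 68.9, 66.8, 70.7, 46.5, 73.1],
    "SFT": [66.2, 55.0, 54.8, 35.3, 64.6, 36.8, 73.7],
    "SFT + re-invoke": [66.0, 61.6, 63.4, 52.9, 68.7, 45.2, 70.0],
    "DFT": [54.8, 57.6, 67.0, 60.4, 69.4, 38.8, 68.2],
    "Base": [32.1, 62.0, 65.8, 74.3, 71.7, 47.9, 71.1],
}

# The "Average Score" column as reported on the page.
REPORTED_AVG = {
    "SDFT": 64.5,
    "SFT": 53.4,
    "SFT + re-invoke": 60.2,
    "DFT": 60.2,
    "Base": 65.5,
}


def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


for method, row in SCORES.items():
    six_task = mean(row[1:])  # assumption: ScienceQA (index 0) is excluded
    seven_task = mean(row)    # all seven benchmarks, for contrast
    print(
        f"{method:<16} six-task={six_task:.2f} "
        f"seven-task={seven_task:.2f} reported={REPORTED_AVG[method]}"
    )
```

Under this reading, four of the five six-task means round exactly to the reported value. The exception is SFT + re-invoke (60.30 vs 60.2), a gap of 0.1 that is presumably due to rounding of the underlying per-task scores. The seven-task means, by contrast, miss by up to about 5 points (Base: 60.7 vs 65.5).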