Reasoning and Multitask Language Understanding on OBQA, ARC, Riddle, PQA, and MMLU
[Chart: accuracy by method over time. Selectable metrics: OBQA, ARC, Riddle, PQA, MMLU, and Average Accuracy. Shown: OBQA Accuracy, top result 77 (Similarity-based Router), Feb 1, 2026.]
Updated 3d ago
Evaluation Results

Columns OBQA through Average report accuracy.

Method                   Setting                   Date     OBQA   ARC    Riddle  PQA    MMLU   Average
Similarity-based Router  Student Model=FLAN-T5...  2026.02  77     60.17  70.78   62.75  55.26  65.19
Distilling-Step-by-Step  Student Model=FLAN-T5...  2026.02  71.6   60.77  68.43   51     51.84  60.73
TinyLLM                  Student Model=FLAN-T5...  2026.02  70.4   54.25  67.25   52.75  49.28  58.79
Inference                Student Model=FLAN-T5...  2026.02  50.4   51.07  39.8    45.5   45.1   46.37
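
The page does not say how Average Accuracy is computed. The reported values are consistent with an unweighted mean of the five benchmark accuracies (OBQA, ARC, Riddle, PQA, MMLU) rounded to two decimals. The sketch below checks that assumption against the listed numbers; the `scores` dictionary and the `average_accuracy` helper are illustrative names, not part of the leaderboard.

```python
# Per-benchmark accuracies copied from the results above,
# in the order OBQA, ARC, Riddle, PQA, MMLU.
scores = {
    "Similarity-based Router": [77, 60.17, 70.78, 62.75, 55.26],
    "Distilling-Step-by-Step": [71.6, 60.77, 68.43, 51, 51.84],
    "TinyLLM": [70.4, 54.25, 67.25, 52.75, 49.28],
    "Inference": [50.4, 51.07, 39.8, 45.5, 45.1],
}


def average_accuracy(values):
    """Unweighted mean rounded to two decimals (assumed, not stated by the page)."""
    return round(sum(values) / len(values), 2)


averages = {method: average_accuracy(vals) for method, vals in scores.items()}

for method, avg in averages.items():
    print(f"{method}: {avg}")
```

Under this assumption each benchmark counts equally, whatever its test-set size, so a method's rank by Average Accuracy can differ from its rank on any single benchmark.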