Reasoning and Multitask Language Understanding on OBQA, ARC, Riddle, PQA, and MMLU
[Chart: OBQA Accuracy over time. Plotted point: Similarity-based Router, 77, Feb 1, 2026. Selectable metrics: OBQA Accuracy, ARC Accuracy, Riddle Accuracy, PQA Accuracy, MMLU Accuracy, Average Accuracy.]
Updated 1mo ago
Evaluation Results
Accuracy per benchmark; the last column is Average Accuracy.

Method                    Configuration             Date     OBQA   ARC    Riddle  PQA    MMLU   Average
Similarity-based Router   Student Model=FLAN-T5...  2026.02  77     60.17  70.78   62.75  55.26  65.19
Distilling-Step-by-Step   Student Model=FLAN-T5...  2026.02  71.6   60.77  68.43   51     51.84  60.73
TinyLLM                   Student Model=FLAN-T5...  2026.02  70.4   54.25  67.25   52.75  49.28  58.79
Inference                 Student Model=FLAN-T5...  2026.02  50.4   51.07  39.8    45.5   45.1   46.37
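
The page does not say how Average Accuracy is computed. The reported values are consistent with an unweighted mean of the five benchmark accuracies, rounded to two decimals. The sketch below checks that assumption against the numbers listed above. The names `RESULTS`, `REPORTED_AVERAGE`, and `mean_accuracy` are illustrative and are not part of the leaderboard.

```python
# Per-benchmark accuracies copied from the results listed above.
RESULTS = {
    "Similarity-based Router": {"OBQA": 77, "ARC": 60.17, "Riddle": 70.78, "PQA": 62.75, "MMLU": 55.26},
    "Distilling-Step-by-Step": {"OBQA": 71.6, "ARC": 60.77, "Riddle": 68.43, "PQA": 51, "MMLU": 51.84},
    "TinyLLM": {"OBQA": 70.4, "ARC": 54.25, "Riddle": 67.25, "PQA": 52.75, "MMLU": 49.28},
    "Inference": {"OBQA": 50.4, "ARC": 51.07, "Riddle": 39.8, "PQA": 45.5, "MMLU": 45.1},
}

# Average Accuracy values as reported on the page.
REPORTED_AVERAGE = {
    "Similarity-based Router": 65.19,
    "Distilling-Step-by-Step": 60.73,
    "TinyLLM": 58.79,
    "Inference": 46.37,
}


def mean_accuracy(scores):
    """Unweighted mean over benchmarks (an assumption, not stated by the source)."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    for method, scores in RESULTS.items():
        computed = round(mean_accuracy(scores), 2)
        matches = computed == REPORTED_AVERAGE[method]
        print(f"{method}: computed={computed}, reported={REPORTED_AVERAGE[method]}, match={matches}")
```

Under this assumption every benchmark carries equal weight regardless of its size, so a gain on a small benchmark moves the average as much as the same gain on a large one.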