Language Understanding and Reasoning on PIQA, ARC-e, HellaSwag, GPQA, Lambada, MMLU, and BBH Suite
[Chart: PIQA. Highlighted entry: DAG-MoE-l, 50.67 (May 31, 2026). Selectable metrics: PIQA, ARC-e, HellaSwag, GPQA, Lambada, MMLU, BBH, Average Score.]
Updated 1d ago
Evaluation Results
Method     Details                    Date     PIQA   ARC-e  HellaSwag  GPQA   Lambada  MMLU   BBH    Average Score
DAG-MoE-l  Fine-tuning=Instructio...  2026.05  50.67  25.57  25.73      27.78  11.57    24.03  17.55  26.13
MoE-l      Fine-tuning=Instructio...  2026.05  47.52  24.34  25.9       21.72  8.11     24.17  16.65  24.06
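The Average Score values appear to be the unweighted mean of the seven benchmark scores in each row. The page does not state this, so treat it as an assumption; the sketch below only checks that the reported numbers are consistent with it. The names `scores` and `average_score` are illustrative and not part of the leaderboard.

```python
# Per-benchmark scores copied from the Evaluation Results rows, in the order
# PIQA, ARC-e, HellaSwag, GPQA, Lambada, MMLU, BBH.
scores = {
    "DAG-MoE-l": [50.67, 25.57, 25.73, 27.78, 11.57, 24.03, 17.55],
    "MoE-l": [47.52, 24.34, 25.9, 21.72, 8.11, 24.17, 16.65],
}


def average_score(values):
    """Unweighted mean rounded to two decimals (assumed aggregation rule)."""
    return round(sum(values) / len(values), 2)


averages = {name: average_score(values) for name, values in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
# prints:
# DAG-MoE-l: 26.13
# MoE-l: 24.06
```

Both results match the reported Average Score column (26.13 and 24.06), which is consistent with a plain arithmetic mean over the seven benchmarks.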