Multi-task Language Understanding on MMLU (0-shot)
Chart: Exact Match (EM) by date. Top result: VAR, 69.11 EM (Feb 16, 2025).
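Exact Match scores a prediction as correct only when it equals the reference answer, and it reports the share of correct items as a percentage. The page does not say how answers are extracted or normalized. The sketch below is a minimal illustration that assumes each model answer has already been reduced to a single MMLU option letter, and it applies an assumed normalization: strip whitespace, then uppercase. The helper name `exact_match` and the sample answers are hypothetical.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Percentage of items whose prediction equals the reference.

    Normalization (strip + uppercase) is an assumption for this sketch;
    the leaderboard does not document its own normalization.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one item")
    hits = sum(
        p.strip().upper() == r.strip().upper()
        for p, r in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Hypothetical extracted answer letters for four 0-shot MMLU questions.
preds = ["B", "c", "A", "D"]
gold = ["B", "C", "D", "D"]
print(exact_match(preds, gold))  # 75.0
```

In 0-shot evaluation the prompt contains no worked examples, so the score depends heavily on how reliably the answer letter can be extracted from the raw model output.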
Evaluation Results
Method  Base Model   Date     Exact Match (EM)
VAR     Qwen2.5-7B   2025.02  69.11
DPO     Qwen2.5-7B   2025.02  68.64
ALoL    Qwen2.5-7B   2025.02  68.62
Base    Qwen2.5-7B   2025.02  67.13
VAR     Llama2-7B    2025.02  38.57
Base    Llama2-7B    2025.02  37.46
ALoL    Llama2-7B    2025.02  35.78
DPO     Llama2-7B    2025.02  32.45
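To compare methods within each backbone, it helps to measure each score against that backbone's Base row. The sketch below copies the EM values from the results above and computes the difference from Base in EM points. The dictionary layout and the helper `gains_over_base` are illustrative choices, not part of the leaderboard.

```python
# EM scores on MMLU (0-shot), copied from the results table.
RESULTS: dict[str, dict[str, float]] = {
    "Qwen2.5-7B": {"VAR": 69.11, "DPO": 68.64, "ALoL": 68.62, "Base": 67.13},
    "Llama2-7B": {"VAR": 38.57, "Base": 37.46, "ALoL": 35.78, "DPO": 32.45},
}


def gains_over_base(scores: dict[str, float]) -> dict[str, float]:
    """Difference from the 'Base' entry in EM points, rounded to 2 decimals."""
    base = scores["Base"]
    return {m: round(s - base, 2) for m, s in scores.items() if m != "Base"}


for model, scores in RESULTS.items():
    print(model, gains_over_base(scores))
```

On Qwen2.5-7B, all three methods beat Base, with VAR gaining 1.98 points. On Llama2-7B, only VAR improves on Base (+1.11), while ALoL (-1.68) and DPO (-5.01) score below it.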