Multitask Language Understanding on MMLU (Accuracy and Average Mean)
[Chart: Accuracy and Average Mean. Highlighted point: Full-data Fine-tuning, Accuracy 50.12, Oct 8, 2025.]
Updated 19d ago
Evaluation Results
| Method | Method details | Links | Accuracy | Average Mean |
|---|---|---|---|---|
| Full-data Fine-tuning | Base Model=LLAMA-2-7B,... | 2025.10 | 50.12 | 48.99 |
| TRIM | Base Model=LLAMA-2-7B,... | 2025.10 | 49.33 | 48.56 |
| LESS | Base Model=LLAMA-2-7B,... | 2025.10 | 49.23 | 48.27 |
| TAGCOS | Base Model=LLAMA-2-7B,... | 2025.10 | 48.12 | 46.6 |
| S2L | Base Model=LLAMA-2-7B,... | 2025.10 | 46.7 | 45.56 |
| CLD | Base Model=LLAMA-2-7B,... | 2025.10 | 46.13 | 42.61 |
| BM25 | Base Model=LLAMA-2-7B,... | 2025.10 | 46.12 | 45.41 |
| Random | Base Model=LLAMA-2-7B,... | 2025.10 | 45.84 | 45.14 |
| DSIR | Base Model=LLAMA-2-7B,... | 2025.10 | 45.73 | 42.02 |
| Pretrained (no Fine-tuning) | Base Model=LLAMA-2-7B,... | 2025.10 | 45.6 | 43.43 |
| RDS | Base Model=LLAMA-2-7B,... | 2025.10 | 45.27 | 42.17 |
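
The gap between each method and the full-data fine-tuning row can be worked out directly from the scores listed above. The following is a minimal sketch, not part of the benchmark itself: the scores are copied from the results, while the names `RESULTS`, `gaps_to_reference`, and `rank_by` are hypothetical helpers introduced only for illustration.

```python
# Scores copied from the evaluation results: method -> (accuracy, average mean).
RESULTS = {
    "Full-data Fine-tuning": (50.12, 48.99),
    "TRIM": (49.33, 48.56),
    "LESS": (49.23, 48.27),
    "TAGCOS": (48.12, 46.6),
    "S2L": (46.7, 45.56),
    "CLD": (46.13, 42.61),
    "BM25": (46.12, 45.41),
    "Random": (45.84, 45.14),
    "DSIR": (45.73, 42.02),
    "Pretrained (no Fine-tuning)": (45.6, 43.43),
    "RDS": (45.27, 42.17),
}


def gaps_to_reference(results, reference="Full-data Fine-tuning"):
    """Return each method's score difference from the reference row.

    Hypothetical helper: negative values mean the method scores below
    the reference. Differences are rounded to two decimals to avoid
    floating-point noise.
    """
    ref_acc, ref_avg = results[reference]
    return {
        name: (round(acc - ref_acc, 2), round(avg - ref_avg, 2))
        for name, (acc, avg) in results.items()
        if name != reference
    }


def rank_by(results, column):
    """Return method names sorted from highest to lowest score.

    column 0 selects Accuracy, column 1 selects Average Mean.
    """
    return sorted(results, key=lambda name: results[name][column], reverse=True)


if __name__ == "__main__":
    for name, (d_acc, d_avg) in gaps_to_reference(RESULTS).items():
        print(f"{name:30s} accuracy {d_acc:+.2f}   average mean {d_avg:+.2f}")
```

Ranking by each column separately shows that the two metrics do not always agree: CLD, for example, is sixth by Accuracy but ninth by Average Mean.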