General Knowledge Evaluation on MMLU Perturbed
[Chart: Accuracy over time. Top entry: NPO+KL w/ RNA, 53.5 accuracy, Jan 31, 2025]
Evaluation Results
Method          Configuration               Date      Accuracy
NPO+KL w/ RNA   Model Backbone=Mistral...   2025.01   53.5
RMU w/ RNA      Model Backbone=Llama-3...   2025.01   47.3
NPO+KL w/ RNA   Model Backbone=Llama-3...   2025.01   47.3
RMU w/ RNA      Model Backbone=Mistral...   2025.01   42.2
RMU             Model Backbone=Llama-3...   2025.01   34.4
NPO+KL          Model Backbone=Mistral...   2025.01   31.4
RMU             Model Backbone=Mistral...   2025.01   27.2
NPO+KL          Model Backbone=Llama-3...   2025.01   26.2