
General Knowledge Evaluation on MMLU Perturbed

Top result: NPO+KL w/ RNA, 53.5 accuracy

[Chart: accuracy over time, Jan 31, 2025]

Evaluation Results

Date      Accuracy
2025.01   53.5
2025.01   47.3
2025.01   47.3
2025.01   42.2
2025.01   34.4
2025.01   31.4
2025.01   27.2
2025.01   26.2