Robustness Evaluation on MMLU-Pro

VAcc: 0.7813

DeepSeek-R1-Distill-LLaMA-8B

[Chart: score by date; data point dated Jun 5, 2025]
Updated 1mo ago

Evaluation Results

Date      Metrics
2025.06   0.7813   0.349    0.4323   0.5533
2025.06   0.6129   0.1774   0.4355   0.7105
2025.06   0.3548   0.0645   0.2903   0.8182
2025.06   0.2097   0.0161   0.1935   0.9231
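
The four values in each results row are related by simple arithmetic: the third equals the first minus the second, and the fourth equals the third divided by the first, up to rounding. The column names are not given here, so reading these as a baseline score, a score under perturbation, an absolute drop, and a relative drop is only an assumption. The sketch below checks the arithmetic itself; the names `rows`, `check_row`, and `TOL` are illustrative.

```python
# Values copied from the results rows above (four numbers per row).
rows = [
    (0.7813, 0.349, 0.4323, 0.5533),
    (0.6129, 0.1774, 0.4355, 0.7105),
    (0.3548, 0.0645, 0.2903, 0.8182),
    (0.2097, 0.0161, 0.1935, 0.9231),
]

# Tolerance that absorbs the four-decimal rounding in the reported values.
TOL = 1e-3


def check_row(first, second, third, fourth, tol=TOL):
    """Return (difference_holds, ratio_holds) for one row.

    difference_holds: third  ~= first - second
    ratio_holds:      fourth ~= third / first
    """
    difference_holds = abs((first - second) - third) <= tol
    ratio_holds = abs(third / first - fourth) <= tol
    return difference_holds, ratio_holds


results = [check_row(*row) for row in rows]

for row, (difference_holds, ratio_holds) in zip(rows, results):
    print(row, "difference:", difference_holds, "ratio:", ratio_holds)
```

If the assumed reading is right, the fourth value grows from 0.5533 to 0.9231 as the first value falls from 0.7813 to 0.2097, so the relative drop is largest where the baseline score is lowest.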