Robustness Evaluation on MMLU-Pro
[Chart: results by date. Top entry: DeepSeek-R1-Distill-LLaMA-8B, VAcc 0.7813 (Jun 5, 2025). Selectable metrics: VAcc, RAcc, Delta Accuracy (∆Acc), Total Failure Rate (TFR).]
Evaluation Results
Method                        Framework  Date     VAcc    RAcc    ∆Acc    TFR
DeepSeek-R1-Distill-LLaMA-8B  MASTER     2025.06  0.7813  0.349   0.4323  0.5533
LLaMA-3-8B-Instruct           MASTER     2025.06  0.6129  0.1774  0.4355  0.7105
Gemma-2-2B-IT                 MASTER     2025.06  0.3548  0.0645  0.2903  0.8182
LLaMA-3.2-1B-Instruct         MASTER     2025.06  0.2097  0.0161  0.1935  0.9231

∆Acc = Delta Accuracy; TFR = Total Failure Rate.
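
The page does not define the four metrics. As a reading aid only, the reported numbers appear consistent with ∆Acc = VAcc − RAcc and TFR = ∆Acc / VAcc (up to rounding). Both relations are inferred from the values listed above, not stated by the source. The Python sketch below checks that assumption against the four rows; the names `ROWS`, `derive`, and `max_deviation` are hypothetical and introduced here for illustration.

```python
# Rows copied from the results above: (method, VAcc, RAcc, dAcc, TFR).
ROWS = [
    ("DeepSeek-R1-Distill-LLaMA-8B", 0.7813, 0.349, 0.4323, 0.5533),
    ("LLaMA-3-8B-Instruct", 0.6129, 0.1774, 0.4355, 0.7105),
    ("Gemma-2-2B-IT", 0.3548, 0.0645, 0.2903, 0.8182),
    ("LLaMA-3.2-1B-Instruct", 0.2097, 0.0161, 0.1935, 0.9231),
]


def derive(vacc, racc):
    """Apply the ASSUMED relations (inferred from the table, not from the source):
    dAcc = VAcc - RAcc and TFR = dAcc / VAcc."""
    delta = vacc - racc
    tfr = delta / vacc if vacc else float("nan")
    return delta, tfr


def max_deviation(rows=ROWS):
    """Largest absolute gap between the reported and the derived dAcc/TFR values."""
    worst = 0.0
    for _method, vacc, racc, reported_delta, reported_tfr in rows:
        delta, tfr = derive(vacc, racc)
        worst = max(worst, abs(delta - reported_delta), abs(tfr - reported_tfr))
    return worst


if __name__ == "__main__":
    for method, vacc, racc, reported_delta, reported_tfr in ROWS:
        delta, tfr = derive(vacc, racc)
        print(f"{method}: dAcc {delta:.4f} (reported {reported_delta}), "
              f"TFR {tfr:.4f} (reported {reported_tfr})")
    print(f"max deviation: {max_deviation():.5f}")
```

If the assumed relations hold, the deviation stays within rounding error of the four-decimal values. Under that reading, TFR would be the share of originally correct answers that are lost in the robustness setting, so a lower value indicates a more robust model.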