Robustness Evaluation on MMLU
[Chart: VAcc. DeepSeek-R1-Distill-LLaMA-8B: 88 (Jun 5, 2025). Selectable metrics: VAcc, RAcc, ∆Acc, TFR.]
Evaluation Results
Method                        Framework  Date     VAcc  RAcc  ∆Acc  TFR
DeepSeek-R1-Distill-LLaMA-8B  MASTER     2025.06  88    58    30    34.09
LLaMA-3-8B-Instruct           MASTER     2025.06  69    47    22    31.88
Gemma-2-2B-IT                 MASTER     2025.06  48    21    27    56.25
LLaMA-3.2-1B-Instruct         MASTER     2025.06  42    5     37    88.1
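
The page does not define how the four metrics relate to each other. The reported numbers are, however, consistent with ∆Acc = VAcc - RAcc and TFR = 100 × ∆Acc / VAcc (rounded to two decimals). The sketch below checks that reading against the four reported rows. Both formulas and the helper name `derive` are assumptions inferred from the numbers alone, not definitions taken from the benchmark.

```python
# Reported (VAcc, RAcc, dAcc, TFR) values copied from the results listing.
ROWS = {
    "DeepSeek-R1-Distill-LLaMA-8B": (88, 58, 30, 34.09),
    "LLaMA-3-8B-Instruct": (69, 47, 22, 31.88),
    "Gemma-2-2B-IT": (48, 21, 27, 56.25),
    "LLaMA-3.2-1B-Instruct": (42, 5, 37, 88.1),
}


def derive(vacc, racc):
    """Hypothetical helper: assumed relationships, inferred from the data only."""
    delta = vacc - racc                  # assumed: dAcc = VAcc - RAcc
    tfr = round(100 * delta / vacc, 2)   # assumed: TFR = 100 * dAcc / VAcc
    return delta, tfr


if __name__ == "__main__":
    for name, (vacc, racc, delta, tfr) in ROWS.items():
        d, t = derive(vacc, racc)
        print(f"{name}: dAcc={d} (reported {delta}), TFR={t} (reported {tfr})")
```

If that reading holds, TFR expresses the accuracy drop as a percentage of VAcc, which would explain why LLaMA-3.2-1B-Instruct has the highest TFR despite having the lowest VAcc.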