Robustness Evaluation on CommonsenseQA
[Chart: VAcc over time. Highlighted entry: DeepSeek-R1-Distill-LLaMA-8B, 79.07 VAcc, Jun 5, 2025. Selectable metrics: VAcc, RAcc, ∆Acc, TFR.]
Updated 1mo ago
Evaluation Results
| Method | Framework | Date | VAcc | RAcc | ∆Acc | TFR |
|---|---|---|---|---|---|---|
| DeepSeek-R1-Distill-LLaMA-8B | MASTER | 2025.06 | 79.07 | 26.62 | 52.45 | 66.33 |
| LLaMA-3-8B-Instruct | MASTER | 2025.06 | 75.84 | 51.32 | 24.52 | 32.33 |
| Gemma-2-2B-IT | MASTER | 2025.06 | 58.31 | 34.6 | 23.71 | 40.67 |
| LLaMA-3.2-1B-Instruct | MASTER | 2025.06 | 51.92 | 16.96 | 34.96 | 67.33 |
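The reported numbers appear consistent with ∆Acc being the difference VAcc − RAcc. The page does not state this definition, so it is an inference from the values. A minimal Python sketch of that check follows. The helper name `accuracy_drop` and the 0.005 tolerance are assumptions made for illustration, and TFR is left out because the page gives no way to derive it.

```python
# Rows copied from the results above: (method, VAcc, RAcc, reported ∆Acc).
ROWS = [
    ("DeepSeek-R1-Distill-LLaMA-8B", 79.07, 26.62, 52.45),
    ("LLaMA-3-8B-Instruct", 75.84, 51.32, 24.52),
    ("Gemma-2-2B-IT", 58.31, 34.60, 23.71),
    ("LLaMA-3.2-1B-Instruct", 51.92, 16.96, 34.96),
]


def accuracy_drop(vacc: float, racc: float) -> float:
    """Assumed relation (not stated on the page): ∆Acc = VAcc - RAcc."""
    return vacc - racc


# Collect any row whose reported ∆Acc differs from VAcc - RAcc by more
# than half a unit in the second decimal place (hypothetical tolerance).
mismatches = [
    name
    for name, vacc, racc, reported in ROWS
    if abs(accuracy_drop(vacc, racc) - reported) >= 0.005
]

for name, vacc, racc, reported in ROWS:
    print(f"{name}: {vacc} - {racc} = {accuracy_drop(vacc, racc):.2f} (reported {reported})")

print("rows inconsistent with the assumed relation:", mismatches)
```

Under this assumption, a larger ∆Acc means a larger gap between VAcc and RAcc for that model.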