
Robustness Evaluation on CommonsenseQA

79.07 VAcc

DeepSeek-R1-Distill-LLaMA-8B

Jun 5, 2025
Updated 1mo ago

Evaluation Results

Method | Links
2025.06 | 79.07 | 26.62 | 52.45 | 66.33
2025.06 | 75.84 | 51.32 | 24.52 | 32.33
2025.06 | 58.31 | 34.6 | 23.71 | 40.67
2025.06 | 51.92 | 16.96 | 34.96 | 67.33