Commonsense Reasoning on Winogrande (Adversarial Robustness)

Accuracy (Pre-Attack): 73.2

Llama3-8B

[Chart: results over time, Nov 27, 2025 to Dec 14, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy (Pre-Attack)   (unlabeled)
2025.11   73.2                    48.2
2025.11   73.16                   49.8
2025.11   73                      48.93
2025.11   72.3                    50.4
2025.11   71.1                    55.3
2025.11   68.58                   48.1
2025.11   68.3                    47.8
2025.11   68.2                    53.9
2025.12   66.61                   10.67
2025.12   58.01                   46
2025.12   56.99                   44.67
2025.12   56.59                   0
2025.12   56.59                   5.33
2025.12   54.54                   46
2025.12   53.57                   6.67
2025.12   53.51                   27.33
2025.12   52.72                   46
2025.12   51.46                   7.33
2025.12   49.33                   4.67