
Commonsense Reasoning on Winogrande (Adversarial Robustness)

Accuracy (Pre-Attack): 73.2

Llama3-8B

[Chart: results over time, Nov 27, 2025 to Dec 14, 2025]
Updated 4d ago

Evaluation Results

Date      Accuracy (Pre-Attack)   (unlabeled)
2025.11   73.2                    48.2
2025.11   73.16                   49.8
2025.11   73                      48.93
2025.11   72.3                    50.4
2025.11   71.1                    55.3
2025.11   68.58                   48.1
2025.11   68.3                    47.8
2025.11   68.2                    53.9
2025.12   66.61                   10.67
2025.12   58.01                   46
2025.12   56.99                   44.67
2025.12   56.59                   0
2025.12   56.59                   5.33
2025.12   54.54                   46
2025.12   53.57                   6.67
2025.12   53.51                   27.33
2025.12   52.72                   46
2025.12   51.46                   7.33
2025.12   49.33                   4.67