Robustness Evaluation: Question Answering on ARC Easy

93.01 Accuracy (After Attack)

CacheTrap

[Chart: results over time, Nov 27, 2025 to Dec 14, 2025]
Updated 4d ago

Evaluation Results

Method  Links

2025.11   93.01   -        100
2025.11   93.01   -        100
2025.11   93.01   -        100
2025.11   93.01   -        100
2025.11   93.01   -        100
2025.11   92.83   -        100
2025.11   92.83   -        100
2025.11   92.83   -        100
2025.11   92.83   -        100
2025.11   92.83   -        100
2025.11   89.69   -        100
2025.11   89.69   -        100
2025.11   89.69   -        100
2025.11   89.69   -        100
2025.11   89.69   -        100
2025.11   86.78   -        100
2025.11   86.78   -        100
2025.11   86.78   -        100
2025.11   86.78   -        100
2025.11   86.78   -        100
2025.11   83.46   -        100
2025.11   83.46   -        100
2025.11   83.46   -        100
2025.11   83.46   -        100
2025.11   83.46   -        100
2025.12   60      68.7     -
2025.11   38.2    82.87    -
2025.12   28.67   53.125   -
2025.11   28.6    79.62    -
2025.11   27.8    77.56    -
2025.11   27.18   75.79    -
2025.12   26.67   65.625   -
2025.11   26.4    80.09    -
2025.11   25.9    79.41    -
2025.11   25.16   78.5     -
2025.11   24.8    75.2     -
2025.12   24      63.15    -
2025.12   14      62.5     -
2025.12   12.67   46.975   -
2025.12   2       66.41    -
2025.12   0       61.75    -
2025.12   0       62.5     -
2025.12   0       65.26    -
2025.12   0       76.94    -