Question Answering on ARC Challenge (Adversarial Robustness)

97.95 Attack Success Rate (ASR)

CacheTrap

[Chart: Attack Success Rate (ASR) by date; data point at Nov 27, 2025]
Updated 4d ago

Evaluation Results

Method    Links
2025.11    97.95    -        80.46
2025.11    99.57    -        -
2025.11    99.91    -        65.1
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        77.39
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        79.69
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        71.33
2025.11    100      -        -
2025.11    100      -        -
2025.11    100      -        -
2025.11    -        43.51    19.9
2025.11    -        48.46    21.5
2025.11    -        50.51    20.8
2025.11    -        48.8     21.5
2025.11    -        55.11    22.78
2025.11    -        42.66    21.2
2025.11    -        46.92    21
2025.11    -        45.8     22.01