
Question Answering on ARC Challenge (Adversarial Robustness)

Attack Success Rate (ASR): 97.95

CacheTrap

[Chart: Attack Success Rate (ASR), Nov 27, 2025]
Updated 1mo ago

Evaluation Results

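| Date | Attack Success Rate (ASR) | (unlabeled) | (unlabeled) |
|---|---|---|---|
| 2025.11 | 97.95 | - | 80.46 |
| 2025.11 | 99.57 | - | - |
| 2025.11 | 99.91 | - | 65.1 |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | 77.39 |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | 79.69 |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | 71.33 |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | 100 | - | - |
| 2025.11 | - | 43.51 | 19.9 |
| 2025.11 | - | 48.46 | 21.5 |
| 2025.11 | - | 50.51 | 20.8 |
| 2025.11 | - | 48.8 | 21.5 |
| 2025.11 | - | 55.11 | 22.78 |
| 2025.11 | - | 42.66 | 21.2 |
| 2025.11 | - | 46.92 | 21 |
| 2025.11 | - | 45.8 | 22.01 |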