Robustness Evaluation on MATH

FPR (%)

AdvJudge-Zero

Dec 19, 2025
Updated 4d ago

Evaluation Results

Date      FPR (%)
2025.12   0
2025.12   0
2025.12   0
2025.12   0.01
2025.12   0.55
2025.12   9.63
2025.12   10.77
2025.12   55.14
2025.12   60.7
2025.12   67.99
2025.12   75.18
2025.12   82.88
2025.12   89.25
2025.12   99.41
2025.12   99.8
2025.12   99.85
2025.12   99.88
2025.12   99.92
2025.12   100
2025.12   100