LLM-as-a-Judge Robustness to Adversarial Attacks on RobustJudge

None Condition Score: -0.109

Qwen3-Next-80B-A3B-Instruct

[Chart: None Condition Score (y-axis ticks from -0.447 to -0.18375) by date; Jan 7, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.01    -0.109       -  -0.045  -0.044   0.198  -0.023  -0.051   0.353   0.759  -0.806   0.026
2026.01    -0.129       -  -0.076  -0.045   0.047   0.042  -0.024   0.273   0.859  -0.532   0.046
2026.01    -0.213       -  -0.65   -0.156   0.517  -0.172  -0.18   -0.146   0.406  -0.65   -0.138
           -0.259       -  -0.217  -0.19    0.51   -0.139  -0.197  -0.043   0.35   -0.695  -0.098
2026.01    -0.316       -  -0.652  -0.261   0.517  -0.26   -0.268   0.508   0.535  -0.652  -0.094
2026.01    -0.383       -  -0.401  -0.312   0.461  -0.277  -0.439   0.466  -0.009  -0.815  -0.19
2026.01    -0.412       -  -0.336  -0.321  -0.316  -0.297  -0.433   0.17    0.511  -0.702  -0.237
           -0.434       -  -0.379  -0.357   0.366  -0.326  -0.375  -0.265   0.882  -0.734  -0.18