
LLM-as-a-Judge Robustness to Adversarial Attacks on RobustJudge

None Condition Score: -0.109

Qwen3-Next-80B-A3B-Instruct

Updated 4mo ago

Evaluation Results

Method	Links	None Condition Score
Qwen3-Next-80B-A3B-Instruct 2026.01		-0.109	-	-0.045	-0.044	0.198	-0.023	-0.051	0.353	0.759	-0.806	0.026
Qwen3-30B-A3B-Instruct-2507 2026.01		-0.129	-	-0.076	-0.045	0.047	0.042	-0.024	0.273	0.859	-0.532	0.046
Qwen2.5-32B-Instruct 2026.01		-0.213	-	-0.65	-0.156	0.517	-0.172	-0.18	-0.146	0.406	-0.65	-0.138
DeepSeek-V3 2026.01		-0.259	-	-0.217	-0.19	0.51	-0.139	-0.197	-0.043	0.35	-0.695	-0.098
QwQ-32B 2026.01		-0.316	-	-0.652	-0.261	0.517	-0.26	-0.268	0.508	0.535	-0.652	-0.094
Qwen3-Next-80B-A3B-Thinking 2026.01		-0.383	-	-0.401	-0.312	0.461	-0.277	-0.439	0.466	-0.009	-0.815	-0.19
Qwen3-30B-A3B-Thinking-2507 2026.01		-0.412	-	-0.336	-0.321	-0.316	-0.297	-0.433	0.17	0.511	-0.702	-0.237
DeepSeek-R1 2026.01		-0.434	-	-0.379	-0.357	0.366	-0.326	-0.375	-0.265	0.882	-0.734	-0.18