LLM-as-a-Judge on JudgeBench (Merged GPT Claude)

Direct Baseline Score: 87.38

qwen3.5-35b

Updated 3mo ago

Evaluation Results

Method	Links	Direct Baseline Score				
qwen3.5-35b 2026.04		87.38	83.65	85.47	87.19	87.19
qwen3.5-35b 2026.04		85.73	83.60	86.24	86.95	86.69
qwen3.5-9b 2026.04		82.70	81.44	83.55	84.07	84.03
qwen3.5-9b 2026.04		81.91	82.40	83.87	84.66	84.21
gpt-oss-120b 2026.04		75.81	80.49	80.97	82.10	82.10
gpt-oss-120b 2026.04		75.00	78.06	78.23	78.87	78.71
gpt-oss-20b 2026.04		71.77	75.48	76.45	78.06	78.55
gpt-oss-20b 2026.04		69.83	74.52	75.35	76.95	77.60