LLM-as-a-Judge on JudgeBench (Merged GPT Claude)

87.38 Direct Baseline Score

qwen3.5-35b

[Chart: score axis ticks 69.128, 73.8665, 78.605, 83.3435; date axis label Apr 4, 2026]
Updated 11d ago

Evaluation Results

Method   Links
2026.04  87.38  83.65  85.47  87.19  87.19
2026.04  85.73  83.6   86.24  86.95  86.69
2026.04  82.7   81.44  83.55  84.07  84.03
2026.04  81.91  82.4   83.87  84.66  84.21
2026.04  75.81  80.49  80.97  82.1   82.1
2026.04  75     78.06  78.23  78.87  78.71
2026.04  71.77  75.48  76.45  78.06  78.55
2026.04  69.83  74.52  75.35  76.95  77.6