
LLM-as-a-Judge Evaluation on Vicuna Benchmark

Pearson Correlation (r): 65.1

Qwen3-32B REAL (ours)

Mar 17, 2026
Updated 1mo ago
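
The headline metric above is a Pearson correlation (r). As a minimal sketch of how such a number could be computed, the snippet below correlates a judge model's scores with human ratings for the same responses. The function name `pearson_r`, the two score lists, and the scaling by 100 are all assumptions made for illustration; none of them come from the benchmark itself, and the values are not real results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five responses (illustrative only).
judge_scores = [7.0, 8.5, 6.0, 9.0, 5.5]
human_scores = [6.5, 9.0, 5.0, 8.5, 6.0]

r = pearson_r(judge_scores, human_scores)
# Assumption: the leaderboard reports r multiplied by 100.
print(f"Pearson r x 100: {100 * r:.1f}")
```

A higher value means the judge's scores track the human ratings more closely; a value of 100 on this assumed scale would correspond to a perfect linear relationship.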

Evaluation Results

Date      Pearson (r)   Column 2   Column 3
2026.03   65.1          60.7       51.2
2026.03   63.3          60.2       46.3
2026.03   60.5          57.8       48.2
2026.03   59.9          57.2       44
2026.03   58            56         43.4
2026.03   57.1          56.1       43
2026.03   56.2          54.8       42.6
2026.03   52.8          51.3       40.9
2026.03   51.9          52         39.9
2026.03   51            51.3       40.3
2026.03   50.8          57.4       45
2026.03   50.5          50.4       38.3
2026.03   50.3          45.4       34
2026.03   49.2          42.4       36.1
2026.03   48.8          48         41.1
2026.03   47.3          45.1       34.1
2026.03   37.3          46.5       40.8
2026.03   37.3          46.1       38.6
2026.03   29.3          29.5       21.6
2026.03   27.5          26.7       20.2