
LLM-as-a-Judge Evaluation on Average Across FB Bench, FLASK, Vic. Bench, MT Bench

71 Pearson (r)

Qwen3-32B REAL (ours)

Mar 17, 2026
Updated 1mo ago

Evaluation Results

Method    Links
2026.03    71      70      60
2026.03    67.9    66.2    53
2026.03    67      66      56.1
2026.03    64.1    62.7    50.3
2026.03    63.2    62.2    49.8
2026.03    63.2    64      51.4
2026.03    63.1    63.8    51.2
2026.03    62.6    62.8    49.2
2026.03    61.9    61.1    48.3
2026.03    60.7    59.8    47.8
2026.03    59.1    57.6    49.3
2026.03    56.7    54.8    45.1
2026.03    55.6    55.8    44.8
2026.03    52.7    58.8    46.1
2026.03    52.1    50.8    39.5
2026.03    51.2    49.8    42.6
2026.03    49.1    49.3    40.8
2026.03    43.7    48.8    39.5
2026.03    43.6    47.4    40.3
2026.03    38.3    34.8    28.5