General Chat on WildBench

68.16 LLM Judge Score

GOLF

Updated 4mo ago

Evaluation Results

Method	Links	LLM Judge Score
GOLF 2026.03		68.16
Pairwise-GRPO 2026.03		67.77
Rubric-as-Reward 2026.03		67.09
Critique-GRPO 2026.03		64.84
Direct-Likert 2026.03		58.01
Qwen-3-8B 2026.03		48.05
GOLF 2026.03		34.42
Rubric-as-Reward 2026.03		26.51
Pairwise-GRPO 2026.03		25.54
Critique-GRPO 2026.03		25.09
Direct-Likert 2026.03		13.48
Llama-3.1-8B-Instruct 2026.03		-8.25