Reward Modeling on WEval
[Figure: Correlation chart. Top entry: Human, 95.7 (Apr 30, 2026). Metrics: Correlation, IL, PL.]
Evaluation Results
| Method | Model Type | Date | Correlation | IL | PL |
| --- | --- | --- | --- | --- | --- |
| Human | Human annot... | 2026.04 | 95.7 | 97.9 | 79.4 |
| Our-RM-7B | Trained rew... | 2026.04 | 94.6 | 97.3 | 78 |
| Qwen2.5-72B-Instruct-as-a-judge | LLM-as-a-judge | 2026.04 | 93.6 | 96.8 | 78.7 |
| Qwen2.5-7B-Instruct-as-a-judge | LLM-as-a-judge | 2026.04 | 85.4 | 92.9 | 60.9 |
| Skywork-Reward-V2-Qwen3-8B | Trained rew... | 2026.04 | 82.5 | 91.7 | 45.3 |
| Writing-Critic-7B | Trained rew... | 2026.04 | 80.6 | 90.8 | 42.7 |
| Skywork-Reward-V2-Llama-3.1-8B | Trained rew... | 2026.04 | 75.4 | 88.7 | 32.9 |