General-purpose Behavior on MultiChallenge
[Chart: Score over time. Top entry: Qwen3-14B-as-GenRM, 58.6, Feb 6, 2026.]
Updated 4d ago
Evaluation Results
Method               Method details              Date      Score
Qwen3-14B-as-GenRM   Reward Model=Qwen3-14B...   2026.02   58.6
GenRM-R-Align-14B    Reward Model=GenRM-R-A...   2026.02   55.7
GenRM-RLVR-8B        Reward Model=GenRM-RLV...   2026.02   55
Qwen3-8B-as-GenRM    Reward Model=Qwen3-8B-...   2026.02   52.8
GenRM-R-Align-8B     Reward Model=GenRM-R-A...   2026.02   51.7
GenRM-RLVR-14B       Reward Model=GenRM-RLV...   2026.02   51.7
Qwen3-8B             Reward Model=Baseline       2026.02   42.9