General-purpose Behavior on MultiChallenge

Score: 58.6

Qwen3-14B-as-GenRM

Feb 6, 2026
Updated 4d ago

Evaluation Results

Date      Score
2026.02   58.6
2026.02   55.7
2026.02   55
2026.02   52.8
2026.02   51.7
2026.02   51.7
2026.02   42.9