
General-purpose Behavior on MultiChallenge

Score: 58.6

Qwen3-14B-as-GenRM

Feb 6, 2026
Updated 1 month ago

Evaluation Results

Method | Links
2026.02 | 58.6
2026.02 | 55.7
2026.02 | 55
2026.02 | 52.8
2026.02 | 51.7
2026.02 | 51.7
2026.02 | 42.9