General Language Capabilities on MMLU, GSM8K, GPQA, HumanEval, TruthfulQA, IFEval Aggregate

Average Score: 71.2

GRPO

[Chart: Average Score; y-axis ticks 62.048, 64.424, 66.8, 69.176; x-axis: May 26, 2025]
Updated 1mo ago

Evaluation Results

Date      Score
2025.05   71.2
2025.05   71.1
2025.05   70
2025.05   68.9
2025.05   68
2025.05   66.8
2025.05   65.8
2025.05   65.2
2025.05   62.5
2025.05   62.4