
General Language Capabilities on MMLU, GSM8K, GPQA, HumanEval, TruthfulQA, IFEval Aggregate

Best average score: 71.2 (GRPO)

Updated 4mo ago

Evaluation Results

Method	Date	Average Score
GRPO	2025.05	71.2
TI-DPO	2025.05	71.1
TPO	2025.05	70
CPO	2025.05	68.9
KTO	2025.05	68
DPO	2025.05	66.8
TDPO	2025.05	65.8
SFT	2025.05	65.2
IPO	2025.05	62.5
SIMPO	2025.05	62.4