General Language Model Capability on MMLU, GSM8K, HumanEval, and BBH Aggregate

68.42 Average Score

VAR

Updated 26d ago

Evaluation Results

Method	Links	Score
VAR 2025.02		68.42
ALoL 2025.02		62.4
Base 2025.02		61.8
DPO 2025.02		57.44

Method	Links	Score
VAR 2025.02		24.2
ALoL 2025.02		22.71
DPO 2025.02		20.86
Base 2025.02		13.79