
General Language Modeling on MMLU, ARC-Challenge, and CommonsenseQA Aggregate

Top method: RAISE (Average Score: 64.77)

Updated 4mo ago

Evaluation Results

Method	Date	Average Score	Second metric (unlabeled in source)
RAISE	2025.04	64.77	69.83
SSPL	2025.04	64.42	46.59
IFD	2025.04	64.22	34.04
AlpaGasus	2025.04	64.19	32.06
DEITA	2025.04	64.13	28.37
RAND	2025.04	63.94	15.53
Full Alpaca (100%)	2025.04	63.7	0
Base Model (0%)	2025.04	62.16	-100

RAISE	2025.04	55.47	70.35
IFD	2025.04	54.73	25.06
Full Alpaca (100%)	2025.04	54.32	0
RAND	2025.04	54.2	-7.33
DEITA	2025.04	54.05	-16.44
AlpaGasus	2025.04	53.13	-72.14
Base Model (0%)	2025.04	52.67	-100
SSPL	2025.04	51.08	-196.75

RAISE	2025.04	40.24	85.14
SSPL	2025.04	40.08	69.55
RAND	2025.04	39.44	7.01
Full Alpaca (100%)	2025.04	39.36	0
IFD	2025.04	39.2	-15.77
AlpaGasus	2025.04	38.88	-47.39
DEITA	2025.04	38.69	-65.79
Base Model (0%)	2025.04	38.33	-100
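
The last numeric column in the results above is not labeled in the source. In every group it is 0 for Full Alpaca (100%) and -100 for Base Model (0%), which suggests a gain normalized by the gap between those two reference rows. The sketch below illustrates that reading; the formula and the function name `relative_gain` are assumptions, not something the page states. Values recomputed this way come out close to the listed ones but not identical, presumably because the listed averages are rounded.

```python
def relative_gain(score: float, full: float, base: float) -> float:
    """Gain over the full-data score, scaled so that full -> 0 and base -> -100.

    Assumed formula (inferred from the table, not stated in the source):
        100 * (score - full) / (full - base)
    """
    if full == base:
        raise ValueError("full and base scores must differ")
    return 100.0 * (score - full) / (full - base)


# Reference rows from the first group of results, as listed.
full_alpaca = 63.70  # Full Alpaca (100%)
base_model = 62.16   # Base Model (0%)

rows = [
    ("RAISE", 64.77),
    ("Full Alpaca (100%)", 63.70),
    ("Base Model (0%)", 62.16),
]

for name, score in rows:
    print(f"{name}: {relative_gain(score, full_alpaca, base_model):.2f}")
```

Under this reading, a negative value means a method scores below training on the full Alpaca set, and a value below -100 (such as SSPL in the second group) means it scores below the untuned base model.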