Generative Language Modeling and Problem Solving on IFEval, AIME25, GSM8K, GPQA, HumanEval, LCB Suite

IFEval Score: 90.4

Original

Updated 12d ago

Evaluation Results

Method	Links	IFEval	AIME25	GSM8K	GPQA	HumanEval	LCB Suite	Average
Original 2026.04		90.4	56.7	89.3	47	93.3	48.6	70.9
REAM 2026.04		89.9	60	86.3	38.4	93.3	51	69.8
REAP 2026.04		89.6	50	87.9	39.4	94.5	50.3	68.6
HC-SMoE 2026.04		88.2	60	84.7	34.3	91.5	45.9	67.4
Freq 2026.04		87.8	60	82.9	36.9	93.9	44	67.6
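
The last number in each row appears to be the unweighted mean of the six scores before it, taken in the order the benchmarks are listed in the title (IFEval, AIME25, GSM8K, GPQA, HumanEval, LCB Suite). The page does not state this, so treat it as an assumption. The sketch below checks that reading against the reported values; the names `rows`, `mean_score`, and `matches_reported` are illustrative and not part of any published evaluation code.

```python
# Scores copied from the table above. The column order is ASSUMED to follow
# the benchmark order in the page title; the final value in each row is
# ASSUMED to be an average of the six scores.
rows = {
    "Original": ([90.4, 56.7, 89.3, 47.0, 93.3, 48.6], 70.9),
    "REAM":     ([89.9, 60.0, 86.3, 38.4, 93.3, 51.0], 69.8),
    "REAP":     ([89.6, 50.0, 87.9, 39.4, 94.5, 50.3], 68.6),
    "HC-SMoE":  ([88.2, 60.0, 84.7, 34.3, 91.5, 45.9], 67.4),
    "Freq":     ([87.8, 60.0, 82.9, 36.9, 93.9, 44.0], 67.6),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the per-benchmark scores."""
    return sum(scores) / len(scores)


def matches_reported(scores, reported, tol=0.05):
    """True if the mean agrees with the reported value to one decimal place."""
    # The small epsilon guards against floating-point noise at the boundary.
    return abs(mean_score(scores) - reported) <= tol + 1e-9


for method, (scores, reported) in rows.items():
    status = "ok" if matches_reported(scores, reported) else "mismatch"
    print(f"{method}: mean={mean_score(scores):.2f} reported={reported} ({status})")
```

Under this reading, every row's mean rounds to the reported final value, which supports treating the last column as a simple average rather than a weighted score.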