General Multitask Evaluation on Math500, GPQA, HumanEval, MBPP, AE2 LC Aggregate

Average Score: 40.7

Llama3.2-3B-GRLO+RLVR

Updated 2mo ago

Evaluation Results

Method	Links	Average Score
Llama3.2-3B-GRLO+RLVR 2026.05		40.7
Llama3.2-3B-GRLO 2026.05		39.3
Llama3.2-3B-Instruct 2026.05		35.6
Llama3.2-3B-RLVR 2026.05		30.7
Llama3.2-3B-SFT 2026.05		21.5