
General Multitask Evaluation on Math500, GPQA, HumanEval, MBPP, AE2 LC Aggregate

Average Score: 40.7

Llama3.2-3B-GRLO+RLVR

[Chart: Average Score; y-axis ticks 20.732, 25.916, 31.1, 36.284; date May 14, 2026]
Updated 16d ago

Evaluation Results

Date      Average Score   Method   Links
2026.05   40.7
2026.05   39.3
2026.05   35.6
2026.05   30.7
2026.05   21.5