General Capability on GPQA-diamond OpenR1-Math Harder Subset

[Chart: Accuracy over time; labeled model: Qwen-4B; date: Feb 11, 2026]

Evaluation Results

Date      Accuracy
2026.02   54
2026.02   52.5
2026.02   51