General Capability on GPQA-diamond OpenR1-Math Harder Subset
Chart (Accuracy): Qwen-4B, 54, Feb 11, 2026.
Evaluation Results

Method    Details                      Date      Accuracy
Qwen-4B   Backbone=Qwen, Paramet...    2026.02   54
RePO      Backbone=Qwen, Paramet...    2026.02   52.5
LUFFY     Backbone=Qwen, Paramet...    2026.02   51