General Performance Evaluation on Aggregate (IFEval, GPQA, LCB, Arena-Hard, CW, MT-Bench, WildBench)
[Chart: Average Score over time; top entry SPARD at 63.69 (Apr 9, 2026)]
Evaluation Results
| Method | Backbone | Date | Average Score |
|---|---|---|---|
| SPARD | Qwen3-8B | 2026.04 | 63.69 |
| + GRPOavg | Qwen3-8B | 2026.04 | 62.17 |
| + GRPOimp | Qwen3-8B | 2026.04 | 61.74 |
| + DPO | Qwen3-8B | 2026.04 | 61.2 |
| + GRPOrm | Qwen3-8B | 2026.04 | 61.15 |
| Base | Qwen3-8B | 2026.04 | 60.56 |
| SPARD | Qwen2.5-7B-In... | 2026.04 | 50.03 |
| + SFT | Qwen3-8B | 2026.04 | 49.6 |
| + GRPOavg | Qwen2.5-7B-In... | 2026.04 | 48.46 |
| + DPO | Qwen2.5-7B-In... | 2026.04 | 47.55 |
| + GRPOrm | Qwen2.5-7B-In... | 2026.04 | 47.06 |
| Base | Qwen2.5-7B-In... | 2026.04 | 46.12 |
| + GRPOimp | Qwen2.5-7B-In... | 2026.04 | 45.59 |
| + SFT | Qwen2.5-7B-In... | 2026.04 | 40.5 |
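Because the table mixes two backbones in one ranking, comparing each method with the Base row of its own backbone can be easier to read. The minimal Python sketch below performs only that subtraction on the Average Score values listed in the table. It assumes that a plain difference of aggregate scores is a meaningful comparison. The helper name `deltas_vs_base` is hypothetical, and the truncated backbone label is kept exactly as the table shows it.

```python
# Average Score values copied from the Evaluation Results table.
# Assumption: a plain difference from the Base row is a useful comparison.
SCORES = {
    "Qwen3-8B": {
        "Base": 60.56, "SPARD": 63.69, "+ GRPOavg": 62.17, "+ GRPOimp": 61.74,
        "+ DPO": 61.2, "+ GRPOrm": 61.15, "+ SFT": 49.6,
    },
    "Qwen2.5-7B-In...": {  # label truncated in the source; kept as-is
        "Base": 46.12, "SPARD": 50.03, "+ GRPOavg": 48.46, "+ DPO": 47.55,
        "+ GRPOrm": 47.06, "+ GRPOimp": 45.59, "+ SFT": 40.5,
    },
}


def deltas_vs_base(scores):
    """Hypothetical helper: each method's score minus its backbone's Base score."""
    out = {}
    for backbone, rows in scores.items():
        base = rows["Base"]
        out[backbone] = {
            # Round to 2 decimals to match the precision of the table.
            method: round(score - base, 2)
            for method, score in rows.items()
            if method != "Base"
        }
    return out


if __name__ == "__main__":
    # Print methods per backbone, largest gain first.
    for backbone, rows in deltas_vs_base(SCORES).items():
        for method, delta in sorted(rows.items(), key=lambda kv: -kv[1]):
            print(f"{backbone:18s} {method:10s} {delta:+.2f}")
```

Read this way, SPARD has the largest gain over Base on both backbones (+3.13 on Qwen3-8B, +3.91 on Qwen2.5-7B-In...). + SFT falls below Base on both backbones, and + GRPOimp does the same on Qwen2.5-7B-In....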