Scientific and General Reasoning on Theorem QA
Best Pass@1: 55.75 (MulFeRL)
[Chart: Pass@1 over time, Jun 3, 2025 to Jan 30, 2026]
Updated 1mo ago
Evaluation Results
| Method | Settings | Date | Pass@1 |
|---|---|---|---|
| MulFeRL | Backbone=Qwen3-4B-Inst... | 2026.01 | 55.75 |
| Critique-GRPO | Training Data Volume=4... | 2025.06 | 51.4 |
| Critique-GRPO | Backbone=Qwen3-4B-Inst... | 2026.01 | 51.05 |
| Dr.GRPO | Backbone=Qwen3-4B-Inst... | 2026.01 | 50.68 |
| GRPO | Backbone=Qwen3-4B-Inst... | 2026.01 | 49.85 |
| MulFeRL | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 43.08 |
| CITL-FT | Backbone=Qwen3-4B-Inst... | 2026.01 | 42.65 |
| SFT | Backbone=Qwen3-4B-Inst... | 2026.01 | 41.93 |
| RAFT | Backbone=Qwen3-4B-Inst... | 2026.01 | 41.05 |
| Dr.GRPO | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 40.08 |
| Qwen3-4B-Inst | Backbone=Qwen3-4B-Inst... | 2026.01 | 39.85 |
| Critique-GRPO | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 39.75 |
| GRPO | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 37.55 |
| Qwen2.5-Math-7B-Base | Training Data Volume=None | 2025.06 | 26.4 |
| SFT | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 24.1 |
| CITL-FT | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 23.65 |
| RAFT | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 21.9 |
| Qwen2.5-7B-Base | Backbone=Qwen2.5-7B-Ba... | 2026.01 | 19 |
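
The results listed above are easier to compare as gains over each backbone's untrained score. The sketch below is a minimal illustration only. It assumes the "Qwen3-4B-Inst" and "Qwen2.5-7B-Base" rows are the baselines for their backbone groups, which the page does not state explicitly. The two 2025.06 rows are left out because no backbone is shown for them, and the helper name `gains` is hypothetical.

```python
# Pass@1 scores copied from the 2026.01 rows of the results above.
# Backbone names are truncated in the source and are kept truncated here.
qwen3_scores = {  # Backbone=Qwen3-4B-Inst...
    "MulFeRL": 55.75,
    "Critique-GRPO": 51.05,
    "Dr.GRPO": 50.68,
    "GRPO": 49.85,
    "CITL-FT": 42.65,
    "SFT": 41.93,
    "RAFT": 41.05,
}
qwen3_baseline = 39.85  # row "Qwen3-4B-Inst" (assumed to be the baseline)

qwen25_scores = {  # Backbone=Qwen2.5-7B-Ba...
    "MulFeRL": 43.08,
    "Dr.GRPO": 40.08,
    "Critique-GRPO": 39.75,
    "GRPO": 37.55,
    "SFT": 24.1,
    "CITL-FT": 23.65,
    "RAFT": 21.9,
}
qwen25_baseline = 19.0  # row "Qwen2.5-7B-Base" (assumed to be the baseline)


def gains(scores, baseline):
    """Return each method's Pass@1 gain over the baseline, in points."""
    return {method: round(score - baseline, 2) for method, score in scores.items()}


if __name__ == "__main__":
    for label, scores, base in [
        ("Qwen3-4B-Inst...", qwen3_scores, qwen3_baseline),
        ("Qwen2.5-7B-Ba...", qwen25_scores, qwen25_baseline),
    ]:
        print(label)
        for method, delta in gains(scores, base).items():
            print(f"  {method:<14} {delta:+.2f}")
```

The gains are absolute differences in Pass@1 points, not relative improvements, so they can be compared within a backbone group but say little across groups.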