Agentic Reasoning on BALROG
[Chart: Accuracy by date. Best result: TemplateRL, 31.5 Accuracy (May 21, 2025).]
Evaluation Results
Method                     Details                     Date      Accuracy
TemplateRL                 Base Model=Qwen2.5-Mat...   2025.05   31.5
OpenReasoner-Zero          Base Model=Qwen2.5-Mat...   2025.05   28.3
Oat-Zero                   Base Model=Qwen2.5-Mat...   2025.05   26.2
GRPO                       Base Model=Qwen2.5-Mat...   2025.05   25.4
PRIME-Zero                 Base Model=Qwen2.5-Mat...   2025.05   24.1
SimpleRL-Zero              Base Model=Qwen2.5-Mat...   2025.05   17.4
Qwen2.5-Math-7B-Instruct   -                           2025.05   15.4
Qwen2.5-Math-7B-Base       -                           2025.05   12.5