Scientific Reasoning & QA on SampleQA
Figure: Accuracy over time; leading entry DVPO at 3.31 (Dec 3, 2025).
Evaluation Results

Method          Setting                    Date     Accuracy
DVPO            Training Domain=Math D...  2025.12  3.31
Reinforce++     Training Domain=Math D...  2025.12  3.19
Dr.GRPO         Training Domain=Math D...  2025.12  3.1
GRPO            Training Domain=Math D...  2025.12  2.91
Base            Training Domain=Math D...  2025.12  2.89
PPO             Training Domain=Math D...  2025.12  2.7
Robust Bellman  Training Domain=Math D...  2025.12  2.61
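To read the results as margins over the Base row, a small Python sketch follows. The scores are copied from the Evaluation Results table; the names RESULTS and gains_over_base are hypothetical, chosen for illustration, and are not part of any site API. Differences are rounded to two decimals because the table reports at most two.

```python
# Accuracy scores copied from the Evaluation Results table
# (every row: Training Domain=Math D..., 2025.12).
RESULTS = {
    "DVPO": 3.31,
    "Reinforce++": 3.19,
    "Dr.GRPO": 3.1,
    "GRPO": 2.91,
    "Base": 2.89,
    "PPO": 2.7,
    "Robust Bellman": 2.61,
}


def gains_over_base(results, baseline="Base"):
    """Hypothetical helper: return (method, accuracy minus baseline accuracy)
    pairs for every non-baseline method, sorted from largest gain to smallest."""
    base = results[baseline]
    gains = [
        (name, round(score - base, 2))  # round away float noise to 2 decimals
        for name, score in results.items()
        if name != baseline
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, delta in gains_over_base(RESULTS):
        print(f"{name:<15} {delta:+.2f}")
```

Read this way, DVPO, Reinforce++, Dr.GRPO and GRPO score above Base, while PPO and Robust Bellman score below it.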