Scientific Reasoning & QA on Science & QA Domain Multiple Datasets
Best Average Accuracy: 4.04 (DVPO, Dec 3, 2025)
Updated 4d ago
Evaluation Results
Method           Details                     Date      Average Accuracy
DVPO             Training Domain=Math D...   2025.12   4.04
Reinforce++      Training Domain=Math D...   2025.12   3.72
Dr.GRPO          Training Domain=Math D...   2025.12   3.55
GRPO             Training Domain=Math D...   2025.12   3.3
Robust Bellman   Training Domain=Math D...   2025.12   3.22
PPO              Training Domain=Math D...   2025.12   3.16
Base             Training Domain=Math D...   2025.12   2.96