General Reasoning & QA on All Evaluated Datasets
Top result: DVPO, 39.7 Average Accuracy (Dec 3, 2025)
Evaluation Results
Method          Details                     Date      Average Accuracy
DVPO            Training Domain=Math D...   2025.12   39.7
Dr.GRPO         Training Domain=Math D...   2025.12   36.75
Reinforce++     Training Domain=Math D...   2025.12   36.65
Robust Bellman  Training Domain=Math D...   2025.12   35.98
GRPO            Training Domain=Math D...   2025.12   35.28
PPO             Training Domain=Math D...   2025.12   34.72
Base            Training Domain=Math D...   2025.12   34.64