Scientific Reasoning & QA on SampleQA

3.31 Accuracy

DVPO

Updated 5mo ago

Evaluation Results

Method	Links	Accuracy
DVPO 2025.12		3.31
Reinforce++ 2025.12		3.19
Dr.GRPO 2025.12		3.1
GRPO 2025.12		2.91
Base 2025.12		2.89
PPO 2025.12		2.7
Robust Bellman 2025.12		2.61