
General Reasoning & QA on All Evaluated Datasets

39.7 Average Accuracy

DVPO

Updated 5mo ago

Evaluation Results

Method	Links	Average Accuracy
DVPO 2025.12		39.7
Dr.GRPO 2025.12		36.75
Reinforce++ 2025.12		36.65
Robust Bellman 2025.12		35.98
GRPO 2025.12		35.28
PPO 2025.12		34.72
Base 2025.12		34.64