Scientific Reasoning & QA on HLE

Best accuracy: 3.61 (Reinforce++)

Updated 4mo ago

Evaluation Results

Method	Date	Accuracy
Reinforce++	2025.12	3.61
DVPO	2025.12	3.57
Robust Bellman	2025.12	3.43
PPO	2025.12	3.34
Dr.GRPO	2025.12	3.2
GRPO	2025.12	3.01
Base	2025.12	2.89