Science Reasoning on GPQA Diamond (Avg@4)

44.8 Avg@4 Accuracy

GRPO + RePro

Updated 4mo ago

Evaluation Results

Method	Date	Avg@4 Accuracy
GRPO + RePro	2025.12	44.8
GRPO	2025.12	43.6
RF++ B + RePro	2025.12	43.1
PPO + RePro	2025.12	42.7
RF++ B	2025.12	42.4
PPO	2025.12	42.1
PPO + RePro	2025.12	40.3
PPO	2025.12	40.2
RF++ B + RePro	2025.12	39.8
Original	2025.12	39.5
GRPO + RePro	2025.12	39.1
RF++ B	2025.12	38.5
GRPO	2025.12	38.3
Original	2025.12	38
PPO + RePro	2025.12	24.1
PPO	2025.12	22.6
Original	2025.12	19.2