Preference Modeling on Math Reasoning

Top result: BTPO, 87.6 accuracy

Updated 4mo ago

Evaluation Results

Method	Date	Accuracy
BTPO	2025.10	87.6
BTPO	2025.10	85.4
BT	2025.10	84.5
BTPO	2025.10	84.2
BTPO	2025.10	81.6
GRAM	2025.10	81
BT	2025.10	80.4
BT	2025.10	77.2
BT	2025.10	76.3
GRAM	2025.10	74.9
GRAM	2025.10	72.8
GRAM	2025.10	71.9
GRPO (point)	2025.10	58.4
GRPO (pair)	2025.10	53.7
GRPO (point)	2025.10	52.4
GRPO (pair)	2025.10	50.2
GRPO (pair)	2025.10	50
GRPO (point)	2025.10	50
GRPO (pair)	2025.10	50
GRPO (point)	2025.10	50