Preference Modeling on Math Reasoning
Best reported Accuracy: 87.6 (BTPO), Oct 17, 2025
Evaluation Results
| Method | Base Model | Date | Accuracy |
|---|---|---|---|
| BTPO | Qwen2.5-7B-... | 2025.10 | 87.6 |
| BTPO | Qwen2.5-3B-... | 2025.10 | 85.4 |
| BT | Qwen2.5-7B-... | 2025.10 | 84.5 |
| BTPO | Llama3.1-8B... | 2025.10 | 84.2 |
| BTPO | Llama3.2-3B... | 2025.10 | 81.6 |
| GRAM | Qwen2.5-7B-... | 2025.10 | 81 |
| BT | Llama3.1-8B... | 2025.10 | 80.4 |
| BT | Llama3.2-3B... | 2025.10 | 77.2 |
| BT | Qwen2.5-3B-... | 2025.10 | 76.3 |
| GRAM | Qwen2.5-3B-... | 2025.10 | 74.9 |
| GRAM | Llama3.2-3B... | 2025.10 | 72.8 |
| GRAM | Llama3.1-8B... | 2025.10 | 71.9 |
| GRPO (point) | Llama3.1-8B... | 2025.10 | 58.4 |
| GRPO (pair) | Qwen2.5-7B-... | 2025.10 | 53.7 |
| GRPO (point) | Qwen2.5-7B-... | 2025.10 | 52.4 |
| GRPO (pair) | Llama3.1-8B... | 2025.10 | 50.2 |
| GRPO (pair) | Qwen2.5-3B-... | 2025.10 | 50 |
| GRPO (point) | Qwen2.5-3B-... | 2025.10 | 50 |
| GRPO (pair) | Llama3.2-3B... | 2025.10 | 50 |
| GRPO (point) | Llama3.2-3B... | 2025.10 | 50 |