Preference Modeling on Instruction Following
Top result: BTPO, 65.2 Accuracy (Oct 17, 2025)
Evaluation Results
| Method | Method details | Date | Accuracy |
| --- | --- | --- | --- |
| BTPO | Base Model=Llama3.1-8B... | 2025.10 | 65.2 |
| BT | Base Model=Llama3.1-8B... | 2025.10 | 63.4 |
| BTPO | Base Model=Llama3.2-3B... | 2025.10 | 61.4 |
| BTPO | Base Model=Qwen2.5-7B-... | 2025.10 | 60.1 |
| BTPO | Base Model=Qwen2.5-3B-... | 2025.10 | 58.7 |
| BT | Base Model=Qwen2.5-7B-... | 2025.10 | 58.7 |
| BT | Base Model=Llama3.2-3B... | 2025.10 | 58.7 |
| BT | Base Model=Qwen2.5-3B-... | 2025.10 | 57 |
| GRPO (pair) | Base Model=Llama3.1-8B... | 2025.10 | 53.4 |
| GRAM | Base Model=Qwen2.5-3B-... | 2025.10 | 53.3 |
| GRPO (pair) | Base Model=Qwen2.5-7B-... | 2025.10 | 52.2 |
| GRPO (pair) | Base Model=Qwen2.5-3B-... | 2025.10 | 51.1 |
| GRAM | Base Model=Llama3.2-3B... | 2025.10 | 50.4 |
| GRAM | Base Model=Qwen2.5-7B-... | 2025.10 | 50.2 |
| GRPO (pair) | Base Model=Llama3.2-3B... | 2025.10 | 50.1 |
| GRPO (point) | Base Model=Qwen2.5-3B-... | 2025.10 | 50 |
| GRPO (point) | Base Model=Qwen2.5-7B-... | 2025.10 | 50 |
| GRPO (point) | Base Model=Llama3.2-3B... | 2025.10 | 50 |
| GRAM | Base Model=Llama3.1-8B... | 2025.10 | 50 |
| GRPO (point) | Base Model=Llama3.1-8B... | 2025.10 | 50 |