Human Pairwise Preference Evaluation on human_preference_eval
[Chart: Preference Count and Preference Percentage for GPT-4o Mini, Feb 6, 2026. Highest Preference Count shown: 203.]
Updated 1mo ago
Evaluation Results
Method       Version          Date     Preference Count  Preference Percentage (%)
GPT-4o Mini  FT (Fine-tuned)  2026.02  203               65.9
GPT-4o Mini  Vanilla          2026.02  105               34.1
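
The percentages reported above appear to be each version's share of all pairwise preferences (203 + 105 = 308 in total). The page does not state the formula, so the following is only a minimal sketch under that assumption; the function name `preference_percentages` and the dictionary keys are hypothetical and chosen for illustration.

```python
def preference_percentages(counts):
    """Return each entry's share of the total count, as a percentage.

    Assumption: Preference Percentage = count / sum of all counts * 100,
    rounded to one decimal place. The source page does not confirm this.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no preferences recorded")
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


# Preference counts taken from the Evaluation Results table.
counts = {"FT (Fine-tuned)": 203, "Vanilla": 105}
shares = preference_percentages(counts)
print(shares)
```

Under this assumption the computed shares match the reported 65.9 and 34.1, which suggests each comparison was counted exactly once for one of the two versions.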