Pairwise preference ranking on Held-out
[Chart: ELO score by method; top entry Human Abstracts (1,187 ELO) as of Oct 6, 2025. Series shown: ELO Score, Wins, Losses, Win Rate.]
Evaluation Results
| Method | Settings | Date | ELO Score | Wins | Losses | Win Rate (%) |
|---|---|---|---|---|---|---|
| Human Abstracts | | 2025.10 | 1,187 | 17 | 1 | 94.4 |
| GPT-OSS-120B + GPT-OSS-120B | Configuration=self-play | 2025.10 | 1,119 | 14 | 5 | 63.6 |
| Mistral-24B + GPT-OSS-120B | Proposer=Mistral-24B,... | 2025.10 | 939 | 5 | 8 | 25 |
| Mistral-24B + Mistral-24B | Configuration=self-play | 2025.10 | 927 | 5 | 12 | 25 |
| GPT-OSS-120B + Mistral-24B | Proposer=GPT-OSS-120B,... | 2025.10 | 828 | 1 | 16 | 5 |
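The ELO scores above summarize pairwise preference judgments between methods. The page does not state how they were computed (K-factor, starting rating, match order, or whether a Bradley-Terry fit was used instead), so the following is only a minimal sketch of standard sequential Elo updates, assuming a starting rating of 1000 and K = 32. One weak hint in that direction: the five listed scores sum to exactly 5,000 (mean 1000), which is what a zero-sum update starting at 1000 would preserve, though this is not confirmed by the source. The function names and the toy players "A", "B", "C" are hypothetical.

```python
from typing import Dict, List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the logistic Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(
    matches: List[Tuple[str, str, float]],
    k: float = 32.0,        # assumed K-factor; not stated on the leaderboard
    initial: float = 1000.0,  # assumed starting rating
) -> Dict[str, float]:
    """Apply sequential Elo updates to a list of pairwise judgments.

    Each match is (a, b, score_a): score_a = 1.0 if a was preferred,
    0.0 if b was preferred, 0.5 for a tie.
    """
    ratings: Dict[str, float] = {}
    for a, b, score_a in matches:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        # Zero-sum update: A gains exactly what B loses, so the mean rating stays at `initial`.
        delta = k * (score_a - expected_score(r_a, r_b))
        ratings[a] = r_a + delta
        ratings[b] = r_b - delta
    return ratings


if __name__ == "__main__":
    # Hypothetical toy judgments, not data from this benchmark.
    toy = [("A", "B", 1.0), ("B", "C", 0.5), ("C", "A", 0.0)]
    for name, rating in sorted(elo_ratings(toy).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which judgments are processed; leaderboards that want an order-independent score often fit a Bradley-Terry model to all comparisons at once instead.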