Open-ended query evaluation on Arena-Hard-Auto v0.1
[Chart: Win Rate over time. Top result: S-SPPO, 31.5 (Jun 1, 2026).]
Evaluation Results
Method               Details                     Date     Win Rate
S-SPPO               Backbone=Llama-3-8B, I...   2026.06  31.5
SPPO                 Backbone=Llama-3-8B, I...   2026.06  31
S-SPPO               Backbone=Llama-3-8B, I...   2026.06  30.6
SPPO                 Backbone=Llama-3-8B, I...   2026.06  30.1
S-SPPO               Backbone=Llama-3-8B, I...   2026.06  30
SPPO                 Backbone=Llama-3-8B, I...   2026.06  29.8
S-SPPO               Backbone=Mistral-7B, I...   2026.06  23.9
SPPO                 Backbone=Mistral-7B, I...   2026.06  23.3
S-SPPO               Backbone=Mistral-7B, I...   2026.06  21.8
S-SPPO               Backbone=Mistral-7B, I...   2026.06  21.5
Snorkel              Backbone=Mistral-7B, T...   2026.06  20.7
SPPO                 Backbone=Mistral-7B, I...   2026.06  20.4
SPPO                 Backbone=Mistral-7B, I...   2026.06  18.7
Mistral-7B-Instruct  Backbone=Mistral-7B         2026.06  12.6