Human Preference Evaluation on Basque Arena
[Chart: Arena Content Score by date. Highlighted point: GPT-4o, 1,183, Jun 9, 2025. Selectable metrics: Arena Content Score, Arena Language Score, Arena Global Score.]
Evaluation Results

Method             Details                  Date     Arena Content Score  Arena Language Score  Arena Global Score
GPT-4o             Model Scale=Proprietary  2025.06  1,183                1,093                 1,188
Claude 3.5 Sonnet  Model Scale=Proprietary  2025.06  1,150                1,082                 1,153
70B + CEU IEN      Model Scale=70B, Train...  2025.06  1,127              1,083                 1,141
8B + CEU IEN+EU    Model Scale=8B, Traini...  2025.06  1,047              1,038                 1,050
8B + CEU IEU       Model Scale=8B, Traini...  2025.06  1,045              1,034                 1,050
8B + CEU IEN       Model Scale=8B, Traini...  2025.06  1,031              1,036                 1,038
8B INSTRUCT EN     Model Scale=8B, Traini...  2025.06  766                783                   722