Human Preference Evaluation for Code-switched Text Generation on EN-CS (Out of domain)
[Chart: Score by date. Labeled point: Gold Standard, 434.5, Feb 18, 2025.]
Updated 1mo ago
Evaluation Results
Method             Settings                   Date      Score   Rank
Gold Standard      -                          2025.02   434.5   1
Llama3             training=fine-tuned        2025.02   282     2
NLLB               training=fine-tuned, b...  2025.02   247.5   3
Llama3 Instruct    training=fine-tuned        2025.02   210     4
Llama3.3-70B (fs)  mode=5-shot                2025.02   164     5
GPT-4o (fs)        mode=5-shot                2025.02   162     6