Human Preference Evaluation for Code-switched Text Generation on In-Domain Data
[Chart: Preference Score by date; top entry Gold Standard, 804, Feb 18, 2025]
Evaluation Results
Method             Configuration               Date     Preference Score  Preference Rank
Gold Standard      -                           2025.02  804               1
Llama3             training=fine-tuned         2025.02  573.5             2
NLLB               training=fine-tuned, b...   2025.02  507               3
Llama3 Instruct    training=fine-tuned         2025.02  480.5             4
GPT-4o (fs)        mode=5-shot                 2025.02  413.5             5
Llama3.3-70B (fs)  mode=5-shot                 2025.02  371.5             6

Method             Configuration               Date     Preference Score  Preference Rank
Gold Standard      -                           2025.02  369.5             1
Llama3             training=fine-tuned         2025.02  291.5             2
Llama3 Instruct    training=fine-tuned         2025.02  270.5             3
NLLB               training=fine-tuned, b...   2025.02  259.5             4
GPT-4o (fs)        mode=5-shot                 2025.02  251.5             5
Llama3.3-70B (fs)  mode=5-shot                 2025.02  207.5             6