Text Quality Meta-evaluation on SummEval & Topical-Chat Combined
[Chart: Overall Score over time. Highlighted result: DeepSeek-V3, 69.5 (Feb 17, 2025).]
Evaluation Results

Method               Date      Overall Score
DeepSeek-V3          2025.02   69.5
GPT-4o               2025.02   69.4
GPT-4 Turbo          2025.02   68.9
GPT-4o mini          2025.02   68.4
Qwen-2.5-72B         2025.02   67.4
Gemma-2-27B          2025.02   66.9
CompassJudger-32B    2025.02   66.7
Phi-4-14B            2025.02   65.8
Llama-3.1-70B        2025.02   65.0
GPT-3.5 Turbo        2025.02   62.5
Prometheus-2-8x7B    2025.02   59.7
CRITIQUELLM-6B       2025.02   59.6
Prometheus-2-7B      2025.02   59.0
Auto-J-13B           2025.02   51.4
Prometheus-13B       2025.02   48.4
Themis-8B            2025.02   41.7