Pairwise Comparison on SummEval (anchor set)
[Chart: Accuracy by date. Highlighted point: GPT-4o, 94.5 accuracy, Feb 17, 2025.]
Evaluation Results
Method              Parameters  Date     Accuracy
GPT-4o              -           2025.02  94.5
GPT-4o mini         -           2025.02  93.4
CompassJudger-32B   32B         2025.02  92
GPT-4 Turbo         -           2025.02  91.1
Phi-4-14B           14B         2025.02  87.4
Qwen-2.5-72B        72B         2025.02  86