
Pairwise Comparison on SummEval (anchor set)

Top result: GPT-4o (Accuracy: 94.5)


Evaluation Results

Method              Date      Accuracy
GPT-4o              2025.02   94.5
GPT-4o mini         2025.02   93.4
CompassJudger-32B   2025.02   92
GPT-4 Turbo         2025.02   91.1
Phi-4-14B           2025.02   87.4
Qwen-2.5-72B        2025.02   86