
Pair-wise comparison on JudgeBench

Best result: 75.7 accuracy (CCE@16)


Evaluation Results

Method	Date	Accuracy
CCE@16	2025.02	75.7
CCE@16	2025.02	70.6
CCE@16	2025.02	70.4
CCE@16	2025.02	69.7
CCE-random@16	2025.02	68.9
Vanilla	2025.02	68.9
Maj@16	2025.02	68.6
Vanilla	2025.02	68.3
Agg@16	2025.02	67.2
Vanilla	2025.02	67.1
16-Criteria	2025.02	66.6
Vanilla	2025.02	66.3
CCE@16	2025.02	64.0
LongPrompt	2025.02	63.5
EvalPlan	2025.02	62.9
Vanilla	2025.02	58.3
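To make the metric concrete, here is a minimal, hypothetical sketch of how a pairwise accuracy and a Maj@16-style baseline could be computed. It is not the JudgeBench or leaderboard evaluation code. It assumes each judge verdict is the label "A" or "B" (which of two responses is better). It also reads "Maj@16" as a majority vote over 16 sampled verdicts per pair. The helper names `majority_vote` and `pairwise_accuracy` are illustrative. Any benchmark-specific rules, such as how position-swapped orderings are scored, are not modeled here.

```python
from collections import Counter
from typing import Sequence


def majority_vote(verdicts: Sequence[str]) -> str:
    """Return the most frequent verdict (hypothetical 'A'/'B' labels).

    Counter.most_common lists equal counts in first-encountered order,
    so a tie resolves to whichever verdict appeared first.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    return Counter(verdicts).most_common(1)[0][0]


def pairwise_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of pairs where the judge's preferred response matches the gold label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Three pairs with three sampled verdicts each (k=3 instead of 16 for brevity).
    sampled = [["A", "A", "B"], ["B", "B", "A"], ["A", "B", "B"]]
    gold = ["A", "A", "B"]
    preds = [majority_vote(v) for v in sampled]
    print(preds, round(pairwise_accuracy(preds, gold), 1))
```

With 16 samples per pair, the same `majority_vote` call would aggregate each pair's 16 verdicts before scoring.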