Pair-wise comparison on EvalBias
[Figure: Accuracy on EvalBias over time; y-axis (Accuracy) spans roughly 56–80. Top method: CCE@16 at 85.9 accuracy. Latest data point: Feb 18, 2025.]
Evaluation Results

Method           Model                    Date      Accuracy
CCE@16           Qwen 2.5 72B-Ins...      2025.02   85.9
CCE@16           GPT-4o                   2025.02   85.0
CCE@16           Qwen 2.5 32B-Ins...      2025.02   80.5
CCE-random@16    GPT-4o                   2025.02   80.1
CCE@16           Qwen 2.5 7B-Inst...      2025.02   79.4
CCE@16           Llama 3.3 70B-In...      2025.02   79.2
Agg@16           GPT-4o                   2025.02   77.9
Maj@16           GPT-4o                   2025.02   75.5
EvalPlan         GPT-4o                   2025.02   74.4
16-Criteria      GPT-4o                   2025.02   73.7
Vanilla          Qwen 2.5 32B-Ins...      2025.02   71.1
Vanilla          Llama 3.3 70B-In...      2025.02   70.6
LongPrompt       GPT-4o                   2025.02   70.5
Vanilla          GPT-4o                   2025.02   68.5
Vanilla          Qwen 2.5 72B-Ins...      2025.02   68.5
Vanilla          Qwen 2.5 7B-Inst...      2025.02   57.4
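For readers who want to script against these numbers, here is a minimal sketch in Python. The rows are transcribed from the table above (model names kept with their truncations); the field layout and function names are our own illustration, not an official API of this leaderboard.

```python
# Leaderboard rows transcribed from the table above: (method, model, accuracy).
# All entries share the date 2025.02, so it is omitted here.
ROWS = [
    ("CCE@16", "Qwen 2.5 72B-Ins...", 85.9),
    ("CCE@16", "GPT-4o", 85.0),
    ("CCE@16", "Qwen 2.5 32B-Ins...", 80.5),
    ("CCE-random@16", "GPT-4o", 80.1),
    ("CCE@16", "Qwen 2.5 7B-Inst...", 79.4),
    ("CCE@16", "Llama 3.3 70B-In...", 79.2),
    ("Agg@16", "GPT-4o", 77.9),
    ("Maj@16", "GPT-4o", 75.5),
    ("EvalPlan", "GPT-4o", 74.4),
    ("16-Criteria", "GPT-4o", 73.7),
    ("Vanilla", "Qwen 2.5 32B-Ins...", 71.1),
    ("Vanilla", "Llama 3.3 70B-In...", 70.6),
    ("LongPrompt", "GPT-4o", 70.5),
    ("Vanilla", "GPT-4o", 68.5),
    ("Vanilla", "Qwen 2.5 72B-Ins...", 68.5),
    ("Vanilla", "Qwen 2.5 7B-Inst...", 57.4),
]

def best_per_method(rows):
    """Return, for each method, its highest-accuracy (model, accuracy) entry."""
    best = {}
    for method, model, acc in rows:
        if method not in best or acc > best[method][1]:
            best[method] = (model, acc)
    return best

if __name__ == "__main__":
    # Print methods ranked by their best accuracy, highest first.
    ranking = sorted(best_per_method(ROWS).items(), key=lambda kv: -kv[1][1])
    for method, (model, acc) in ranking:
        print(f"{method:15s} {model:25s} {acc}")
```

Running this ranks CCE@16 first, matching the chart's top score of 85.9.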