Pair-wise comparison on JudgeBench
Accuracy chart (best: 75.7, CCE@16; Feb 18, 2025)
Evaluation Results

Method         Model                  Date      Accuracy (%)
CCE@16         Qwen 2.5 72B-Ins...    2025.02   75.7
CCE@16         Qwen 2.5 32B-Ins...    2025.02   70.6
CCE@16         GPT-4o                 2025.02   70.4
CCE@16         Llama 3.3 70B-In...    2025.02   69.7
CCE-random@16  GPT-4o                 2025.02   68.9
Vanilla        Qwen 2.5 32B-Ins...    2025.02   68.9
Maj@16         GPT-4o                 2025.02   68.6
Vanilla        Qwen 2.5 72B-Ins...    2025.02   68.3
Agg@16         GPT-4o                 2025.02   67.2
Vanilla        Llama 3.3 70B-In...    2025.02   67.1
16-Criteria    GPT-4o                 2025.02   66.6
Vanilla        GPT-4o                 2025.02   66.3
CCE@16         Qwen 2.5 7B-Inst...    2025.02   64.0
LongPrompt     GPT-4o                 2025.02   63.5
EvalPlan       GPT-4o                 2025.02   62.9
Vanilla        Qwen 2.5 7B-Inst...    2025.02   58.3
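As a worked example of reading the table, the sketch below computes how much CCE@16 gains over the Vanilla baseline for each judge model that has both entries. It uses only the accuracy values listed in the table; the dictionary names and the helper `gains_over_vanilla` are hypothetical and introduced only for illustration.

```python
# Hypothetical lookup tables: accuracy (%) per judge model, copied from the
# JudgeBench pair-wise comparison table. Model names stay truncated as listed.
VANILLA = {
    "Qwen 2.5 72B-Ins...": 68.3,
    "Qwen 2.5 32B-Ins...": 68.9,
    "GPT-4o": 66.3,
    "Llama 3.3 70B-In...": 67.1,
    "Qwen 2.5 7B-Inst...": 58.3,
}

CCE_16 = {
    "Qwen 2.5 72B-Ins...": 75.7,
    "Qwen 2.5 32B-Ins...": 70.6,
    "GPT-4o": 70.4,
    "Llama 3.3 70B-In...": 69.7,
    "Qwen 2.5 7B-Inst...": 64.0,
}


def gains_over_vanilla(method: dict[str, float],
                       baseline: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: method accuracy minus baseline accuracy,
    in percentage points, for every model present in both tables.
    Rounded to one decimal to hide floating-point noise."""
    return {
        model: round(method[model] - baseline[model], 1)
        for model in method
        if model in baseline
    }


if __name__ == "__main__":
    gains = gains_over_vanilla(CCE_16, VANILLA)
    # Print models from largest to smallest gain.
    for model, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{model:<22} {gain:+.1f} points")
```

On these numbers CCE@16 beats Vanilla for all five judge models, from 1.7 points (Qwen 2.5 32B-Ins...) up to 7.4 points (Qwen 2.5 72B-Ins...).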