Pair-wise comparison on EvalBias
[Figure: Accuracy on EvalBias over time; y-axis (Accuracy) spans roughly 56–80. Top method: CCE@16 at 85.9 accuracy. Latest data point: Feb 18, 2025.]
Evaluation Results

Method           Model                    Date      Accuracy
CCE@16           Qwen 2.5 72B-Ins...      2025.02   85.9
CCE@16           GPT-4o                   2025.02   85.0
CCE@16           Qwen 2.5 32B-Ins...      2025.02   80.5
CCE-random@16    GPT-4o                   2025.02   80.1
CCE@16           Qwen 2.5 7B-Inst...      2025.02   79.4
CCE@16           Llama 3.3 70B-In...      2025.02   79.2
Agg@16           GPT-4o                   2025.02   77.9
Maj@16           GPT-4o                   2025.02   75.5
EvalPlan         GPT-4o                   2025.02   74.4
16-Criteria      GPT-4o                   2025.02   73.7
Vanilla          Qwen 2.5 32B-Ins...      2025.02   71.1
Vanilla          Llama 3.3 70B-In...      2025.02   70.6
LongPrompt       GPT-4o                   2025.02   70.5
Vanilla          GPT-4o                   2025.02   68.5
Vanilla          Qwen 2.5 72B-Ins...      2025.02   68.5
Vanilla          Qwen 2.5 7B-Inst...      2025.02   57.4
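For readers who want to script against these numbers, here is a minimal sketch in Python. The rows are transcribed from the table above (model names kept with their truncations); the field layout and function names are our own illustration, not an official API of this leaderboard.

```python
# Leaderboard rows transcribed from the table above: (method, model, accuracy).
# All entries share the date 2025.02, so it is omitted here.
ROWS = [
    ("CCE@16", "Qwen 2.5 72B-Ins...", 85.9),
    ("CCE@16", "GPT-4o", 85.0),
    ("CCE@16", "Qwen 2.5 32B-Ins...", 80.5),
    ("CCE-random@16", "GPT-4o", 80.1),
    ("CCE@16", "Qwen 2.5 7B-Inst...", 79.4),
    ("CCE@16", "Llama 3.3 70B-In...", 79.2),
    ("Agg@16", "GPT-4o", 77.9),
    ("Maj@16", "GPT-4o", 75.5),
    ("EvalPlan", "GPT-4o", 74.4),
    ("16-Criteria", "GPT-4o", 73.7),
    ("Vanilla", "Qwen 2.5 32B-Ins...", 71.1),
    ("Vanilla", "Llama 3.3 70B-In...", 70.6),
    ("LongPrompt", "GPT-4o", 70.5),
    ("Vanilla", "GPT-4o", 68.5),
    ("Vanilla", "Qwen 2.5 72B-Ins...", 68.5),
    ("Vanilla", "Qwen 2.5 7B-Inst...", 57.4),
]

def best_per_method(rows):
    """Return, for each method, its highest-accuracy (model, accuracy) entry."""
    best = {}
    for method, model, acc in rows:
        if method not in best or acc > best[method][1]:
            best[method] = (model, acc)
    return best

if __name__ == "__main__":
    # Print methods ranked by their best accuracy, highest first.
    ranking = sorted(best_per_method(ROWS).items(), key=lambda kv: -kv[1][1])
    for method, (model, acc) in ranking:
        print(f"{method:15s} {model:25s} {acc}")
```

Running this ranks CCE@16 first, matching the chart's top score of 85.9.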