Benchmarks
Pair-wise comparison on RewardBench
[Chart: Accuracy on RewardBench (pair-wise comparison) over time. Best result: 93.7 accuracy with CCE@16, as of Feb 18, 2025.]
Evaluation Results

Method          Model                   Date      Accuracy
CCE@16          Qwen 2.5 72B-Ins...     2025.02   93.7
CCE@16          GPT-4o                  2025.02   91.8
CCE@16          Llama 3.3 70B-In...     2025.02   91.7
CCE-random@16   GPT-4o                  2025.02   91.2
CCE@16          Qwen 2.5 32B-Ins...     2025.02   90.8
EvalPlan        GPT-4o                  2025.02   88.7
Agg@16          GPT-4o                  2025.02   88.1
Maj@16          GPT-4o                  2025.02   87.9
Vanilla         Qwen 2.5 32B-Ins...     2025.02   87.4
16-Criteria     GPT-4o                  2025.02   87.3
LongPrompt      GPT-4o                  2025.02   86.9
Vanilla         Llama 3.3 70B-In...     2025.02   86.4
Vanilla         GPT-4o                  2025.02   85.2
Vanilla         Qwen 2.5 72B-Ins...     2025.02   85.2
CCE@16          Qwen 2.5 7B-Inst...     2025.02   80.4
Vanilla         Qwen 2.5 7B-Inst...     2025.02   78.2
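Several of the methods above sample the judge multiple times and aggregate the results; for example, Maj@16 takes a majority vote over 16 pairwise judgments. A minimal sketch of that aggregation step, assuming each sampled judgment is simply "A" or "B" (the tie-breaking rule and the judging call itself are assumptions, not taken from the leaderboard):

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate repeated pairwise judgments ("A" or "B") by majority.

    Ties are broken in favor of "A" here; the actual tie-breaking
    rule used for Maj@16 is not specified in the table.
    """
    counts = Counter(judgments)
    return "A" if counts["A"] >= counts["B"] else "B"

# 16 sampled judgments for one (prompt, response A, response B) triple
samples = ["A"] * 10 + ["B"] * 6
print(majority_vote(samples))  # -> A
```

Sampling the judge repeatedly and voting reduces the variance of a single stochastic judgment, which is why the @16 variants tend to outscore the single-shot Vanilla rows for the same model.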