Pairwise Comparison on HelpSteer2
Chart: Accuracy by date (Feb 18, 2025); current best: 72.3 (Vanilla)
Evaluation Results
| Method | Model | Date | Accuracy (%) |
|---|---|---|---|
| Vanilla | Qwen 2.5 32B-Ins... | 2025.02 | 72.3 |
| CCE@16 | Qwen 2.5 32B-Ins... | 2025.02 | 72.1 |
| CCE@16 | Llama 3.3 70B-In... | 2025.02 | 71.3 |
| CCE@16 | GPT-4o | 2025.02 | 70.6 |
| Vanilla | Llama 3.3 70B-In... | 2025.02 | 70.4 |
| CCE-random@16 | GPT-4o | 2025.02 | 69.5 |
| Vanilla | Qwen 2.5 72B-Ins... | 2025.02 | 69.5 |
| 16-Criteria | GPT-4o | 2025.02 | 69.1 |
| Maj@16 | GPT-4o | 2025.02 | 68.9 |
| Agg@16 | GPT-4o | 2025.02 | 68.7 |
| CCE@16 | Qwen 2.5 72B-Ins... | 2025.02 | 68.5 |
| LongPrompt | GPT-4o | 2025.02 | 67.3 |
| Vanilla | GPT-4o | 2025.02 | 66.1 |
| EvalPlan | GPT-4o | 2025.02 | 65.5 |
| CCE@16 | Qwen 2.5 7B-Inst... | 2025.02 | 64.2 |
| Vanilla | Qwen 2.5 7B-Inst... | 2025.02 | 60.7 |