Pairwise LLM Evaluation on Restaurants
Chart: Win Rate (Contrast). Q-STRUM Debate, 83 (Feb 18, 2025). Other metrics shown: Win Rate (Relevance), Win Rate (Diversity), Win Rate (Usefulness).
Evaluation Results
Method          Variant                    Date     Win Rate (Contrast)  Win Rate (Relevance)  Win Rate (Diversity)  Win Rate (Usefulness)
Q-STRUM Debate  Comparison Baseline=Q-...  2025.02  83                   52                    77                    78
Q-STRUM Debate  Comparison Baseline=Q-...  2025.02  82                   53                    73                    75