Pairwise LLM Evaluation on Restaurants
[Chart: Win Rate (Contrast) over time, with selectable metrics Win Rate (Contrast), Win Rate (Relevance), Win Rate (Diversity), and Win Rate (Usefulness). Top result: Q-STRUM Debate, 83 (Feb 18, 2025).]
Updated 1mo ago
Evaluation Results
Method | Configuration | Date | Win Rate (Contrast) | Win Rate (Relevance) | Win Rate (Diversity) | Win Rate (Usefulness)
--- | --- | --- | --- | --- | --- | ---
Q-STRUM Debate | Comparison Baseline=Q-... | 2025.02 | 83 | 52 | 77 | 78
Q-STRUM Debate | Comparison Baseline=Q-... | 2025.02 | 82 | 53 | 73 | 75