Pairwise LLM Evaluation on Hotels
Chart: Win Rate (Contrast) by date, with metric options Win Rate (Contrast), Win Rate (Relevance), Win Rate (Diversity) and Win Rate (Usefulness); top result Q-STRUM Debate (82) on Feb 18, 2025.
Evaluation Results
Method          Configuration              Date     Win Rate (Contrast)  Win Rate (Relevance)  Win Rate (Diversity)  Win Rate (Usefulness)
Q-STRUM Debate  Comparison Baseline=Q-...  2025.02  82                   55                    83                    80
Q-STRUM Debate  Comparison Baseline=Q-...  2025.02  78                   55                    76                    75
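The page does not say how each win rate is computed. A common reading of a pairwise win rate is the percentage of head-to-head comparisons in which a judge preferred the method's output over the baseline's. The Python sketch below assumes exactly that, with ties counted as half a win. The verdict labels ("win", "loss", "tie") and the functions win_rate and win_rates_by_dimension are hypothetical, and this is not the benchmark's own scoring code.

```python
from collections import Counter

# Hypothetical verdict labels, one per pairwise comparison: the judge preferred
# the evaluated method's output ("win"), the baseline's ("loss"), or neither ("tie").
VALID_VERDICTS = {"win", "loss", "tie"}


def win_rate(verdicts):
    """Return the win rate (in percent) over a list of pairwise verdicts."""
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one comparison is required")
    # Assumption: a tie is worth half a win; the source page does not say.
    return 100.0 * (counts["win"] + 0.5 * counts["tie"]) / total


def win_rates_by_dimension(judgments):
    """Map each judged dimension (e.g. "Contrast") to its win rate."""
    return {dimension: win_rate(verdicts) for dimension, verdicts in judgments.items()}


if __name__ == "__main__":
    # Made-up verdicts for illustration only; not data from this benchmark.
    judgments = {
        "Contrast": ["win"] * 8 + ["loss"] * 2,
        "Relevance": ["win", "loss", "tie", "tie"],
    }
    print(win_rates_by_dimension(judgments))  # {'Contrast': 80.0, 'Relevance': 50.0}
```

Scoring each dimension separately mirrors the table's four win-rate columns. Whether the benchmark aggregates per query, per judge or per debate round is not stated on the page.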