Conversational Recommendation on MUSE (n=200)
[Chart: Recommendation Quality (Rec.Q). Highlighted point: HARPO, 4.16 (Apr 11, 2026).]
Updated 5d ago
Evaluation Results
| Method | Date | Recommendation Quality (Rec.Q) | Explainability Quality (Exp.Q) | Overall Quality | Kappa (Inter-rater Agreement) |
|---|---|---|---|---|---|
| HARPO | 2026.04 | 4.16 | 4.00 | 4.09 | 0.79 |
| Qwen2-VL-7B | 2026.04 | 3.68 | 3.44 | 3.58 | 0.74 |
| GPT-4V | 2026.04 | 3.52 | 3.38 | 3.46 | 0.73 |