Query-Item Relevance on RecRM-Bench
[Figure: leaderboard chart of Accuracy by date. Labeled point: Ours, 89.36 (May 12, 2026). Metric options: Accuracy, F1 Score.]
Updated 21d ago
Evaluation Results

| Method | Thinking | Date | Accuracy | F1 Score |
|---|---|---|---|---|
| Ours | - | 2026.05 | 89.36 | 89.12 |
| GPT-4.1 | - | 2026.05 | 79.26 | 79.48 |
| Qwen3-Max | false | 2026.05 | 76.64 | 77.02 |
| LongCat-Flash-Thinking | true | 2026.05 | 75.97 | 76.17 |
| Qwen3-Max | true | 2026.05 | 75.89 | 76.16 |
| Deepseek-V3.2 | true | 2026.05 | 75.22 | 75.57 |
| Deepseek-V3.2 | false | 2026.05 | 74.6 | 74.76 |
| LongCat-Flash-Chat | false | 2026.05 | 73.18 | 72.82 |