Behavior Prediction on RecRM-Bench
[Chart: Accuracy on RecRM-Bench. Top entry: Ours, 77.78, May 12, 2026. Metrics shown: Accuracy, AUC.]
Updated 21d ago
Evaluation Results
Method                  Thinking  Date     Accuracy  AUC
Ours                    -         2026.05  77.78     81.46
Deepseek-V3.2           false     2026.05  47.8      52.1
LongCat-Flash-Chat      false     2026.05  46.49     52.22
Qwen3-Max               true      2026.05  44.47     48.5
Qwen3-Max               false     2026.05  42.79     52.08
GPT-4.1                 -         2026.05  39.28     49.31
Deepseek-V3.2           true      2026.05  35.17     50.76
LongCat-Flash-Thinking  true      2026.05  31.02     45.41