Personalized Text Generation on Amazon Review
[Chart: User-level Accuracy (GT). Highlighted point: PARL-0, 100, May 29, 2026.]
Updated 2d ago
Evaluation Results
| Method | Configuration | Date | User-level Accuracy (GT) | User-level Accuracy (Non) | User-level Accuracy (RAG) | User-level Accuracy (Non-Think) | User-level Accuracy (RAG-Think) | User-level Accuracy (SFT) | User-level Accuracy (GRPO) | User-level Accuracy (SFT+GRPO) | Max-Diff | User Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PARL-0 | Optimization=Scoring f... | 2026.05 | 100 | 100 | 99.8 | 100 | 100 | 100 | 100 | 100 | 0 | 100 |
| LM-8B | Model=Qwen3-8B, Varian... | 2026.05 | 93.1 | 54.8 | 62.4 | 60.1 | 74 | 82.7 | 93 | 91.7 | 0.001 | 49.1 |
| PARL-A | Optimization=GT-scaled... | 2026.05 | 93.1 | 27.2 | 32.5 | 22.5 | 33 | 87.7 | 83.2 | 90.5 | 0.026 | 99.3 |
| PARL-B | Optimization=Margin-on... | 2026.05 | 93 | 56.7 | 61.6 | 57.9 | 66.9 | 88 | 91.8 | 92.7 | 0.003 | 99.5 |
| LM-235B | Model=Qwen3-235B-A22B-... | 2026.05 | 79.4 | 25.6 | 34.4 | 28 | 42 | 64.2 | 78.7 | 75.9 | 0.007 | 84.6 |