Personalized Text Generation on News Headline Generation
[Chart: User-level Accuracy (GT). Highlighted point: PARL-0, 99.8, May 29, 2026.]
Evaluation Results
| Method | Details | Date | User-level Accuracy (GT) | User-level Accuracy (Non) | User-level Accuracy (RAG) | User-level Accuracy (Non-Think) | User-level Accuracy (RAG-Think) | User-level Accuracy (SFT) | User-level Accuracy (GRPO) | User-level Accuracy (SFT+GRPO) | Max Diff | User Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PARL-0 | Optimization=Scoring f... | 2026.05 | 99.8 | 99.8 | 99.8 | 99.6 | 99.8 | 99.9 | 99.8 | 99.9 | 0.001 | 100 |
| PARL-A | Optimization=GT-scaled... | 2026.05 | 83.2 | 33.1 | 46.8 | 25.3 | 41.5 | 81.4 | 49.6 | 77.5 | 0.018 | 99 |
| LM-8B | Model=Qwen3-8B | 2026.05 | 80.4 | 65.6 | 74 | 68.1 | 70.5 | 81.6 | 78.3 | 77.3 | 0.012 | 66.9 |
| PARL-B | Optimization=Margin-on... | 2026.05 | 80.4 | 41.4 | 56.9 | 40.8 | 56.1 | 75 | 59.1 | 74.2 | 0.054 | 100 |
| LM-235B | Model=Qwen3-235B-A22B-... | 2026.05 | 76.5 | 63.6 | 71.4 | 63.6 | 71.8 | 78.7 | 74.5 | 77.4 | 0.022 | 87.1 |