Long Text Generation on Hotel Experience
[Chart: ROUGE-1 by method; best score 0.277 (PAT), Apr 27, 2026. Selectable metrics: ROUGE-1, ROUGE-L, METEOR, LLM-as-a-judge Score.]
Evaluation Results
| Method  | LLM    | Date    | ROUGE-1 | ROUGE-L | METEOR | LLM-as-a-judge Score |
|---------|--------|---------|---------|---------|--------|----------------------|
| PAT     | LlaMA3 | 2026.04 | 0.277   | 0.167   | 0.204  | 3.231                |
| PGraph  | LlaMA3 | 2026.04 | 0.26    | 0.155   | 0.182  | 1.838                |
| PGraph  | Qwen3  | 2026.04 | 0.254   | 0.148   | 0.193  | 3.174                |
| PAT     | Qwen3  | 2026.04 | 0.24    | 0.142   | 0.194  | 3.438                |
| GraSPeR | Qwen3  | 2026.04 | 0.234   | 0.156   | 0.163  | 2.75                 |
| LaMP    | LlaMA3 | 2026.04 | 0.224   | 0.141   | 0.148  | 2.651                |
| GraSPeR | LlaMA3 | 2026.04 | 0.211   | 0.148   | 0.152  | 2.52                 |
| LaMP    | Qwen3  | 2026.04 | 0.178   | 0.117   | 0.114  | 2.733                |
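
The results listed above are sorted by ROUGE-1, but the ordering changes under other metrics. The following is a minimal sketch for re-ranking the same rows by any column. The `Result` type, the field names, and the helpers `rank_by` and `best_per_metric` are hypothetical names introduced here for illustration; the numbers are copied from the leaderboard as shown.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One leaderboard row (field names are illustrative, not from the site)."""
    method: str
    llm: str
    rouge_1: float
    rouge_l: float
    meteor: float
    judge: float  # LLM-as-a-judge Score


# Values transcribed from the Evaluation Results listing.
RESULTS = [
    Result("PAT", "LlaMA3", 0.277, 0.167, 0.204, 3.231),
    Result("PGraph", "LlaMA3", 0.26, 0.155, 0.182, 1.838),
    Result("PGraph", "Qwen3", 0.254, 0.148, 0.193, 3.174),
    Result("PAT", "Qwen3", 0.24, 0.142, 0.194, 3.438),
    Result("GraSPeR", "Qwen3", 0.234, 0.156, 0.163, 2.75),
    Result("LaMP", "LlaMA3", 0.224, 0.141, 0.148, 2.651),
    Result("GraSPeR", "LlaMA3", 0.211, 0.148, 0.152, 2.52),
    Result("LaMP", "Qwen3", 0.178, 0.117, 0.114, 2.733),
]

METRICS = ("rouge_1", "rouge_l", "meteor", "judge")


def rank_by(metric: str, results: list[Result] = RESULTS) -> list[Result]:
    """Return rows sorted from best to worst on the given metric (higher is better)."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    return sorted(results, key=lambda r: getattr(r, metric), reverse=True)


def best_per_metric(results: list[Result] = RESULTS) -> dict[str, Result]:
    """Map each metric name to its top-scoring row."""
    return {m: rank_by(m, results)[0] for m in METRICS}


if __name__ == "__main__":
    for metric, row in best_per_metric().items():
        print(f"{metric}: {row.method} ({row.llm}) = {getattr(row, metric)}")
```

With these rows, PAT with LlaMA3 leads on ROUGE-1, ROUGE-L, and METEOR, while PAT with Qwen3 has the highest LLM-as-a-judge Score, so the choice of metric affects which configuration ranks first.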