Long Text Generation on Stylized Feedback Generation Benchmark
[Chart: R-1 by method; highlighted point PAT, R-1 = 24.2, Apr 27, 2026. Metric tabs: R-1, R-L, MET Score, LLM Score.]
Updated 1mo ago
Evaluation Results
| Method  | LLM    | Date    | R-1  | R-L  | MET Score | LLM Score |
|---------|--------|---------|------|------|-----------|-----------|
| PAT     | Qwen3  | 2026.04 | 24.2 | 17.9 | 21        | 3.357     |
| PAT     | LlaMA3 | 2026.04 | 23.1 | 17.5 | 17.9      | 3.171     |
| PGraph  | Qwen3  | 2026.04 | 21.3 | 15.3 | 19.1      | 3.685     |
| PGraph  | LlaMA3 | 2026.04 | 21.1 | 15.2 | 17.5      | 3.29      |
| GraSPeR | Qwen3  | 2026.04 | 19.6 | 14.9 | 16        | 3.04      |
| GraSPeR | LlaMA3 | 2026.04 | 19.1 | 15.4 | 13.4      | 2.93      |
| LaMP    | LlaMA3 | 2026.04 | 18.1 | 12.2 | 16.4      | 2.873     |
| LaMP    | Qwen3  | 2026.04 | 17.9 | 12.4 | 16.1      | 3.107     |
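
The results listed above are sorted by R-1, but the leader is not the same on every metric. The minimal Python sketch below finds the top entry per metric. It assumes that higher is better for all four columns, which the page does not state. The names `ROWS`, `METRICS`, and `best_by_metric` are hypothetical helpers introduced only for this illustration; the numbers are copied from the results on this page.

```python
# Rows copied from the results on this page:
# (method, llm, R-1, R-L, MET Score, LLM Score)
ROWS = [
    ("PAT", "Qwen3", 24.2, 17.9, 21.0, 3.357),
    ("PAT", "LlaMA3", 23.1, 17.5, 17.9, 3.171),
    ("PGraph", "Qwen3", 21.3, 15.3, 19.1, 3.685),
    ("PGraph", "LlaMA3", 21.1, 15.2, 17.5, 3.29),
    ("GraSPeR", "Qwen3", 19.6, 14.9, 16.0, 3.04),
    ("GraSPeR", "LlaMA3", 19.1, 15.4, 13.4, 2.93),
    ("LaMP", "LlaMA3", 18.1, 12.2, 16.4, 2.873),
    ("LaMP", "Qwen3", 17.9, 12.4, 16.1, 3.107),
]

# Column index of each metric inside a row tuple.
METRICS = {"R-1": 2, "R-L": 3, "MET Score": 4, "LLM Score": 5}


def best_by_metric(rows=ROWS):
    """Return {metric: (method, llm, value)} for the highest value per metric.

    Assumption: higher is better for every metric (not stated on the page).
    """
    best = {}
    for name, col in METRICS.items():
        top = max(rows, key=lambda r: r[col])
        best[name] = (top[0], top[1], top[col])
    return best


if __name__ == "__main__":
    for metric, (method, llm, value) in best_by_metric().items():
        print(f"{metric}: {method} (LLM={llm}) = {value}")
```

Under that assumption, PAT with Qwen3 leads on R-1, R-L, and MET Score, while PGraph with Qwen3 has the highest LLM Score (3.685).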