Opinion Summarization on PeerSum (test)
Chart: state of the art by Coverage over time (current best: MOSAIC, Coverage 99). Metrics: Coverage, G-Eval, AlignScore-R, AlignScore-M.
Updated 27d ago
Evaluation Results
Method                        Base model  Date     Coverage  G-Eval  AlignScore-R  AlignScore-M
MOSAIC                        GPT-4o      2026.03  99        84      81            16
MOSAIC                        Llama 70B   2026.03  99        82      81            19
Aspect-aware decomposition    Llama 70B   2026.03  97        76      76            9
Sentiment CoT                 GPT-4o      2026.03  96        75      72            8
Aspect-aware decomposition    GPT-4o      2026.03  95        76      68            6
FT-Llama 8B                   Llama 8B    2026.03  87        60      33            6
Chunk-wise decomposition      Llama 70B   2026.03  84        72      65            6
Naive aspect-aware prompting  Llama 70B   2026.03  72        62      70            7
Automatic decomposition       Llama 70B   2026.03  59        31      51            3