Understanding LLM Reasoning for Abstractive Summarization
About
While the reasoning capabilities of Large Language Models (LLMs) excel in analytical tasks such as mathematics and code generation, their utility for abstractive summarization remains widely assumed but largely unverified. To bridge this gap, we first tailor general reasoning strategies to the summarization domain. We then conduct a systematic, large-scale comparative study of 8 reasoning strategies and 3 Large Reasoning Models (LRMs) across 8 diverse datasets, assessing both summary quality and faithfulness. Our findings show that reasoning is not a universal solution; its effectiveness is highly dependent on the specific strategy and context. In particular, we observe a trade-off between summary quality and factual faithfulness: explicit reasoning strategies tend to improve fluency at the expense of factual grounding, while implicit reasoning in LRMs exhibits the inverse pattern. Furthermore, increasing an LRM's internal reasoning budget does not improve, and can even hurt, factual consistency, suggesting that effective summarization demands faithful compression rather than creative over-thinking.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-document summarization | Multi-News (test) | -- | -- | 45 |
| Summarization | SamSum | BERTScore F1 | 90.57 | 30 |
| Summarization | MultiNews (test) | Comprehensiveness | 4.98 | 24 |
| Summarization | BookSum (test) | Comp Score | 5 | 24 |
| Summarization | SciGen (test) | Completeness Score | 4.99 | 24 |
| Summarization | Aggregate (test) | Comprehensiveness | 4.97 | 24 |
| Summarization | arXiv (test) | Completeness Score | 5 | 24 |
| Abstractive Summarization | Multi-News 56k samples (test) | ROUGE Score | 20.72 | 12 |
| Abstractive Summarization | CNN/DM sampled (test) | ROUGE Score | 22.86 | 12 |
| Abstractive Summarization | Reddit sampled (test) | ROUGE Score | 14.82 | 12 |
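Several of the results above are ROUGE scores, which measure n-gram overlap between a generated summary and a reference. As a minimal sketch of what such a metric computes, here is a simplified unigram-level ROUGE-1 F1 in pure Python; the reported benchmark numbers come from standard evaluation toolkits (with stemming, tokenization, and multiple ROUGE variants), so this is illustrative only, not the paper's evaluation code.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between a reference summary
    and a candidate summary. Whitespace tokenization, lowercased; no
    stemming. Illustrative only -- not the official ROUGE implementation."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears
    # in the shorter of the two bags of words.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# A candidate covering half the reference's tokens scores between 0 and 1.
score = rouge1_f1("the cat sat on the mat", "the cat sat")
```

Because ROUGE rewards lexical overlap with the reference, it captures content coverage but not factual faithfulness, which is why the study pairs quality metrics like these with separate faithfulness assessments.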