Text summarization via global structure awareness

About

Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding, making summarization essential. Existing research mainly focuses on model improvements and sentence-level pruning, but often overlooks global structure, leading to disrupted coherence and weakened downstream performance. Some studies employ large language models (LLMs), which achieve higher accuracy but incur substantial resource and time costs. To address these issues, we introduce GloSA-sum, the first summarization approach that achieves global structure awareness via topological data analysis (TDA). GloSA-sum summarizes text efficiently while preserving semantic cores and logical dependencies. Specifically, we construct a semantic-weighted graph from sentence embeddings, where persistent homology identifies core semantics and logical structures, preserved in a "protection pool" as the backbone for summarization. We design a topology-guided iterative strategy, where lightweight proxy metrics approximate sentence importance to avoid repeated high-cost computations, thus preserving structural integrity while improving efficiency. To further enhance long-text processing, we propose a hierarchical strategy that integrates segment-level and global summarization. Experiments on multiple datasets demonstrate that GloSA-sum reduces redundancy while preserving semantic and logical integrity, striking a balance between accuracy and efficiency, and further benefits LLM downstream tasks by shortening contexts while retaining essential reasoning chains.

Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Yibei Liu, Chenghao Li, Qigan Sun, Shuai Yuan, Fachrina Dewi Puspitasari, Dongshen Han, Guoqing Wang, Sung-Ho Bae, Yang Yang • 2026
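
The abstract's graph-construction step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine distance as the edge weight, computes only 0-dimensional persistent homology (connected components, via union-find over edges sorted by weight), and uses a hypothetical selection rule that protects the sentences joined by the longest-lived merges. The toy embeddings and the names `h0_persistence` and `protection_pool` are illustrative assumptions; higher-dimensional features such as loops are not covered.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed edge weight)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def h0_persistence(embeddings):
    """Return the merge events (death, i, j) of 0-dimensional persistent homology.

    Every sentence starts as its own component at filtration value 0.
    Edges are added in order of increasing distance; each edge that joins
    two different components ends ("kills") one of them at that distance.
    """
    n = len(embeddings)
    edges = sorted(
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            merges.append((weight, i, j))
    return merges


def protection_pool(merges, top_k=1):
    """Hypothetical rule: protect endpoints of the top_k longest-lived merges."""
    longest = sorted(merges, reverse=True)[:top_k]
    return sorted({idx for _, i, j in longest for idx in (i, j)})


# Toy 2-D "sentence embeddings" (assumed): two topics with two sentences each.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
merges = h0_persistence(toy_embeddings)
pool = protection_pool(merges, top_k=1)
print(merges)
print(pool)
```

In this toy input the two near-duplicate pairs merge almost immediately, while the merge that bridges the two topics happens much later, so its endpoints (sentences 1 and 3) are the ones placed in the pool. A production pipeline would more likely use a dedicated TDA library and real sentence embeddings.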

Related benchmarks

Task	Dataset	Metric	Value
Question Answering	SQuAD v2.0 (dev)	F1	91.2	163
Summarization	arXiv	ROUGE-2	20	76
Summarization	Pubmed	ROUGE-1	49.5	70
Summarization	CNN Daily Mail	ROUGE-1	44.05	67
Question Answering	SQuAD 2.0 (test)	EM	88.5	34
Document Summarization	GovReport	ROUGE-1	55.5	15
Summarization	GovReport (test)	ROUGE-1	0.555	13
Summarization	arXiv	BERTScore	83	12
Summarization	Pubmed	BERTScore	0.86	10
Summarization	Human Evaluation 1-5 scale	Coherence	4.4	10

Showing 10 of 14 rows
