
Poolingformer: Long Document Modeling with Pooling Attention

About

In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding-window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields, with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long-sequence QA tasks: the monolingual Natural Questions (NQ) and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long-sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.

Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen • 2021
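The abstract describes the mechanism only at a high level; below is a minimal PyTorch sketch of how such a two-level attention layer could be wired up. This is not the authors' implementation: the class name, the w1/pool_size parameters, the single-head formulation, and the choice to let the second level attend to every pooled position (the paper instead restricts it to a larger window) are all simplifications for illustration.

```python
import torch
import torch.nn as nn


class TwoLevelAttention(nn.Module):
    """Sketch of two-level attention: sliding-window attention over
    neighbors, then pooling attention over a compressed key/value sequence."""

    def __init__(self, d_model: int, w1: int = 128, pool_size: int = 4):
        super().__init__()
        # Separate query/key/value projections for each attention level.
        self.q1 = nn.Linear(d_model, d_model)
        self.k1 = nn.Linear(d_model, d_model)
        self.v1 = nn.Linear(d_model, d_model)
        self.q2 = nn.Linear(d_model, d_model)
        self.k2 = nn.Linear(d_model, d_model)
        self.v2 = nn.Linear(d_model, d_model)
        self.w1 = w1                            # first-level half-window size
        self.pool = nn.AvgPool1d(pool_size, stride=pool_size)
        self.scale = d_model ** -0.5

    def _window_attn(self, q, k, v, window):
        # Naive per-token windowed attention, kept simple for readability;
        # a real implementation would use a banded/chunked kernel for
        # O(n * window) cost instead of a Python loop.
        n = q.size(1)
        out = torch.empty_like(q)
        for i in range(n):
            lo, hi = max(0, i - window), min(n, i + window + 1)
            scores = (q[:, i:i + 1] @ k[:, lo:hi].transpose(1, 2)) * self.scale
            out[:, i] = (scores.softmax(dim=-1) @ v[:, lo:hi]).squeeze(1)
        return out

    def forward(self, x):
        # Level 1: each token attends to its 2 * w1 neighbors.
        h = self._window_attn(self.q1(x), self.k1(x), self.v1(x), self.w1)
        # Level 2: pool keys/values to shrink the sequence, then attend.
        # The paper uses a larger second-level window; for brevity this
        # sketch lets every token attend to all pooled positions.
        k = self.pool(self.k2(x).transpose(1, 2)).transpose(1, 2)
        v = self.pool(self.v2(x).transpose(1, 2)).transpose(1, 2)
        scores = (self.q2(h) @ k.transpose(1, 2)) * self.scale
        return scores.softmax(dim=-1) @ v


x = torch.randn(2, 256, 64)                 # (batch, sequence, d_model)
y = TwoLevelAttention(d_model=64, w1=16)(x)
print(y.shape)                              # torch.Size([2, 256, 64])
```

The pooling step is what keeps the second level cheap: attending to n / pool_size pooled positions instead of n raw tokens shrinks both the score matrix and memory by the pooling factor while preserving a wide receptive field.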

Related benchmarks

Task                         Dataset        Metric    Result   Rank
Summarization                PubMed         ROUGE-1   37.82    70
News Recommendation          MIND (test)    AUC       68.54    27
Sentiment Classification     Amazon (test)  Accuracy  66.05    17
Document Classification      MIND (test)    Accuracy  0.8246   12
Text Summarization           CNN/DailyMail  ROUGE-1   38.58    7
Long Document Summarization  arXiv          ROUGE-1   48.47    6
