FBS: Modeling Native Parallel Reading inside a Transformer
About
Large language models (LLMs) excel across many tasks, yet inference remains dominated by strict token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core ingredients of human reading: content-adaptive foresight, chunk-structure-aware compute allocation, and train–test consistency for preview/skimming. We propose the **Fovea-Block-Skip Transformer (FBS)**, which injects a causal, trainable loop into Transformers via a Parafovea-Attention Window (PAW), a Chunk-Head (CH), and a Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality–efficiency trade-off without adding parameters, and ablations show the three modules are complementary.
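The abstract does not define the modules, but the Skip-Gate idea of allocating full block compute only to some tokens can be illustrated schematically. The sketch below is hypothetical (the function, gate parameterization, and threshold `tau` are all assumptions, not the paper's method): a learned per-token gate decides which tokens pass through a Transformer block and which bypass it via the identity, mimicking skimming.

```python
import numpy as np

def skip_gate_layer(h, W_g, block_fn, tau=0.5):
    """Hypothetical token-wise skip-gate sketch (not the paper's definition).

    h:        (seq_len, d) hidden states
    W_g:      (d,) learned gate weights
    block_fn: the Transformer block applied to "fixated" tokens
    tau:      gate threshold; tokens below it are skimmed (identity)
    """
    g = 1.0 / (1.0 + np.exp(-(h @ W_g)))  # per-token gate score in (0, 1)
    keep = g >= tau                        # tokens that receive full compute
    out = h.copy()                         # skimmed tokens pass through unchanged
    if keep.any():
        out[keep] = block_fn(h[keep])      # run the block only on kept tokens
    return out, keep
```

In a trained model the gate would be made differentiable (e.g. with a straight-through or soft relaxation) so the skip decisions can be learned end-to-end; the hard threshold here is only for clarity.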
Tongxi Wang • 2026
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval-X | -- | -- | 20 |
| Chinese Language Understanding | CMMLU (test) | CMMLU Score | 0.574 | 13 |
| Language Modeling | Language Modeling (test) | PPL | 6.2 | 7 |
| Massive Multitask Language Understanding | MMLU | MMLU | 56.6 | 7 |
| Chinese Massive Multitask Language Understanding | CMMLU | CMMLU Score | 57.4 | 2 |
| Chinese Mathematical Reasoning | CMath | CMath Score | 40.5 | 1 |
| Comprehensive Chinese Transformer Evaluation | C-Eval | C-Eval Score | 55.5 | 1 |
| Mathematical Reasoning | GSM8K | GSM8K Score | 39.4 | 1 |
| Python Programming | MBPP | MBPP Score | 46.3 | 1 |
| Reasoning | BBH | BBH Score | 41.5 | 1 |