FBS: Modeling Native Parallel Reading inside a Transformer
About
Large language models (LLMs) excel across many tasks, yet inference remains dominated by strict token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core ingredients of human reading: content-adaptive foresight, chunk-structure-aware compute allocation, and train–test consistency for preview/skimming. We propose the **Fovea-Block-Skip Transformer (FBS)**, which injects a causal, trainable loop into Transformers via a Parafovea-Attention Window (PAW), a Chunk-Head (CH), and a Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality–efficiency trade-off without adding parameters, and ablations show the three modules are complementary.
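The abstract does not define the modules, but the Skip-Gate idea of allocating full block compute only to some tokens can be illustrated schematically. The sketch below is hypothetical (the function, gate parameterization, and threshold `tau` are all assumptions, not the paper's method): a learned per-token gate decides which tokens pass through a Transformer block and which bypass it via the identity, mimicking skimming.

```python
import numpy as np

def skip_gate_layer(h, W_g, block_fn, tau=0.5):
    """Hypothetical token-wise skip-gate sketch (not the paper's definition).

    h:        (seq_len, d) hidden states
    W_g:      (d,) learned gate weights
    block_fn: the Transformer block applied to "fixated" tokens
    tau:      gate threshold; tokens below it are skimmed (identity)
    """
    g = 1.0 / (1.0 + np.exp(-(h @ W_g)))  # per-token gate score in (0, 1)
    keep = g >= tau                        # tokens that receive full compute
    out = h.copy()                         # skimmed tokens pass through unchanged
    if keep.any():
        out[keep] = block_fn(h[keep])      # run the block only on kept tokens
    return out, keep
```

In a trained model the gate would be made differentiable (e.g. with a straight-through or soft relaxation) so the skip decisions can be learned end-to-end; the hard threshold here is only for clarity.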
Tongxi Wang • 2026
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval-X | -- | -- | 20 |
| Chinese Language Understanding | CMMLU (test) | CMMLU Score | 0.574 | 13 |
| Language Modeling | Language Modeling (test) | PPL | 6.2 | 7 |
| Massive Multitask Language Understanding | MMLU | MMLU | 56.6 | 7 |
| Chinese Massive Multitask Language Understanding | CMMLU | CMMLU Score | 57.4 | 2 |
| Chinese Mathematical Reasoning | CMath | CMath Score | 40.5 | 1 |
| Comprehensive Chinese Transformer Evaluation | C-Eval | C-Eval Score | 55.5 | 1 |
| Mathematical Reasoning | GSM8K | GSM8K Score | 39.4 | 1 |
| Python Programming | MBPP | MBPP Score | 46.3 | 1 |
| Reasoning | BBH | BBH Score | 41.5 | 1 |