Screening Is Enough

About

A core limitation of standard softmax attention is that it does not provide an independently interpretable measure of query-key relevance: attention scores are unbounded, while attention weights are defined only relative to competing keys. Consequently, irrelevant keys cannot be explicitly rejected, and some attention mass is assigned even when no key is genuinely relevant. We introduce Multiscreen, a language-model architecture built around a mechanism we call screening, which enables absolute query-key relevance. Instead of redistributing attention across all keys, screening computes bounded query-key similarities and applies an explicit threshold, discarding irrelevant keys and aggregating the remaining keys without global competition. Across experiments, Multiscreen achieves comparable validation loss with roughly 30% fewer parameters than a Transformer baseline and remains stable at substantially larger learning rates. It maintains stable long-context perplexity beyond the training context and shows little degradation in retrieval performance as context length increases. Finally, Multiscreen achieves lower full-context forward-pass latency at long context lengths.
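
The contrast drawn in the abstract can be illustrated with a small sketch. This is not the paper's formulation: the abstract does not specify the similarity function, the threshold rule, or the aggregation, so the cosine similarity, the fixed `threshold` value, and the `max(0, sim - threshold)` weighting below are all assumptions chosen only to show the idea of bounded similarity plus an explicit cutoff, with no normalization across keys.

```python
import math


def cosine(u, v):
    """Cosine similarity, bounded in [-1, 1]; returns 0.0 if either vector is zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def softmax_attention(query, keys, values):
    """Standard scaled dot-product attention for one query.

    The weights always sum to 1, so some mass is assigned to the keys
    even when none of them is relevant to the query.
    """
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def screen(query, keys, values, threshold=0.5):
    """Illustrative screening step (assumed form, not the paper's).

    Each key is judged on its own: its weight depends only on its bounded
    similarity to the query and the threshold, never on the other keys.
    Keys at or below the threshold get weight exactly 0.
    """
    weights = [max(0.0, cosine(query, k) - threshold) for k in keys]
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


if __name__ == "__main__":
    query = [1.0, 0.0]
    irrelevant_keys = [[0.0, 1.0], [-1.0, 0.0]]  # orthogonal and opposite to the query
    values = [[1.0, 2.0], [3.0, 4.0]]
    print("softmax:  ", softmax_attention(query, irrelevant_keys, values))
    print("screening:", screen(query, irrelevant_keys, values))
```

With these two keys, neither of which points in the query's direction, the softmax weights are still strictly positive and sum to 1, while the screening weights are both zero and the output is the zero vector. That is the "explicit rejection" behavior the abstract describes, under the assumptions stated above.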

Ken M. Nakanishi • 2026

Related benchmarks

Task                             Dataset            Result                              Rank
Copying Task                     Copy d=500         Accuracy: 100                       6
Copying Task                     Copy d=2000        Accuracy: 100                       6
Memory retention task            phase-memory       Accuracy: 100                       6
Needle-In-A-Haystack Retrieval   NIAH L=2048        Accuracy: 100                       6
Pitch Estimation                 multi-pitch        Accuracy: 88                        6
Radio Modulation Classification  RadioML L1         Accuracy: 36                        6
Radio Modulation Classification  RadioML L2         Accuracy: 39                        6
Image Classification             FFT-MNIST          Accuracy: 45                        6
Logical operations parsing       ListOps mid L1024  Accuracy: 83.3                      6
Long Range Arena ListOps         LRA-ListOps small  Accuracy (LRA-ListOps small): 71.5  6
Showing 10 of 12 rows
