Screening Is Enough

About

A core limitation of standard softmax attention is that it does not provide an independently interpretable measure of query-key relevance: attention scores are unbounded, while attention weights are defined only relative to competing keys. Consequently, irrelevant keys cannot be explicitly rejected, and some attention mass is assigned even when no key is genuinely relevant. We introduce Multiscreen, a language-model architecture built around a mechanism we call screening, which enables absolute query-key relevance. Instead of redistributing attention across all keys, screening computes bounded query-key similarities and applies an explicit threshold, discarding irrelevant keys and aggregating the remaining keys without global competition. Across experiments, Multiscreen achieves comparable validation loss with roughly 30% fewer parameters than a Transformer baseline and remains stable at substantially larger learning rates. It maintains stable long-context perplexity beyond the training context and shows little degradation in retrieval performance as context length increases. Finally, Multiscreen achieves lower full-context forward-pass latency at long context lengths.
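To make the contrast between relative and absolute relevance concrete, here is a minimal sketch comparing softmax weights with a threshold-based relevance rule. It is not the Multiscreen implementation. The abstract does not specify the similarity function, the threshold, or how surviving keys are weighted, so the following are all assumptions chosen for illustration: cosine similarity as the bounded similarity, a fixed `threshold`, a linear ramp above it, and the hypothetical helper names `softmax_weights`, `screening_weights`, and `aggregate`.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Bounded similarity in [-1, 1]; zero vectors are treated as unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def softmax_weights(q: Vector, keys: List[Vector]) -> List[float]:
    """Standard scaled dot-product softmax: weights are relative and always sum to 1."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def screening_weights(q: Vector, keys: List[Vector], threshold: float = 0.5) -> List[float]:
    """Hypothetical screening rule: each key is judged on its own similarity.

    Keys whose similarity is at or below `threshold` get weight 0 (rejected).
    Above it, the weight ramps linearly from 0 to 1 at perfect similarity
    (the ramp shape is an assumption, not taken from the paper).
    """
    if not -1.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [-1, 1)")
    return [max(0.0, (cosine(q, k) - threshold) / (1.0 - threshold)) for k in keys]


def aggregate(weights: List[float], values: List[Vector]) -> List[float]:
    """Weighted sum of values with no normalization across keys."""
    dim = len(values[0]) if values else 0
    out = [0.0] * dim
    for w, v in zip(weights, values):
        for i in range(dim):
            out[i] += w * v[i]
    return out


if __name__ == "__main__":
    q = [1.0, 0.0]
    irrelevant = [[0.0, 1.0], [0.0, -1.0]]  # both keys orthogonal to the query
    print(softmax_weights(q, irrelevant))    # softmax still spreads all mass over them
    print(screening_weights(q, irrelevant))  # screening rejects both keys
    keys = irrelevant + [[2.0, 0.0]]         # add one genuinely relevant key
    print(screening_weights(q, keys))        # earlier weights are unchanged
```

With only irrelevant keys, softmax returns `[0.5, 0.5]` because its weights must sum to 1, while the screening sketch returns `[0.0, 0.0]`, so `aggregate` yields a zero vector and nothing is retrieved. Adding a relevant key leaves the weights of the other keys untouched, which is one way to read "without global competition": a key's weight depends only on its own similarity to the query, not on the other keys.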

Ken M. Nakanishi • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Copying Task	Copy d=500	Accuracy (%)	100	6
Copying Task	Copy d=2000	Accuracy (%)	100	6
Memory retention task	phase-memory	Accuracy (%)	100	6
Needle-In-A-Haystack Retrieval	NIAH L=2048	Accuracy (%)	100	6
Pitch Estimation	multi-pitch	Accuracy (%)	88	6
Radio Modulation Classification	RadioML L1	Accuracy (%)	36	6
Radio Modulation Classification	RadioML L2	Accuracy (%)	39	6
Image Classification	FFT-MNIST	Accuracy (%)	45	6
Logical operations parsing	ListOps mid L1024	Accuracy (%)	83.3	6
Long Range Arena ListOps	LRA-ListOps small	Accuracy (%)	71.5	6

