Avey-B
About
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
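The decoupled static and dynamic parameterizations mentioned above can be illustrated with a toy sketch. Everything here is an assumption for illustration: the block name, the RMSNorm-style normalization, and the mean-pooled bidirectional context are stand-ins, not the paper's actual equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmsnorm(x, eps=1e-6):
    # Stability-oriented normalization (RMSNorm-style; an assumption,
    # the paper's exact scheme may differ).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def avey_style_block(x, W_static, W_dyn):
    """Hypothetical attention-free bidirectional mixing block.

    x:        (seq_len, d) token embeddings.
    W_static: (d, d) static, input-independent parameterization.
    W_dyn:    (d, d) projection producing input-conditioned (dynamic) gates.
    """
    h = rmsnorm(x)
    static_path = h @ W_static       # static path: fixed learned weights
    gates = np.tanh(h @ W_dyn)       # dynamic path: gates computed from the input
    # Bidirectional contextualization without attention: every token mixes
    # in a global summary of the sequence (a simple stand-in).
    context = h.mean(axis=0, keepdims=True)
    return x + gates * (static_path + context)  # residual connection

seq_len, d = 8, 16
x = rng.normal(size=(seq_len, d))
W_s = rng.normal(size=(d, d)) * 0.1
W_d = rng.normal(size=(d, d)) * 0.1
y = avey_style_block(x, W_s, W_d)
print(y.shape)  # (8, 16)
```

The point of the sketch is the separation of concerns: the static weights are shared across all inputs, while the dynamic gates depend on the token representations, so the two parameterizations can be tuned and regularized independently.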
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Information Retrieval | IR | MLDR: 67.05 | 9 |
| Token Classification | TC Benchmarks | CoNLL: 93.6 | 9 |
| Question Answering | QA benchmarks | ReCoRD Score: 58.22 | 9 |
| Sentence Classification | SC | MNLI Accuracy: 85.66 | 9 |
| Needle-in-a-Haystack | NIAH-1 | Success Rate (1k Context): 79.69 | 5 |
| Needle-in-a-Haystack | NIAH-2 (test) | Success Rate (1k): 78.94 | 5 |