Avey-B
About
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
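The decoupled static and dynamic parameterizations mentioned above can be illustrated with a toy sketch. Everything here is an assumption for illustration: the block name, the RMSNorm-style normalization, and the mean-pooled bidirectional context are stand-ins, not the paper's actual equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmsnorm(x, eps=1e-6):
    # Stability-oriented normalization (RMSNorm-style; an assumption,
    # the paper's exact scheme may differ).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def avey_style_block(x, W_static, W_dyn):
    """Hypothetical attention-free bidirectional mixing block.

    x:        (seq_len, d) token embeddings.
    W_static: (d, d) static, input-independent parameterization.
    W_dyn:    (d, d) projection producing input-conditioned (dynamic) gates.
    """
    h = rmsnorm(x)
    static_path = h @ W_static       # static path: fixed learned weights
    gates = np.tanh(h @ W_dyn)       # dynamic path: gates computed from the input
    # Bidirectional contextualization without attention: every token mixes
    # in a global summary of the sequence (a simple stand-in).
    context = h.mean(axis=0, keepdims=True)
    return x + gates * (static_path + context)  # residual connection

seq_len, d = 8, 16
x = rng.normal(size=(seq_len, d))
W_s = rng.normal(size=(d, d)) * 0.1
W_d = rng.normal(size=(d, d)) * 0.1
y = avey_style_block(x, W_s, W_d)
print(y.shape)  # (8, 16)
```

The point of the sketch is the separation of concerns: the static weights are shared across all inputs, while the dynamic gates depend on the token representations, so the two parameterizations can be tuned and regularized independently.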
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Information Retrieval | IR | MLDR: 67.05 | 9 |
| Token Classification | TC Benchmarks | CoNLL: 93.6 | 9 |
| Question Answering | QA benchmarks | ReCoRD Score: 58.22 | 9 |
| Sentence Classification | SC | MNLI Accuracy: 85.66 | 9 |
| Needle-in-a-Haystack | NIAH-1 | Success Rate (1k Context): 79.69 | 5 |
| Needle-in-a-Haystack | NIAH-2 (test) | Success Rate (1k): 78.94 | 5 |