Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling
About
Large token-indexed lookup tables provide a compute-decoupled scaling path, but their practical gains are often limited by poor parameter efficiency and rapid memory growth. We attribute these limitations to Zipfian under-training of the long tail, heterogeneous demand across layers, and "slot collapse" that produces redundant embeddings. To address this, we propose X-GRAM, a frequency-aware dynamic token-injection framework. X-GRAM employs hybrid hashing and alias mixing to compress the tail while preserving head capacity, and refines retrieved vectors with a normalized SwiGLU ShortConv to extract diverse local n-gram features. These signals are integrated into attention value streams and inter-layer residuals through depth-aware gating, aligning static memory with dynamic context. This design introduces a memory-centric scaling axis that decouples model capacity from FLOPs. Extensive evaluations at the 0.73B and 1.15B scales show that X-GRAM improves average accuracy by up to 4.4 points over the vanilla backbone and 3.2 points over strong retrieval baselines, while using substantially smaller tables in the 50% configuration. Overall, by decoupling capacity from compute through efficient memory management, X-GRAM offers a scalable and practical paradigm for future memory-augmented architectures. Code is available at https://github.com/Longyichen/X-gram.
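To make the "compress the tail while preserving head capacity" idea concrete, here is a minimal, hypothetical sketch of a frequency-aware hybrid n-gram table in plain Python. It assumes the most frequent n-grams (the Zipfian head) get dedicated rows, while every tail n-gram is hashed by several hash functions into a small shared bucket pool and its vector is the average of those rows (a standard multi-hash trick; the paper's "alias mixing" may differ). The names `HybridNgramTable`, `ngram_hash`, and `gated_inject` are illustrative and are not taken from the X-GRAM repository. The per-layer sigmoid gate is likewise only an assumed, simplified stand-in for the paper's depth-aware gating.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[int, ...]


def extract_ngrams(tokens: Sequence[int], n: int) -> List[NGram]:
    """Return all contiguous n-grams of token ids as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_hash(ngram: NGram, seed: int, num_buckets: int) -> int:
    """Deterministic polynomial hash of an n-gram into [0, num_buckets)."""
    h = seed
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % 2_147_483_647
    return h % num_buckets


class HybridNgramTable:
    """Hypothetical hybrid table: dedicated rows for head n-grams,
    shared hashed buckets for the long tail."""

    def __init__(self, corpus_ngrams: List[NGram], head_size: int,
                 tail_buckets: int, dim: int, num_hashes: int = 2, seed: int = 0):
        counts = Counter(corpus_ngrams)
        # Head: the `head_size` most frequent n-grams each own one row.
        self.head = {g: i for i, (g, _) in enumerate(counts.most_common(head_size))}
        self.head_size = head_size
        self.tail_buckets = tail_buckets
        self.num_hashes = num_hashes
        rng = random.Random(seed)
        # Total rows = head rows + shared tail buckets, independent of vocabulary size.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(head_size + tail_buckets)]

    def rows_for(self, ngram: NGram) -> List[int]:
        """Row indices used for an n-gram: one dedicated row, or k hashed buckets."""
        if ngram in self.head:
            return [self.head[ngram]]
        return [self.head_size + ngram_hash(ngram, k + 1, self.tail_buckets)
                for k in range(self.num_hashes)]

    def lookup(self, ngram: NGram) -> List[float]:
        """Average the rows assigned to the n-gram into one embedding vector."""
        rows = self.rows_for(ngram)
        out = [0.0] * len(self.table[0])
        for r in rows:
            for j, v in enumerate(self.table[r]):
                out[j] += v / len(rows)
        return out


def gated_inject(hidden: List[float], memory: List[float], layer_gate_logit: float) -> List[float]:
    """Add a retrieved memory vector to a hidden state, scaled by a
    per-layer sigmoid gate (simplified, hypothetical depth-aware gating)."""
    g = 1.0 / (1.0 + math.exp(-layer_gate_logit))
    return [h + g * m for h, m in zip(hidden, memory)]


# Usage: bigrams from a toy token stream; the 2 most frequent get dedicated rows.
tokens = [5, 7, 5, 7, 5, 9, 2, 5, 7]
bigrams = extract_ngrams(tokens, 2)
table = HybridNgramTable(bigrams, head_size=2, tail_buckets=8, dim=4)
vec = table.lookup((5, 9))  # tail bigram: average of 2 hashed bucket rows
```

In this toy stream, `(5, 7)` and `(7, 5)` are the head bigrams and keep their own rows, while rarer bigrams such as `(5, 9)` share the 8 tail buckets. Table memory therefore grows with `head_size + tail_buckets` rather than with the number of distinct n-grams, which is the trade-off the abstract describes.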
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy | 54.7 | 1442 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 70.8 | 696 |
| Multitask Language Understanding | MMLU | Accuracy | 27.3 | 263 |
| Reading Comprehension | BoolQ | Accuracy (BoolQ) | 60.8 | 228 |
| Social Commonsense Reasoning | SocialIQA | Accuracy | 43.8 | 143 |
| Science Question Answering | ARC Challenge | Accuracy | 32.2 | 108 |
| Science Question Answering | SciQ | Accuracy (SciQ) | 86 | 101 |
| Science Question Answering | OpenBookQA | Accuracy | 33.4 | 82 |
| Science Question Answering | ARC Easy | Accuracy | 61.1 | 75 |