
Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

About

Large Language Models (LLMs) excel at general language tasks but struggle in specialized domains. Specialized Generalist Models (SGMs) address this by preserving broad capabilities while adapting to target domains. However, existing architectures provide limited support for task-guided specialized memory mechanisms. In this work, we introduce Nirvana, an SGM featuring specialized memory, linear-time complexity, and test-time task information extraction. Central to Nirvana are: (1) Task-Aware Memory Trigger ($\textit{Trigger}$), which treats each input as a self-supervised fine-tuning task and adjusts task-related parameters on the fly; and (2) Specialized Memory Updater ($\textit{Updater}$), which dynamically consolidates task-relevant context. Nirvana matches or surpasses LLM baselines on general benchmarks and achieves the lowest perplexity across specialized domains including biomedicine, finance, and law. On the challenging task of Magnetic Resonance Imaging (MRI), we attach lightweight codecs to the frozen Nirvana backbone and fine-tune them on paired k-space signals and images. Nirvana achieves higher-fidelity reconstructions than conventional LLM-based models, with Trigger providing effective domain-specific adaptation. Ablation studies confirm that removing Trigger leads to substantial degradation across all tasks, underscoring its essential role in task-aware specialization. Models are available at https://huggingface.co/collections/YuhuaJiang/nirvana. Code is available at https://github.com/YuhuaJiang2002/Nirvana.
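The abstract does not give Nirvana's update equations, so the following is only a rough illustrative sketch of the two mechanisms it names: a fast-weight memory written with a delta-rule-style update (standing in for the Updater) whose write strength is gated by a scalar derived from the current input (standing in for the task-aware Trigger). All names, projections, and equations here are our assumptions for illustration, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Fast-weight memory M maps keys to values, as in linear attention.
M = np.zeros((d, d))

def trigger_gate(x, W_task):
    # Hypothetical task-aware gate: a scalar in (0, 1) computed from the
    # current token features, standing in for the Trigger, which extracts
    # a task signal from the input at test time.
    return 1.0 / (1.0 + np.exp(-W_task @ x))

def update_memory(M, k, v, beta):
    # Delta-rule-style write, standing in for the Updater: move the
    # memory's current prediction for key k toward value v, scaled by
    # the task-aware gate beta.
    pred = M @ k
    return M + beta * np.outer(v - pred, k)

W_task = rng.normal(size=d)
for _ in range(16):                     # stream of token features
    x = rng.normal(size=d)
    k, v = np.tanh(x), x                # toy key/value projections
    beta = trigger_gate(x, W_task)
    M = update_memory(M, k, v, beta)

out = M @ np.tanh(rng.normal(size=d))   # read: query the memory
print(out.shape)
```

Because each write touches only a `d × d` matrix, the per-token cost is constant in sequence length, which is consistent with the linear-time complexity the abstract claims; the gate is what makes the write task-dependent rather than uniform.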

Yuhua Jiang, Shuang Cheng, Yihao Liu, Ermo Hua, Che Jiang, Weigao Sun, Yu Cheng, Feifei Gao, Biqing Qi, Bowen Zhou• 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText | PPL | 16.05 | 732 |
| Language Modeling | LAMBADA | Accuracy | 50.37 | 268 |
| Zero-shot Reasoning | PIQA | Zero-shot Accuracy | 73.67 | 62 |
| Zero-shot Reasoning | WinoGrande | Accuracy | 59.48 | 54 |
| Common Sense Reasoning | HellaSwag (0-shot) | Accuracy | 58.25 | 34 |
| Common Sense Reasoning | ARC-Challenge (0-shot) | Accuracy | 39.51 | 31 |
| Recall-intensive Retrieval | SWDE, SQuAD, FDA, TriviaQA, NQ, DROP | SWDE score | 37.8 | 24 |
| Common Sense Reasoning | ARC-Easy (0-shot) | Accuracy | 69.92 | 24 |
| Long-context Reasoning | LongBench | NQA | 16.6 | 12 |
| Zero-shot Common Sense Reasoning | SIQA | Zero-shot Accuracy | 41.62 | 12 |

Showing 10 of 13 rows.
