DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning

About

Large Language Models (LLMs) exhibit substantial parameter redundancy, particularly in Feed-Forward Networks (FFNs). Existing pruning methods suffer from two primary limitations. First, reliance on dataset-specific calibration introduces significant data dependency and computational overhead. Second, being predominantly static, they fail to account for the subset of knowledge neurons in LLMs that changes during autoregressive generation as the context evolves. To address this, we introduce DART (Dynamic Attention-Guided Runtime Tracing), a lightweight, training-free method that performs on-the-fly context-based pruning. DART monitors shifts in attention score distributions to infer context changes, dynamically updating neuron-level masks to retain salient parameters. Across ten benchmarks, DART outperforms the prior dynamic baseline, achieving accuracy gains of up to 14.5% on LLAMA-3.1-8B at 70% FFN sparsity. Furthermore, DART achieves up to 3x better ROUGE-L scores than static-masked pruning on summarization tasks, with performance comparable to the original dense models. We conclusively demonstrate that the proposed framework effectively adapts to diverse semantic contexts and preserves model capabilities across both general and domain-specific tasks, while using less than 10 MB of memory for LLAMA-3.1-8B (16 GB) with 0.1% FLOPs overhead. The code is available at https://github.com/seeder-research/DART.
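The abstract describes the mechanism only at a high level: watch for shifts in the attention score distribution, and refresh a neuron-level FFN mask when the context appears to have changed. The sketch below is one illustrative reading of that loop, not the authors' implementation. Everything specific in it is an assumption: the Jensen-Shannon divergence as the shift measure, the `threshold` value, the top-k-by-magnitude rule for choosing which neurons to keep, the fixed-length attention summary, and the names `js_divergence`, `top_k_mask`, and `DriftTracker`.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions.

    Assumption: the paper's actual shift measure is not given in the abstract.
    """
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def top_k_mask(activations, sparsity):
    """Keep the (1 - sparsity) fraction of neurons with the largest |activation|.

    Assumption: magnitude-based selection stands in for whatever saliency
    criterion the method really uses.
    """
    n = len(activations)
    keep = max(1, round(n * (1 - sparsity)))
    ranked = sorted(range(n), key=lambda i: abs(activations[i]), reverse=True)
    kept = set(ranked[:keep])
    return [i in kept for i in range(n)]


class DriftTracker:
    """Hypothetical helper: refresh the mask only when attention drifts."""

    def __init__(self, sparsity=0.7, threshold=0.1):
        self.sparsity = sparsity
        self.threshold = threshold  # assumed value, not from the paper
        self.reference = None       # attention summary at the last refresh
        self.mask = None

    def step(self, attention, activations):
        """Return (mask, refreshed) for one decoding step.

        `attention` is assumed to be a fixed-length, normalized summary of the
        attention scores (for example, mass binned over context segments).
        """
        drifted = (
            self.reference is None
            or js_divergence(attention, self.reference) > self.threshold
        )
        if drifted:
            self.reference = list(attention)
            self.mask = top_k_mask(activations, self.sparsity)
        return self.mask, drifted
```

Calling `DriftTracker(sparsity=0.7).step(attention, activations)` once per generated token would reuse the previous mask while the attention summary stays close to the reference and recompute it only after a large shift, which is the cost-saving idea behind updating masks dynamically rather than at every step.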

Abhishek Tyagi, Yunuo Cen, Shrey Dhorajiya, Bharadwaj Veeravalli, Xuanyao Fong • 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 64.58	1896
Commonsense Reasoning	WinoGrande	Accuracy 65.98	1442
Language Understanding	MMLU	Accuracy 34.14	844
Question Answering	ARC-E	Accuracy 59.43	523
Medical Question Answering	MedMCQA	Accuracy 29.6	521
Question Answering	OBQA	Accuracy 36.8	347
Question Answering	BoolQ	Accuracy 66.2	317
Multitask Language Understanding	MMLU (test)	Accuracy 52.33	312
Question Answering	ARC-C	Accuracy 38.99	258
Question Answering	GPQA	Accuracy 27.21	258

Showing 10 of 17 rows
