GLASS: Global-Local Aggregation for Inference-time Sparsification of LLMs

About

Inference-time sparsification is a promising path to deploy large language models (LLMs) on resource-constrained devices, yet existing training-free methods typically estimate feedforward network (FFN) neuron importance from the input prompt alone. We show this prompt-only signal is often unreliable, especially for short prompts and long-form decoding, leading to inaccurate masks and degraded generation fidelity. We propose GLASS, a plug-and-play, training-free framework that stabilizes dynamic FFN pruning by aggregating two complementary views of neuron criticality: local prompt-specific activations and a global model-intrinsic prior. GLASS fuses global and local signals via rank aggregation, yielding robust critical-neuron selection even when the prompt is short. We interpret GLASS as the maximum-a-posteriori consensus ranking under a permutation-based probabilistic model, providing a principled foundation for its weighted rank-aggregation rule. We apply GLASS to a diverse set of open-source LLMs, and show that it yields substantial improvements over prior training-free baselines in the challenging short-prompt, long-generation scenarios, achieving up to 45.10% lower perplexity and 25.73% lower KL divergence, while delivering significant on-device decoding speedup.
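The fusion step described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the exact aggregation rule, so the code below assumes a weighted Borda-style average of rank positions, which is one common form of weighted rank aggregation. The names `ranks_descending`, `select_neurons`, `local_weight`, and `keep_ratio` are hypothetical and chosen only for illustration.

```python
def ranks_descending(scores):
    """Return the rank position of each item (0 = highest score).

    Ties are broken by index because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order):
        ranks[idx] = position
    return ranks


def select_neurons(local_scores, prior_scores, local_weight=0.5, keep_ratio=0.5):
    """Pick FFN neurons to keep by fusing two importance rankings.

    local_scores: per-neuron importance estimated from the prompt (assumed input).
    prior_scores: per-neuron global, model-intrinsic importance (assumed input).
    local_weight: hypothetical mixing weight in [0, 1] for the local ranking.
    keep_ratio:   hypothetical fraction of neurons to keep.
    """
    if len(local_scores) != len(prior_scores):
        raise ValueError("both score lists must cover the same neurons")

    local_ranks = ranks_descending(local_scores)
    prior_ranks = ranks_descending(prior_scores)

    # Weighted average of rank positions; a lower fused value means more critical.
    fused = [
        local_weight * l + (1.0 - local_weight) * p
        for l, p in zip(local_ranks, prior_ranks)
    ]

    keep = max(1, int(round(keep_ratio * len(fused))))
    order = sorted(range(len(fused)), key=lambda i: fused[i])
    return sorted(order[:keep])


# Toy example with four neurons.
local = [0.9, 0.1, 0.5, 0.3]
prior = [0.2, 0.8, 0.6, 0.4]
print(select_neurons(local, prior))                    # equal weighting
print(select_neurons(local, prior, local_weight=0.0))  # global prior only
```

Working on ranks rather than raw scores keeps the two signals comparable even when their scales differ. A smaller `local_weight` leans more on the global prior, which is the regime the abstract motivates for short prompts.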

Amirmohsen Sattarifard, Sepehr Lavasani, Kunlin Zhang, Amirhossein Rajabpour, Hanlin Xu, Fengyu Sun, Negar Hassanpour, Chao Gao • 2025

Related benchmarks

Task | Dataset | Result | Rank
Text Classification | BoolQ | Accuracy 81.56 | 118
Classification | ARCENE | Accuracy 81.69 | 60
Long-form generation | Alpaca | Perplexity (PPL) 2.4268 | 30
Correctness Prediction | PIQA | Accuracy 80.47 | 28
Multi-class classification | COPA | Accuracy 92 | 22
Text Classification | HellaSwag | Accuracy 61.01 | 14
Classification | ARC-C | Accuracy 50.77 | 10
Short-Generation | Xsum | ROUGE-1 27.27 | 10
Short-Generation | CNN/DailyMail | ROUGE-1 21.15 | 10
Short-Generation | CoQA | F1 Score (CoQA) 80.53 | 10
Showing 10 of 11 rows
