
SLOT: Sample-specific Language Model Optimization at Test-time

About

We propose SLOT (Sample-specific Language Model Optimization at Test-time), a novel, parameter-efficient test-time inference approach that enhances a language model's ability to respond accurately to individual prompts. Existing Large Language Models (LLMs) often struggle with complex instructions, performing poorly on those that are under-represented among general samples. To address this, SLOT runs a few optimization steps at test time to update a lightweight, sample-specific parameter vector. The vector is added to the final hidden layer before the output head, and enables efficient adaptation by caching the last-layer features during per-sample optimization. By minimizing the cross-entropy loss on the input prompt only, SLOT helps the model better align with and follow each given instruction. In experiments, we demonstrate that our method outperforms the compared models across multiple benchmarks and LLMs. For example, Qwen2.5-7B with SLOT achieves an accuracy gain of 8.6% on GSM8K, from 57.54% to 66.19%, while DeepSeek-R1-Distill-Llama-70B with SLOT achieves a state-of-the-art accuracy of 68.69% on GPQA among 70B-level models. Our code is available at https://github.com/maple-research-lab/SLOT.
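The mechanism described above can be sketched in a few lines: cache the frozen last-layer features of the prompt, then run a few gradient steps on a single sample-specific vector that is added to those features before the output head, minimizing next-token cross-entropy on the prompt. The following NumPy snippet is a minimal, hypothetical illustration under these assumptions (the function name `slot_adapt`, the learning rate, and the step count are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def slot_adapt(H, W, targets, steps=20, lr=0.01):
    """Illustrative sketch of SLOT-style per-sample adaptation.

    H:       (T, d) cached final-hidden-layer features of the prompt (frozen).
    W:       (d, V) output head weights (frozen).
    targets: (T,) next-token ids of the prompt (cross-entropy targets).
    Returns the learned sample-specific vector delta of shape (d,).
    """
    T, d = H.shape
    delta = np.zeros(d)
    for _ in range(steps):
        # Logits with delta added to every position's hidden state.
        logits = (H + delta) @ W
        probs = softmax(logits)
        # Gradient of mean cross-entropy w.r.t. logits: (p - onehot(y)) / T.
        grad_logits = probs
        grad_logits[np.arange(T), targets] -= 1.0
        grad_logits /= T
        # Chain rule through the shared delta: sum position-wise gradients.
        delta -= lr * (grad_logits @ W.T).sum(axis=0)
    return delta
```

Because the logits are linear in `delta`, the prompt cross-entropy is convex in it, so a few small gradient steps reliably reduce the loss; at generation time the same `delta` would be added to each new hidden state before the output head.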

Yang Hu, Xingyu Zhang, Xueji Fang, Zhiyang Chen, Xiao Wang, Huatian Zhang, Guojun Qi • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | GPQA | Accuracy | 35.86 | 258 |
| Mathematical Reasoning | AIME 24 | Accuracy | 20 | 113 |
| Language Modeling | AdaptEval (test) | NLL | 2.1682 | 32 |
| Question Answering | NQ-Open | ROUGE-Lsum | 0.2722 | 14 |
| Language Model Evaluation | AdaptEval | ROUGE-Lsum | 0.2325 | 14 |
| Question Answering | SQuAD | ROUGE-Lsum | 76.57 | 14 |
| Summarization | XSum | ROUGE-Lsum | 18 | 14 |
