LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
About
Recent findings reveal that much of the knowledge in a Transformer-based Large Language Model (LLM) is encoded in its feed-forward network (FFN) layers. Each FFN layer can be interpreted as a summation of sub-updates, where each sub-update corresponds to a weighted column vector from the FFN's value parameter matrix and often encodes a human-interpretable concept. In light of this, we hypothesize that model performance and behavior can be further enhanced and controlled by modulating the contributions of these sub-updates based on their relevance to the input or the target output style. We propose LLMBRACES, a novel and efficient method that computes relevance scores for the value vectors in FFN layers and leverages these scores to dynamically adjust the contribution of each sub-update (a minimal sketch of this gating appears below). By optimizing sub-update contributions, LLMBRACES refines the prediction process, leading to more accurate and reliable outputs, much like a 'brace' providing support and stability. Moreover, LLMBRACES can be extended to support conditional control over generation characteristics, such as sentiment, thereby offering fine-grained steering of LLM outputs. Extensive experiments on various LLMs, including Qwen2.5-1.5B, Llama2-7B, and Llama3-8B, demonstrate that LLMBRACES outperforms baseline approaches in both fine-tuning and zero-shot settings while requiring significantly fewer tunable parameters: up to 75% fewer than LoRA. Furthermore, LLMBRACES excels at sentiment-controlled generation and toxicity reduction, highlighting its potential for flexible, controlled text generation across applications.
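The sketch below illustrates the sub-update view the abstract describes: a standard FFN output `act(x @ W_k.T) @ W_v` is a weighted sum of the value vectors (rows of `W_v`), and a per-vector relevance score rescales each coefficient. This is a minimal illustration, not the paper's implementation; the `relevance` head, the sigmoid gating, and the class and parameter names are assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn

class BracedFFN(nn.Module):
    """Transformer FFN viewed as a sum of sub-updates, with learned
    relevance gates on each sub-update (illustrative sketch only).

    Standard FFN: out = act(x @ W_k.T) @ W_v, i.e. a weighted sum of
    the value vectors (rows of W_v), with coefficients m = act(x @ W_k.T).
    Here each coefficient m_i is rescaled by a relevance score r_i(x).
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.W_k = nn.Linear(d_model, d_ff, bias=False)   # key vectors
        self.W_v = nn.Linear(d_ff, d_model, bias=False)   # value vectors
        self.act = nn.GELU()
        # Hypothetical relevance scorer: a small linear head mapping the
        # token state to one score per value vector. The paper's actual
        # scoring function may differ.
        self.relevance = nn.Linear(d_model, d_ff, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m = self.act(self.W_k(x))             # sub-update coefficients
        r = torch.sigmoid(self.relevance(x))  # relevance gates in (0, 1), assumed form
        return self.W_v(m * r)                # relevance-weighted sum of sub-updates


# Usage: gate the sub-updates of one FFN layer for a batch of token states.
ffn = BracedFFN(d_model=768, d_ff=3072)
y = ffn(torch.randn(2, 16, 768))  # -> shape (2, 16, 768)
```

In a setup like this, the base FFN weights (`W_k`, `W_v`) would presumably stay frozen and only the relevance head would be trained, which is consistent with the abstract's claim of far fewer tunable parameters than LoRA; the paper's exact parameterization may differ.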
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | PopQA | Accuracy | 36.21 | 186 |
| Commonsense Reasoning | Commonsense Reasoning suite (BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy | 74.4 | 138 |
| Question Answering | TruthfulQA | -- | -- | 82 |
| Question Answering | Natural Questions (NQ) | Accuracy | 20.3 | 36 |
| Question Answering | TriviaQA | Accuracy | 66.11 | 32 |
| Sentiment Steering | OpenWebText, neutral to positive (test) | Perplexity (PPL) | 30.03 | 27 |
| Sentiment Steering | OpenWebText, neutral to negative (test) | Perplexity (PPL) | 39.78 | 27 |
| Question Answering | AGIEval | Accuracy | 32.11 | 12 |
| Toxic Language Suppression | RealToxicityPrompts, 10K nontoxic prompts, GPT2-large generation (test) | Max Toxicity | 0.172 | 7 |