Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection

About

Parameter-efficient fine-tuning (PEFT) has become a practical route for adapting large language models to downstream tasks, with LoRA-style methods being particularly attractive because they are inexpensive to train and easy to deploy. Most LoRA variants, however, revise the update rule within the weight space of each layer and leave the intermediate representations formed by deeper layers largely unused. We propose Echo-LoRA, a cross-layer representation injection method for parameter-efficient fine-tuning. During training, Echo-LoRA collects boundary hidden states from deeper source layers, aggregates them into a sample-level echo representation, and uses lightweight projection and gating networks to inject the resulting signal into shallow LoRA or DoRA modules. Answer-only masking, masked distillation, and stochastic routing are used to keep this auxiliary path stable and to reduce the gap between training and inference. On eight commonsense reasoning benchmarks, Echo-LoRA exceeds the reported LoRA baselines by 5.7 percentage points on average across LLaMA-7B, LLaMA2-7B, and LLaMA3-8B. Under reproduced LoRA baselines in our unified implementation, the average gain is 3.0 points; when combined with DoRA, the gain is 2.7 points. The Echo path is discarded after training, so the deployed model keeps the original low-rank LoRA/DoRA form and adds neither inference-time parameters nor inference computation.
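The mechanism described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not say where the echo signal enters the adapter, how the gate is parameterised, or how routing is sampled. The sketch assumes the signal is added inside the rank-r bottleneck, that the gate is a sigmoid, and that routing is a single coin flip per forward pass. The names `EchoLoRALinear`, `masked_mean`, `P`, `G`, and `route_prob` are hypothetical. The deep-layer hidden states are passed in as an array; how they are obtained during training is left out.

```python
import numpy as np

rng = np.random.default_rng(0)


def masked_mean(hidden, mask):
    """Average hidden states over positions where mask == 1.

    hidden: (batch, seq, d) states from a deeper layer.
    mask:   (batch, seq), 1 on the tokens to keep (e.g. answer tokens).
    Returns one vector per sample, shape (batch, d).
    """
    weights = mask[..., None].astype(hidden.dtype)
    return (hidden * weights).sum(axis=1) / np.maximum(weights.sum(axis=1), 1.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class EchoLoRALinear:
    """Hypothetical shallow-layer linear module with a training-only echo path."""

    def __init__(self, d_in, d_out, rank, d_echo):
        self.W = rng.normal(size=(d_out, d_in)) * 0.02   # frozen base weight
        self.A = rng.normal(size=(rank, d_in)) * 0.02    # LoRA down-projection
        self.B = np.zeros((d_out, rank))                 # LoRA up-projection
        # Assumed auxiliary parameters, used only while training.
        self.P = rng.normal(size=(rank, d_echo)) * 0.02  # echo projection
        self.G = rng.normal(size=(rank, d_echo)) * 0.02  # echo gate

    def forward(self, x, echo=None, route_prob=0.5, training=False):
        z = x @ self.A.T                                  # (batch, seq, rank)
        # Assumed stochastic routing: take the echo path with prob. route_prob.
        use_echo = training and echo is not None and rng.random() < route_prob
        if use_echo:
            gate = sigmoid(echo @ self.G.T)               # (batch, rank)
            injected = gate * (echo @ self.P.T)           # (batch, rank)
            z = z + injected[:, None, :]                  # broadcast over seq
        return x @ self.W.T + z @ self.B.T
```

With `training=False` the echo branch is never taken, so the module computes the same output as a plain LoRA layer with merged weight `W + B @ A`. This matches the abstract's statement that the deployed model keeps the original low-rank form.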

Yihang Peng, Peng Jin, Jie Gong, Xingyuan Chen, Lingjiao Xu, Ning Su, Yan Ran • 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	Commonsense Reasoning	BoolQ Accuracy 75.6	54
Commonsense Reasoning	Commonsense Reasoning	BoolQ Accuracy 75.1	27
Code Synthesis	HumanEval	pass@1 25.78	11
Mathematical Reasoning	GSM8K	Accuracy 58.61	7
Broad knowledge integration	MMLU	Accuracy 53.9	2

