TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

About

Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that improves task performance by selectively removing layers that are irrelevant or detrimental for a given task. TALE optimizes task-specific performance, yielding a task-optimized architecture without retraining. Across 9 tasks and 5 model families, under both zero-shot and few-shot settings, TALE consistently matches or surpasses baseline performance while simultaneously reducing computational costs. TALE also synergizes with fine-tuning, leading to further performance improvements. Computing TALE for a new task requires modest resources, making it a practical and deployable solution for task-specialized LLM inference.

Omar Naim, Krish Sharma, Niyar R Barman, Nicholas Asher • 2025
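The abstract describes removing layers that are irrelevant or detrimental for a task, but it does not spell out the search procedure. The sketch below is one plausible reading, not the authors' algorithm: a greedy loop that repeatedly drops the single layer whose removal gives the best validation score, and stops once every removal would hurt. All names here (`run`, `score`, `greedy_eliminate`) are hypothetical, and simple numeric functions stand in for transformer blocks.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]
Example = Tuple[float, float]  # (input, target)


def run(layers: Sequence[Layer], kept: Sequence[int], x: float) -> float:
    """Apply only the kept layers, in their original order."""
    for i in kept:
        x = layers[i](x)
    return x


def score(layers: Sequence[Layer], kept: Sequence[int], data: Sequence[Example]) -> float:
    """Negative mean absolute error on the validation data (higher is better)."""
    return -sum(abs(run(layers, kept, x) - y) for x, y in data) / len(data)


def greedy_eliminate(layers: Sequence[Layer], data: Sequence[Example]) -> Tuple[List[int], float]:
    """Assumed greedy search: drop one layer at a time while the score does not get worse."""
    kept = list(range(len(layers)))
    best = score(layers, kept, data)
    while len(kept) > 1:
        # Score every architecture obtained by removing exactly one kept layer.
        candidates = [(score(layers, [j for j in kept if j != i], data), i) for i in kept]
        cand_score, drop = max(candidates)
        if cand_score < best:
            break  # every single removal hurts, so stop
        # Ties are accepted: an irrelevant layer is removed to save compute.
        kept.remove(drop)
        best = cand_score
    return kept, best


# Toy "model": the task is y = 2x + 1. Layer 2 is detrimental, layer 3 is irrelevant.
layers: List[Layer] = [
    lambda x: x * 2,
    lambda x: x + 1,
    lambda x: x + 0.5,
    lambda x: x * 1.0,
]
data: List[Example] = [(float(x), 2.0 * x + 1.0) for x in range(4)]

kept, best = greedy_eliminate(layers, data)
print(kept, best)
```

In this toy setting the loop keeps only layers 0 and 1, the two that the task actually needs. With a real LLM, `score` would be a task metric on a held-out set and each candidate would skip one transformer block, which is where the cost of the search would come from.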

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	Accuracy: 27	922
Question Answering	ARC Challenge	Accuracy (ARC): 64.4	631
Commonsense Reasoning	WinoGrande	Accuracy: 56.4	453
Question Answering	ARC Easy	Accuracy: 77.3	246
Multi-task Language Understanding	MMLU (test)	--	107
Mathematical Reasoning	GSM8K Hard	Accuracy: 61.8	82
Boolean Question Answering	BoolQ	Accuracy: 85.9	57
Boolean Question Answering	BoolQ (test)	Accuracy (Avg): 86.2	41
Question Answering	ARC Easy	Accuracy (ARC-E): 92.18	15
Commonsense Question Answering	CommonQA	Accuracy: 84.4	12

Showing 10 of 28 rows
