TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
About
Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that improves task performance by selectively removing layers that are irrelevant or detrimental to a given task. TALE optimizes directly for task-specific performance, yielding a task-optimized architecture without retraining. Across 9 tasks and 5 model families, in both zero-shot and few-shot settings, TALE consistently matches or surpasses baseline performance while reducing computational cost. TALE also combines well with fine-tuning, leading to further performance improvements. Computing TALE for a new task requires modest resources, making it a practical and deployable solution for task-specialized LLM inference.
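The abstract does not spell out how layers are selected, so the following is only a minimal sketch of one plausible reading: a greedy search that repeatedly drops the single layer whose removal gives the best validation score, and stops when every removal would hurt. The greedy strategy, the tie rule (a removal that leaves the score unchanged is accepted, since it still saves compute), and all names here (`greedy_layer_elimination`, `toy_layers`, `toy_score`) are assumptions for illustration, not the authors' implementation. Layers are modeled as plain functions on a number so the example runs without any ML library.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def run_layers(layers: Sequence[Layer], x: float) -> float:
    """Apply the given layers in order, like a forward pass through a stack."""
    for layer in layers:
        x = layer(x)
    return x


def greedy_layer_elimination(
    layers: Sequence[Layer],
    score_fn: Callable[[Sequence[Layer]], float],
) -> Tuple[List[int], float]:
    """Hypothetical greedy search: drop one layer per round while the
    validation score does not decrease. Returns kept indices and final score."""
    kept = list(range(len(layers)))
    best = score_fn([layers[i] for i in kept])
    while len(kept) > 1:
        best_candidate = None
        for i in kept:
            candidate = [j for j in kept if j != i]
            s = score_fn([layers[j] for j in candidate])
            # Assumption: ties are accepted, because fewer layers means less compute.
            if s >= best and (best_candidate is None or s > best_candidate[0]):
                best_candidate = (s, candidate)
        if best_candidate is None:
            break  # every single-layer removal lowers the score
        best, kept = best_candidate
    return kept, best


# Toy "model": the task is to compute 2*x + 1.
toy_layers: List[Layer] = [
    lambda x: 2 * x,  # index 0: useful for the task
    lambda x: x - 3,  # index 1: detrimental for the task
    lambda x: x + 1,  # index 2: useful for the task
    lambda x: x,      # index 3: irrelevant (identity)
]

validation_inputs = [0.0, 1.0, 2.0, 3.0]


def toy_score(layers: Sequence[Layer]) -> float:
    """Higher is better: negated mean absolute error against 2*x + 1."""
    errors = [abs(run_layers(layers, x) - (2 * x + 1)) for x in validation_inputs]
    return 0.0 - sum(errors) / len(errors)


kept, best = greedy_layer_elimination(toy_layers, toy_score)
print(kept, best)
```

On this toy stack the search keeps only layers 0 and 2: it first removes the detrimental layer, then the identity layer, and stops because removing either remaining layer would lower the score. For a real LLM, `score_fn` would be a task metric on a held-out set and each candidate would be a forward pass with one transformer block skipped, which makes the number of evaluations per round proportional to the number of remaining layers.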
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH500 (test) | Accuracy | 27 | 895 |
| Question Answering | ARC Challenge | Accuracy (ARC) | 64.4 | 598 |
| Commonsense Reasoning | WinoGrande | Accuracy | 56.4 | 453 |
| Question Answering | ARC Easy | Accuracy | 77.3 | 210 |
| Multi-task Language Understanding | MMLU (test) | -- | -- | 87 |
| Boolean Question Answering | BoolQ | Accuracy | 85.9 | 57 |
| Mathematical Reasoning | GSM8K Hard | Accuracy | 61.8 | 52 |
| Boolean Question Answering | BoolQ (test) | Accuracy (Avg) | 86.2 | 41 |
| Question Answering | ARC Easy | Accuracy (ARC-E) | 92.18 | 15 |
| Commonsense Question Answering | CommonQA | Accuracy | 84.4 | 12 |