TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

About

Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that improves task performance by selectively removing layers that are irrelevant or detrimental for a given task. TALE optimizes task-specific performance, yielding a task-optimized architecture without retraining. Across 9 tasks and 5 model families, under both zero-shot and few-shot settings, TALE consistently matches or surpasses baseline performance while simultaneously reducing computational costs. TALE also synergizes with fine-tuning, leading to further performance improvements. Computing TALE for a new task requires modest resources, making it a practical and deployable solution for task-specialized LLM inference.
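The abstract does not spell out how TALE chooses which layers to drop. To illustrate the general idea of task-aware layer elimination, here is a minimal sketch of greedy backward elimination. It assumes a hypothetical `score(removed)` function that would evaluate the model on a task's validation set with the given layers skipped. `greedy_layer_elimination`, `toy_score`, and `TOY_CONTRIB` are invented for this example and are not the authors' implementation or API.

```python
from typing import Callable, FrozenSet, Optional, Tuple

# Hypothetical interface: maps the set of removed layer indices to a
# validation score for the target task (higher is better). In practice this
# would run the LLM with those layers skipped; here it is a toy stand-in.
Scorer = Callable[[FrozenSet[int]], float]


def greedy_layer_elimination(
    num_layers: int, score: Scorer, max_removed: Optional[int] = None
) -> Tuple[FrozenSet[int], float]:
    """Greedy backward elimination over layers (illustrative, not TALE itself).

    At each step, try removing every remaining layer, keep the single removal
    with the best score, and stop once every candidate lowers the score.
    """
    removed: FrozenSet[int] = frozenset()
    best = score(removed)  # baseline: full model, no layers removed
    # Always keep at least one layer unless a tighter cap is given.
    limit = num_layers - 1 if max_removed is None else max_removed
    while len(removed) < limit:
        trials = [
            (score(removed | {i}), i) for i in range(num_layers) if i not in removed
        ]
        trial_best, layer = max(trials)  # ties go to the higher layer index
        if trial_best < best:
            break  # every further removal hurts the task score
        # Accept ties: same score with one fewer layer means less compute.
        removed = removed | {layer}
        best = trial_best
    return removed, best


# Toy per-layer contributions in accuracy points (invented for this example):
# layers 2 and 5 hurt the task, layer 4 is neutral.
TOY_CONTRIB = [30, 20, -10, 25, 0, -5]


def toy_score(removed: FrozenSet[int]) -> float:
    return sum(c for i, c in enumerate(TOY_CONTRIB) if i not in removed)


if __name__ == "__main__":
    removed, best = greedy_layer_elimination(len(TOY_CONTRIB), toy_score)
    print(sorted(removed), best)  # → [2, 4, 5] 75
```

With the toy scorer, the search first drops the detrimental layers 2 and 5, then the neutral layer 4, and stops once every remaining removal lowers the score. Accepting ties favors a smaller network at equal task performance. Note that this greedy variant can need up to roughly L²/2 scorer calls for an L-layer model, so a real search may use a cheaper strategy.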

Omar Naim, Krish Sharma, Niyar R Barman, Nicholas Asher • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH500 (test) | Accuracy | 27 | 895 |
| Question Answering | ARC Challenge | Accuracy (ARC) | 64.4 | 598 |
| Commonsense Reasoning | WinoGrande | Accuracy | 56.4 | 453 |
| Question Answering | ARC Easy | Accuracy | 77.3 | 210 |
| Multi-task Language Understanding | MMLU (test) | -- | -- | 87 |
| Boolean Question Answering | BoolQ | Accuracy | 85.9 | 57 |
| Mathematical Reasoning | GSM8K Hard | Accuracy | 61.8 | 52 |
| Boolean Question Answering | BoolQ (test) | Accuracy (Avg) | 86.2 | 41 |
| Question Answering | ARC Easy | Accuracy (ARC-E) | 92.18 | 15 |
| Commonsense Question Answering | CommonQA | Accuracy | 84.4 | 12 |

Showing 10 of 28 rows.
