
Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLMs

About

This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks using limited task-specific examples. We position AIR within the broader landscape of adaptation strategies, including prompt optimization, retrieval-based methods, and fine-tuning. We then compare these approaches across a diverse benchmark suite designed to stress different task requirements, such as knowledge injection, structured extraction, label remapping, and logical reasoning. The paper argues that adaptation performance is strongly task-dependent: no single method dominates across all settings. Across five benchmarks, AIR was strongest or near-best on label-remapping classification, while KNN retrieval performed best on closed-book QA, and fine-tuning dominated structured extraction and event-order reasoning. AIR is most promising when task behavior can be captured by compact, interpretable instruction rules, while retrieval and fine-tuning remain stronger in tasks dominated by source-specific knowledge or dataset-specific annotation regularities.
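The abstract describes AIR as rule induction: from a handful of task-specific examples, the method derives compact, interpretable instruction rules (for example, explicit label remappings) and appends them to the prompt. A minimal sketch of such a loop is below; it greedily keeps whichever candidate rule most improves accuracy on the few labeled examples. The `run_model` stub, the candidate rules, and the greedy search are all illustrative assumptions, not the paper's actual algorithm; in practice `run_model` would call an LLM with the current instruction.

```python
from typing import List, Tuple

def run_model(instruction: str, x: str) -> str:
    # Hypothetical stand-in for an LLM call. Toy behavior: it follows an
    # explicit label-remapping rule if the instruction contains one.
    if "map 'billing' to 'payments'" in instruction and x == "billing":
        return "payments"
    return x

def air_adapt(
    examples: List[Tuple[str, str]],
    candidate_rules: List[str],
    base_instruction: str = "Classify the request.",
    rounds: int = 3,
) -> str:
    """Sketch of an AIR-style revision loop: greedily append the candidate
    instruction rule that most improves accuracy on the task examples."""
    def accuracy(instr: str) -> float:
        return sum(run_model(instr, x) == y for x, y in examples) / len(examples)

    instruction = base_instruction
    for _ in range(rounds):
        best = max(candidate_rules, key=lambda r: accuracy(instruction + " " + r))
        if accuracy(instruction + " " + best) <= accuracy(instruction):
            break  # no remaining rule helps; stop revising
        instruction = instruction + " " + best
    return instruction

# Two labeled examples of a label-remapping task, plus candidate rules.
examples = [("billing", "payments"), ("refund", "refund")]
rules = ["Always answer in French.", "map 'billing' to 'payments'."]
revised = air_adapt(examples, rules)
```

The resulting `revised` instruction contains only the remapping rule, since the other candidate never improves accuracy on the examples; this mirrors the paper's point that AIR works best when task behavior is expressible as a few compact rules.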

Solomiia Bilyk, Volodymyr Getmanskyi, Taras Firman • 2026

Related benchmarks

Task | Dataset | Result | Rank
Classification (8 classes) | Twitter customer-support requests | Accuracy (8 classes): 95.31 | 8
Event Logical Reasoning | BizFinBench v2 | Accuracy: 51.67 | 8
PII Detection | PUPA | F1 Score: 59.32 | 8
Closed-book Question Answering | Ever Young | LLM Score: 42.08 | 8
Information Extraction | Campaign-finance filings | Mean per-field Accuracy: 35.9 | 8
