
Cross-Task Generalization via Natural Language Crowdsourcing Instructions

About

Humans (e.g., crowdworkers) have a remarkable ability to solve different tasks simply by reading the textual instructions that define them and looking at a few examples. Despite the success of conventional supervised learning on individual datasets, such models often struggle to generalize across tasks (e.g., a question-answering system cannot solve classification tasks). A long-standing challenge in AI is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, we introduce NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances (input-output pairs). The instructions are obtained from the crowdsourcing instructions used to create existing NLP datasets and are mapped to a unified schema. Using this meta-dataset, we measure cross-task generalization by training models on seen tasks and evaluating them on the remaining unseen ones. We adopt generative pre-trained language models to encode task-specific instructions along with the input and generate the task output. Our results indicate that models benefit from instructions when evaluated on generalization to unseen tasks (19% better for models utilizing instructions). These models, however, are far behind an estimated performance upper bound, indicating significant room for further progress in this direction.
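The setup described in the abstract can be sketched as follows: each task instance is serialized by concatenating the task's natural-language instruction, a few demonstration pairs, and the instance input into one sequence for a generative model, and generalization is measured by splitting tasks (not instances) into seen and unseen sets. This is a minimal illustration, not the paper's exact code; the field labels ("Definition", "Example input") are placeholders standing in for the paper's unified schema fields.

```python
# Illustrative sketch of instruction-conditioned prompting and a
# cross-task (seen vs. unseen) split. Labels like "Definition:" are
# stand-ins for the dataset's actual schema fields.

def build_prompt(instruction, examples, instance_input):
    """Serialize instruction + demonstrations + input into one sequence
    that a generative LM would complete with the task output."""
    parts = [f"Definition: {instruction}"]
    for ex_in, ex_out in examples:
        parts.append(f"Example input: {ex_in}\nExample output: {ex_out}")
    parts.append(f"Input: {instance_input}\nOutput:")
    return "\n\n".join(parts)


def split_tasks(task_names, num_seen):
    """Cross-task generalization: models train on 'seen' tasks and are
    evaluated on entirely held-out 'unseen' tasks."""
    return task_names[:num_seen], task_names[num_seen:]


prompt = build_prompt(
    "Classify the sentiment of the sentence as positive or negative.",
    [("A delightful film.", "positive")],
    "The plot was tedious.",
)
seen, unseen = split_tasks(["qa", "classification", "summarization"], 2)
```

The key point the sketch captures is that the instruction travels with every input, so a model that actually reads it can, in principle, perform a task it was never trained on.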

Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi• 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Classification | AG News (test) | Accuracy | 83.43 | 228 |
| Text Classification | TREC | Accuracy | 66.6 | 207 |
| Text Classification | SST-2 (test) | Accuracy | 95.77 | 185 |
| Text Classification | MR (test) | Accuracy | 90.85 | 148 |
| Subjectivity Classification | Subj (test) | Accuracy | 68.1 | 127 |
| Text Classification | TREC (test) | Accuracy | 66.6 | 115 |
| Text Classification | MR | Accuracy | 90.85 | 106 |
| Text Classification | SST-5 (test) | Accuracy | 51.9 | 60 |
| Text Classification | SST-5 | Accuracy | 51.9 | 52 |
| Text Classification | Subj | Accuracy | 68.1 | 48 |

Showing 10 of 13 rows
